Closed gregcaporaso closed 8 years ago
Just curious, what's the need for drop and append in that example?
Just so it's all in one place (@wdwvt1 was also asking about this). The way we're doing it in QIIME 2 is:
>>> import io
>>> import pandas as pd
>>> map_f = io.StringIO("""#SampleID\tSomething
... 0123\ta
... 0001\tb""")
>>> df = pd.read_csv(map_f, sep='\t', dtype=object)
>>> df = df.set_index(df.columns[0], drop=True, append=False)
>>> df
           Something
#SampleID
0123               a
0001               b
If you don't do the two-step setting of the index, the sample IDs won't be interpreted as strings (dtype=object won't be applied to the index column), so in this example the leading zeros would be removed.
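To make the leading-zero problem concrete, here is a minimal sketch using the same toy mapping file. It only contrasts pandas' default type inference with reading everything as object; the variable names (tsv, inferred, as_text) are illustrative and not from QIIME 2.

```python
import io

import pandas as pd

# Same toy mapping file as in the example above.
tsv = "#SampleID\tSomething\n0123\ta\n0001\tb"

# Default type inference: the sample ID column is parsed as integers,
# so the leading zeros are lost.
inferred = pd.read_csv(io.StringIO(tsv), sep="\t")
print(inferred["#SampleID"].tolist())

# Reading every column as object keeps the IDs as strings; the index is
# then set in a second step.
as_text = pd.read_csv(io.StringIO(tsv), sep="\t", dtype=object)
as_text = as_text.set_index(as_text.columns[0], drop=True, append=False)
print(as_text.index.tolist())
```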
The drop=True and append=False are not strictly necessary here - they're defaults, but pandas has changed their defaults in backward-incompatible ways, so @jairideout added those in to protect us against that in the future (thanks for the explanation of these two parameters @jairideout!).
It would be good to have a parse_sample_metadata function, and then use that instead of calls like sample_metadata = pd.read_table(mapping_fp, index_col=0). This call will be problematic in some cases that we've run into (here's how it should be done), and we want to be able to fix that in one place rather than have the file be potentially parsed differently in different parts of the code base.
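As a rough sketch of what such a function could look like, the two-step approach from the QIIME 2 example can be wrapped in a single helper. Only the name parse_sample_metadata comes from this issue; the signature, the docstring, and the assumption that the first column holds the sample IDs are hypothetical.

```python
import io

import pandas as pd


def parse_sample_metadata(f):
    """Hypothetical helper: read a tab-separated mapping file into a
    DataFrame indexed by its first column (assumed to be the sample IDs)."""
    # Read every column as object so IDs such as "0123" stay strings.
    df = pd.read_csv(f, sep="\t", dtype=object)
    # Set the index in a second step, spelling out the defaults explicitly
    # in case they change in a future pandas release.
    return df.set_index(df.columns[0], drop=True, append=False)


# Usage with an in-memory file; a path or open file handle works the same way.
mapping = io.StringIO("#SampleID\tSomething\n0123\ta\n0001\tb")
sample_metadata = parse_sample_metadata(mapping)
print(sample_metadata.index.tolist())
```

Callers would then use parse_sample_metadata(mapping_fp) everywhere, so any later fix to the parsing only has to be made in this one function.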