biocore / American-Gut

American Gut open-access data and IPython notebooks

Add zeroes to barcodes in mapping files #131

Closed: adamrp closed this pull request 9 years ago

adamrp commented 9 years ago

Not sure when or why the zeroes were removed from the barcodes, but this PR adds them back in. I used the code at the following gist:

https://gist.github.com/adamrp/cc337b63e7ac33160f4b
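The gist's code is not reproduced in this thread. As a rough, hypothetical illustration only, a zero-padding fix could look like the sketch below. It assumes a tab-delimited mapping file whose first column holds a purely numeric barcode, and a barcode width of 9 digits. Both the width and the name `pad_barcodes` are assumptions, not taken from the gist.

```python
# Hypothetical sketch: restore zeroes to numeric barcodes in a mapping file.
# ASSUMPTIONS: tab-delimited text, barcode in the first column, width of 9.
BARCODE_WIDTH = 9  # assumed barcode length; not confirmed by the thread


def pad_barcodes(text, width=BARCODE_WIDTH):
    """Return mapping-file text with the first column zero-padded to `width`.

    Header/comment lines (starting with '#') and non-numeric IDs
    (e.g. blanks or controls) are passed through unchanged.
    """
    out_lines = []
    for line in text.splitlines():
        fields = line.split('\t')
        first = fields[0]
        if first and not first.startswith('#') and first.isdigit():
            # str.zfill left-pads with '0' up to the requested width
            fields[0] = first.zfill(width)
        out_lines.append('\t'.join(fields))
    return '\n'.join(out_lines) + '\n'
```

For example, `pad_barcodes("#SampleID\tBODY_SITE\n1234\tfeces\n")` returns the same text with `1234` replaced by `000001234`. Barcodes that are already full width are left as they are.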

Fixes #130

jwdebelius commented 9 years ago

:+1:

ElDeveloper commented 9 years ago

While running an analysis that used AG.txt, PGP.txt and others, I ran into a similar issue. I tried applying this script, but it didn't fix my problem.

Looking at the mapping file and the OTU table, there seems to be a mismatch between their sample IDs:

```
In [1]: from biom import load_table

In [2]: from md import load_template_to_dataframe

In [9]: mf = load_template_to_dataframe('AG.txt')
/Users/yoshikivazquezbaeza/.virtualenvs/qiime-dev/lib/python2.7/site-packages/pandas/io/parsers.py:1159: DtypeWarning: Columns (34,200) have mixed types. Specify dtype option on import or set low_memory=False.
  data = self._reader.read(nrows)

In [10]: bt = load_table('AG.biom')

In [11]: mfids = set(mf.index.tolist())

In [18]: btids = set(bt.ids('sample'))

In [19]: len(mfids)
Out[19]: 4545

In [21]: len(btids)
Out[21]: 4545

In [23]: len(mfids - btids)
Out[23]: 3459
```

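
One way to check whether a mismatch like this comes from dropped zeroes rather than from genuinely different samples is to compare the two ID sets again after normalizing both sides. The sketch below assumes purely numeric barcodes padded to a hypothetical width of 9. The names `normalize_id` and `diagnose_id_mismatch` are illustrative only. In the session above, `mfids` and `btids` would be the inputs.

```python
# Hypothetical diagnostic: do mapping-file and table IDs line up once zeroes
# are restored? ASSUMPTION: numeric barcodes of width 9.


def normalize_id(sample_id, width=9):
    """Zero-pad purely numeric IDs; leave anything else unchanged."""
    sample_id = str(sample_id)  # IDs parsed as ints (e.g. by pandas) become strings
    return sample_id.zfill(width) if sample_id.isdigit() else sample_id


def diagnose_id_mismatch(mapping_ids, table_ids, width=9):
    """Summarize how two sample-ID sets overlap before and after zero-padding.

    Counts are approximate if padding merges distinct raw IDs
    (e.g. '1234' and '001234' both present on one side).
    """
    mapping_ids, table_ids = set(mapping_ids), set(table_ids)
    raw_shared = mapping_ids & table_ids
    norm_mapping = {normalize_id(i, width) for i in mapping_ids}
    norm_table = {normalize_id(i, width) for i in table_ids}
    norm_shared = norm_mapping & norm_table
    return {
        'shared_raw': len(raw_shared),
        'shared_after_padding': len(norm_shared),
        # IDs that only match once zeroes are restored
        'recovered_by_padding': len(norm_shared) - len(raw_shared),
        'still_only_in_mapping': len(norm_mapping - norm_table),
        'still_only_in_table': len(norm_table - norm_mapping),
    }
```

If most of the mismatched IDs are recovered by padding, the loader is the likely culprit. When the mapping file is read with pandas, passing `dtype=str` to `pandas.read_csv` keeps barcodes as strings. Whether `load_template_to_dataframe` does this internally is not shown here.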
wasade commented 9 years ago

I gotta go back to the processing directory and check out what's going on.


adamrp commented 9 years ago

Were these zeroes added back in some other way, or are we still investigating? If this PR is no longer needed, please close it.

wasade commented 9 years ago

Regenerated; this should be safe to close.
