Open guillaume-gricourt opened 3 years ago
Hi @guillaume-gricourt, that parser was designed to support classic OTU tables from QIIME1, where the lineages were assured to be balanced with placeholders for unidentified names. TSVs are not BIOM-Format and are unstructured, which creates a wide range of edge cases.
As a workaround, you could parse the counts without metadata, parse the taxonomy separately, and add it in with biom.Table.add_metadata?
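To illustrate the "parse separately" idea, here is a minimal standard-library sketch. The sample rows and the helper name `split_counts_and_taxonomy` are hypothetical (this is not the `bad.txt` from the issue), and the sketch only builds plain dictionaries. The metadata dictionary has the `{id: {'taxonomy': [...]}}` shape that is passed to `add_metadata` in the interactive session in this thread.

```python
import csv
import io

# Hypothetical sample in the classic OTU-table TSV layout (not the issue's bad.txt).
SAMPLE_TSV = (
    "#OTU ID\tS1\tS2\ttaxonomy\n"
    "OTU1\t5\t0\tk__Bacteria;p__Firmicutes\n"
    "OTU2\t2\t7\tk__Bacteria\n"
)


def split_counts_and_taxonomy(text, id_col="#OTU ID", tax_col="taxonomy"):
    """Parse counts and taxonomy independently from one TSV string."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    counts, metadata = {}, {}
    for row in reader:
        otu = row.pop(id_col)
        lineage = row.pop(tax_col)
        # Everything left in the row is a per-sample count.
        counts[otu] = {sample: float(value) for sample, value in row.items()}
        # Lineages may have different depths; no balancing is attempted here.
        metadata[otu] = {"taxonomy": [name.strip() for name in lineage.split(";")]}
    return counts, metadata


counts, metadata = split_counts_and_taxonomy(SAMPLE_TSV)
```

Because the taxonomy never goes through the TSV metadata parser, ragged lineages are simply kept as lists of different lengths.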
Yeah, it's a good workaround.
I create biom files from tsv to load data into the Phyloseq package. This file is also my entry point for other analyses.
From now on, when I create this biom file, I'll check the order of the metadata in my tsv file.
Since this kind of biom file can be created, it seems to me that this would be a feature worth implementing?
I'd greatly welcome a pull request to resolve this feature request; otherwise, I'm not sure when I'll be able to get to it. A possible workaround is below.
$ biom convert -i bad.txt -o bad.biom --to-hdf5
$ python
Python 3.6.11 | packaged by conda-forge | (default, Aug 5 2020, 20:19:23)
[GCC Clang 10.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import biom
>>> df = pd.read_csv('bad.txt', sep='\t')
>>> df.set_index('#OTU ID', inplace=True)
>>> t = biom.load_table('bad.biom')
>>> formatted = {k: {'taxonomy': v.split(';')} for k, v in df['taxonomy'].items()}
>>> t.add_metadata(formatted, axis='observation')
>>> with biom.util.biom_open('okay.biom', 'w') as fp:
... t.to_hdf5(fp, 'converted')
...
Hi,
When you have this file, good.txt, it's OK. When the order of the metadata is different, as in bad.txt, you get:
ValueError: 2 columns passed, passed data had 6 columns
Maybe take into account the maximum number of values before parsing them?
biom-format v2.1.10
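The suggestion of using the maximum number of values could look roughly like the sketch below. It is not how biom-format parses taxonomy: the function name `pad_lineages` and the `"unidentified"` placeholder are assumptions made for illustration only.

```python
def pad_lineages(lineages, placeholder="unidentified"):
    """Pad every lineage to the depth of the longest one.

    `lineages` maps an observation ID to a list of taxon names.
    The placeholder string is a hypothetical choice, not a biom-format default.
    """
    depth = max((len(names) for names in lineages.values()), default=0)
    return {
        otu: list(names) + [placeholder] * (depth - len(names))
        for otu, names in lineages.items()
    }


# Hypothetical ragged input: the second lineage is shorter than the first.
ragged = {
    "OTU1": ["k__Bacteria", "p__Firmicutes", "c__Bacilli"],
    "OTU2": ["k__Bacteria"],
}
balanced = pad_lineages(ragged)
```

Scanning all rows for the maximum depth before building the table means the order of the rows no longer decides how many columns are expected.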