biocore / qiime

Official QIIME 1 software repository. QIIME 2 (https://qiime2.org) has succeeded QIIME 1 as of January 2018.
GNU General Public License v2.0

filter_$(otus/samples)_from_otu_table.py do not work; they return the error: Object dtype dtype('O') has no native HDF5 equivalent #2205

Closed apascualgarcia closed 6 years ago

apascualgarcia commented 6 years ago

I'm trying to use filter_otus_from_otu_table.py and filter_samples_from_otu_table.py with no success. The three files needed to reproduce the issues are here: test2git.zip.

If I filter with a file containing just one observation (listed in prueba.txt), it works:

$ filter_otus_from_otu_table.py -i otu.2test.metagenomes.biom -o otu.metagenomes.prueba.biom -e prueba.txt --negate_ids_to_exclude

But if I want to get two observations (file prueba2.txt):

$ filter_otus_from_otu_table.py -i otu.2test.metagenomes.biom -o otu.metagenomes.prueba.biom -e prueba2.txt --negate_ids_to_exclude

It doesn't work, and it returns: TypeError: Object dtype dtype('O') has no native HDF5 equivalent

The same happens if I reuse the list with one observation (first example) but leave out the option --negate_ids_to_exclude. So the problem appears when multiple observations/samples should be filtered, but not with a single one. The error is also reproduced if I use biom directly:

$ biom subset-table -i otu.2test.metagenomes.biom -a observation -s prueba2.txt -o otu.2test.metagenomes.prueba.biom

This issue in biom-format (#513) suggests that it may be a problem with the metadata. If I try to convert to JSON:

$ biom convert -i otu.2test.metagenomes.biom -o otu.2test.metagenomes.json.biom --table-type="OTU table" --to-json

I get this error: TypeError: array([u'["cathepsin L [EC:3.4.22.15]"]'], dtype=object) is not JSON serializable. And if I try to convert it to HDF5 with the suggested option --collapsed-samples:

$ biom convert -i otu.2test.metagenomes.biom -o otu.2test.metagenomes.hdf5.biom --table-type="OTU table" --to-hdf5 --collapsed-samples

I get TypeError: Object dtype dtype('O') has no native HDF5 equivalent. Please note that I checked that the fixes for this bug (#759) are included in my code. If it helps, I found a similar issue in the CellProfiler project (#995).
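
The JSON error can be reproduced without biom. The following is a minimal sketch, assuming only that a metadata value is stored as a one-element NumPy object array holding a JSON-encoded string, as the error message suggests. The names value and try_dump are illustrative and not part of biom-format.

```python
import json

import numpy as np

# Assumed shape of one metadata value, copied from the error message above.
value = np.array(['["cathepsin L [EC:3.4.22.15]"]'], dtype=object)


def try_dump(obj):
    """Return the JSON text for obj, or None if json cannot serialize it."""
    try:
        return json.dumps(obj)
    except TypeError:
        return None


# The standard json module has no encoder for a NumPy array, so this fails.
raw_result = try_dump(value)

# Decoding the embedded JSON string first yields a plain Python list,
# which serializes without trouble.
decoded = json.loads(value[0])
fixed_result = try_dump(decoded)
```

If this assumption about the stored values holds, the metadata need to be unwrapped and decoded before the table can be written again.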

jairideout commented 6 years ago

Can you please post your question on the QIIME 1 forum? That's where we provide user support for QIIME 1.

apascualgarcia commented 6 years ago

Sure, although it may be an issue with biom-format (not specific to QIIME 1?)

jairideout commented 6 years ago

The QIIME 1 Forum is likely your best bet because you have a mixture of QIIME 1 and biom-format commands, but you could instead try the biom-format issue tracker. Please don't post in both locations, as many of the same developers monitor both. Either way, we don't provide user support for QIIME 1 or biom-format on this issue tracker. Thanks!

apascualgarcia commented 6 years ago

Sorry for still answering here, but I think it is useful to post the following because it clarifies the problem, in case someone finds it here.

I've been able to perform the filtering by piecing together code that PICRUSt uses to deal with these matrices. It confirms that the problem comes from the metadata:

import json

from biom import load_table
from picrust.util import write_biom_table, picrust_formatter

table = load_table('otu.2test.metagenomes.biom')

# Adapted from PICRUSt's categorize_by_function.py:
# the metadata are not deserializing correctly. Duct tape it by decoding
# the JSON string wrapped inside each metadata value.
update_d = {}
for i, md in zip(table.ids(axis='observation'),
                 table.metadata(axis='observation')):
    update_d[i] = {k: json.loads(v[0]) for k, v in md.items()}
# Apply the repaired metadata once, after the loop has finished.
table.add_metadata(update_d, axis='observation')

# Read the observation IDs to keep, one per line, skipping blank lines.
with open("prueba2.txt", "r") as target:
    genes = [row.strip() for row in target if row.strip()]
table_red = table.filter(genes, axis='observation', inplace=False)

# Output in BIOM format, adapted from predict_metagenomes.py.
format_fs = {
    'KEGG_Description': picrust_formatter,
    'COG_Description': picrust_formatter,
    'KEGG_Pathways': picrust_formatter,
    'COG_Category': picrust_formatter,
}
write_biom_table(table_red, 'table.test.biom', format_fs=format_fs)  # HDF5
# write_biom_table(table_red, 'table.test.biom', write_hdf5=False, format_fs=format_fs)  # JSON
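
To see what the "duct tape" step does in isolation, here is a sketch on plain Python data. It assumes each metadata value is a one-element sequence wrapping a JSON string. The function decode_metadata, the observation IDs, and the sample values are hypothetical, and no biom or PICRUSt call is involved.

```python
import json


def decode_metadata(md):
    """Replace each one-element wrapper with the JSON value it contains."""
    return {key: json.loads(value[0]) for key, value in md.items()}


# Hypothetical observation metadata, keyed by made-up observation IDs.
raw = {
    "obs_1": {
        "KEGG_Description": ['["cathepsin L [EC:3.4.22.15]"]'],
        "KEGG_Pathways": ['[["Metabolism", "Enzyme Families", "Peptidases"]]'],
    },
    "obs_2": {
        "KEGG_Description": ['["None"]'],
        "KEGG_Pathways": ['[["Unclassified"]]'],
    },
}

# Build the per-observation update dictionary, as the loop above does.
clean = {obs_id: decode_metadata(md) for obs_id, md in raw.items()}
```

After this step every value is an ordinary list (or list of lists) of strings, which is the kind of structure a formatter can serialize.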