biocore / biom-format

The Biological Observation Matrix (BIOM) Format Project
http://biom-format.org

problem with to_hdf5() #689

Closed: amnona closed this issue 8 years ago

amnona commented 8 years ago

When I try to save a BIOM table that has observation metadata to HDF5, I get the following error:

/Users/amnon/Python/git/heatsequer/heatsequer/experiment/io.py in savetobiom(expdat, filename, format)
    581         if format=='hdf5':
    582                 with biom.util.biom_open(filename, 'w') as f:
--> 583                         tab.to_hdf5(f, "heatsequer")
    584         elif format=='json':
    585                 with open(filename,'w') as f:

/Users/amnon/anaconda/lib/python2.7/site-packages/biom/table.pyc in to_hdf5(self, h5grp, generated_by, compress, format_fs)
   3533                   self.ids(axis='observation'),
   3534                   self.metadata(axis='observation'),
-> 3535                   self.group_metadata(axis='observation'), 'csr', compression)
   3536         axis_dump(h5grp.create_group('sample'), self.ids(),
   3537                   self.metadata(), self.group_metadata(), 'csc', compression)

/Users/amnon/anaconda/lib/python2.7/site-packages/biom/table.pyc in axis_dump(grp, ids, md, group_md, order, compression)
   3505                     # Create the dataset for the current category,
   3506                     # putting values in id order
-> 3507                     formatter[category](grp, category, md, compression)
   3508
   3509             # Create the group for the group metadata

/Users/amnon/anaconda/lib/python2.7/site-packages/biom/table.pyc in vlen_list_of_str_formatter(grp, header, md, compression)
    272             continue
    273         value = np.asarray(m[header])
--> 274         data[i, :len(value)] = value
    275     # Change the None entries on data to empty strings ""
    276     data = np.where(data == np.array(None), "", data)

TypeError: len() of unsized object

The problem does not occur when the table has no metadata. The metadata is added using table.add_metadata(taxdict, axis='observation'), where taxdict has the form taxdict[OBSID] = {'taxonomy': 'unknown'}.

Note that to_json() works fine with this table, but I get the same error later if I try to convert the resulting JSON file to HDF5. The JSON file is attached as test.txt.

wasade (Daniel McDonald) commented 8 years ago (Fri, Jan 29, 2016)

The expectation is that taxonomy is a list of str. The JSON format is much more flexible on metadata, and this is a known issue. Work on refactoring the HDF5 formatters and parsers is deferred until the Table migrates to skbio.
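The failing frame in the traceback runs value = np.asarray(m[header]) and then len(value). A small NumPy-only sketch (no biom needed; the string 'unknown' is just the example value from the report) suggests why a bare string trips it: np.asarray() turns a str into a 0-d array, which has no length, whereas a one-element list gives a 1-d array of length 1.

```python
import numpy as np

# taxdict[OBSID] = {'taxonomy': 'unknown'}: the value is a plain str.
scalar = np.asarray('unknown')  # 0-d array, shape ()
error_message = None
try:
    # Same call that fails in vlen_list_of_str_formatter
    len(scalar)
except TypeError as err:
    error_message = str(err)

# The HDF5 writer expects a list of str, e.g. ['unknown'].
as_list = np.asarray(['unknown'])  # 1-d array, shape (1,)

print(scalar.shape, error_message)  # () len() of unsized object
print(as_list.shape, len(as_list))  # (1,) 1
```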

    from biom import load_table
    import h5py

    # Load the JSON table attached to the issue
    t = load_table('test.txt')
    # Wrap each taxonomy string in a list, since HDF5 expects a list of str
    md = {i: {'taxonomy': [d['taxonomy']]}
          for i, d in zip(t.ids(axis='observation'), t.metadata(axis='observation'))}
    t.add_metadata(md, axis='observation')
    # Writing to HDF5 now succeeds
    with h5py.File('baz.txt', 'w') as f:
        t.to_hdf5(f, 'asd')
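
The session's dict comprehension can be generalized into a small helper. The sketch below is hypothetical (taxonomy_as_list is not part of biom): it takes parallel ids and metadata sequences, such as those returned by t.ids(axis='observation') and t.metadata(axis='observation'), and returns a dict in the shape the session passes to add_metadata(). Bare strings are wrapped in a one-element list, existing sequences are copied into plain lists, and entries without the key are skipped. It is plain Python so it can be checked without a BIOM file; the IDs and taxonomy values in the usage lines are made up.

```python
def taxonomy_as_list(ids, metadata, key='taxonomy'):
    """Hypothetical helper: build an add_metadata()-style dict whose
    `key` values are lists of str.

    ids and metadata are parallel sequences; metadata may be None
    (a table without metadata), in which case nothing is returned.
    """
    fixed = {}
    if metadata is None:
        return fixed
    for obs_id, md in zip(ids, metadata):
        if md is None or key not in md:
            continue  # nothing to fix for this observation
        value = md[key]
        if isinstance(value, str):
            # A bare string such as 'unknown' becomes ['unknown']
            value = [value]
        else:
            # Already a sequence: copy it into a plain list
            value = list(value)
        fixed[obs_id] = {key: value}
    return fixed


# Made-up example input
ids = ['OTU_1', 'OTU_2']
metadata = [{'taxonomy': 'unknown'},
            {'taxonomy': ['k__Bacteria', 'p__Firmicutes']}]
md = taxonomy_as_list(ids, metadata)
print(md)
# {'OTU_1': {'taxonomy': ['unknown']}, 'OTU_2': {'taxonomy': ['k__Bacteria', 'p__Firmicutes']}}
```

With a loaded table t, one would then call t.add_metadata(taxonomy_as_list(t.ids(axis='observation'), t.metadata(axis='observation')), axis='observation') before to_hdf5(). The sketch deliberately does not split a string such as 'k__Bacteria; p__Firmicutes' on ';'; whether to break it into ranks is left to the caller.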
amnona commented 8 years ago (Fri, Jan 29, 2016)

Cool. That explains it :) Maybe it's worth updating the add_metadata() docstring to state that?

Thanks! Amnon

wasade commented 8 years ago

Sure, are you able to issue a PR?
