biocore / biom-format

The Biological Observation Matrix (BIOM) Format Project
http://biom-format.org

uninformative error when sorting a filtered table that is empty #620

Closed jairideout closed 9 years ago

jairideout commented 9 years ago

When sorting a table that has been filtered so that it is empty (0x0), an uninformative error message is raised. This only seems to happen if the table is filtered to be empty; creating an empty Table outright and sorting it works.

I know this use-case sounds silly (why would you want to sort an empty table?), but this error is raised in QIIME's observation_metadata_correlation.py script if the user provides a metadata category that doesn't have any numeric values, resulting in a filtered table that is empty. A collaborator ran into this error and it wasn't obvious what went wrong and why.

Example:

In [1]: import numpy as np

In [2]: from biom import Table

In [3]: t = Table(np.asarray([[1, 2, 3], [4, 5, 6]]), ['a', 'b'], ['c', 'd', 'e'])

In [4]: t.filter(ids_to_keep=[], axis='sample')
Out[4]: 0 x 0 <class 'biom.table.Table'> with 0 nonzero entries (0% dense)

In [5]: t.sort(sort_f = lambda _: [], axis='sample')
---------------------------------------------------------------------------
TableException                            Traceback (most recent call last)
<ipython-input-5-ebb9d7c83d49> in <module>()
----> 1 t.sort(sort_f = lambda _: [], axis='sample')

/Users/jairideout/.virtualenvs/qiime/lib/python2.7/site-packages/biom/table.pyc in sort(self, sort_f, axis)
   1753         O1  0.0 1.0 3.0
   1754         """
-> 1755         return self.sort_order(sort_f(self.ids(axis=axis)), axis=axis)
   1756
   1757     def filter(self, ids_to_keep, axis='sample', invert=False, inplace=True):

/Users/jairideout/.virtualenvs/qiime/lib/python2.7/site-packages/biom/table.pyc in sort_order(self, order, axis)
   1672                                   self.ids(axis='observation')[:], order[:],
   1673                                   self.metadata(axis='observation'), md,
-> 1674                                   self.table_id, self.type)
   1675         elif axis == 'observation':
   1676             for id_ in order:

/Users/jairideout/.virtualenvs/qiime/lib/python2.7/site-packages/biom/table.pyc in __init__(self, data, observation_ids, sample_ids, observation_metadata, sample_metadata, table_id, type, create_date, generated_by, observation_group_metadata, sample_group_metadata, **kwargs)
    258         self._observation_group_metadata = observation_group_metadata
    259
--> 260         errcheck(self)
    261
    262         # These will be set by _index_ids()

/Users/jairideout/.virtualenvs/qiime/lib/python2.7/site-packages/biom/err.pyc in errcheck(table, *errtypes)
    471     ret = __errprof.test(table, *errtypes)
    472     if isinstance(ret, Exception):
--> 473         raise ret
    474     else:
    475         return ret

TableException: Number of observation IDs differs from matrix size!

wasade commented 9 years ago

That is weird and suggests some state within the filtered table is not in sync
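
Independent of the root cause, calling code can avoid the opaque error by checking for an empty table before sorting. The following is only a minimal sketch: `sort_nonempty` is a hypothetical helper name, and it assumes nothing about the table beyond the `is_empty()` and `sort(sort_f, axis=...)` methods mentioned in this thread.

```python
def sort_nonempty(table, sort_f, axis="sample"):
    """Sort `table` along `axis`, failing early with a clear message if empty.

    Hypothetical helper: `table` is assumed to expose `is_empty()` and
    `sort(sort_f, axis=...)`, the two methods referenced in this thread.
    """
    if table.is_empty():
        # Report the likely cause instead of letting the internal
        # consistency check fail with an unrelated-looking message.
        raise ValueError(
            "Cannot sort an empty table; check that the filter kept at "
            "least one %s ID." % axis
        )
    return table.sort(sort_f, axis=axis)
```

A script that filters on user-supplied metadata could call this in place of `table.sort(...)`, so that a filter matching nothing is reported as such.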

Jorge-C commented 9 years ago

To solve it, we need to either say that the shape of the table is 0 x len(observation ids), or remove observation ids when a table is fully filtered out. Which would be preferable? Relevant code here: https://github.com/biocore/biom-format/blob/master/biom/_filter.pyx#L87-L88

wasade commented 9 years ago

I kind of like updating the empty table code to be smarter?

Jorge-C commented 9 years ago

What do you mean?

My question was whether a totally filtered out table should have shape (0, 0) (current behaviour) or (0, n) (would immediately solve this bug). I have a slight preference for the second option because it seems more consistent:

Filter all but one -> shape (1, n)
Filter all         -> shape (0, n)

No matter the option, table.is_empty() would keep returning True.

Other issues to consider?
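
The difference between the two options can be illustrated with a toy model in plain Python. This is not biom's implementation: `filter_rows` and `inferred_shape` are hypothetical names, and the matrix is a list of rows with the filtered axis first, as in the shapes above.

```python
def filter_rows(rows, n_cols, keep):
    """Keep only the rows whose index is in `keep`; return (rows, shape)."""
    kept = [row for i, row in enumerate(rows) if i in keep]
    # Carry the column count explicitly instead of inferring it from
    # kept[0], which does not exist once every row has been filtered out.
    return kept, (len(kept), n_cols)


def inferred_shape(rows):
    """Shape inferred from the data alone; collapses to (0, 0) when empty."""
    return (len(rows), len(rows[0]) if rows else 0)


data = [[1, 2, 3], [4, 5, 6]]
one_left, shape_one = filter_rows(data, 3, keep={0})
none_left, shape_none = filter_rows(data, 3, keep=set())

print(shape_one)                  # (1, 3)
print(shape_none)                 # (0, 3)
print(inferred_shape(none_left))  # (0, 0)
```

With the explicit column count, the IDs on the unfiltered axis still match the matrix size, which is the condition the failing check complains about. Both `(0, 3)` and `(0, 0)` contain zero entries, so an emptiness test gives the same answer either way.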

wasade commented 9 years ago

oh, i see. agree, second option is more consistent

jairideout commented 9 years ago

Thanks for the quick fix @Jorge-C and @wasade!