biocore / biom-format

The Biological Observation Matrix (BIOM) Format Project
http://biom-format.org

Unable to use biom without h5py installed #896

peterjc closed this issue 1 year ago

peterjc commented 1 year ago

Current behaviour, having used conda/pip to install biom 2.1.14 and h5py 3.8.0 on macOS and then deliberately removed h5py:

$ python -c "from biom.util import HAVE_H5PY; print(HAVE_H5PY)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/pc40583/opt/miniconda3/lib/python3.9/site-packages/biom/__init__.py", line 52, in <module>
    from .parse import parse_biom_table as parse_table, load_table
  File "/Users/pc40583/opt/miniconda3/lib/python3.9/site-packages/biom/parse.py", line 14, in <module>
    import h5py
ModuleNotFoundError: No module named 'h5py'

Expected behaviour:

Given the graceful failure code here https://github.com/biocore/biom-format/blob/2.1.14/biom/util.py#L24, I expected this to print False.

That is, biom should still work, but only be able to use JSON BIOM v1 files.
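
For illustration, the kind of graceful-failure guard referred to above can be sketched as follows. This is a minimal, generic sketch and not the actual code in biom/util.py; the helper name `optional_import` is hypothetical.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed.

    Hypothetical helper for illustration; not part of biom.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Guarded import: the flag records whether the optional dependency is usable.
h5py = optional_import("h5py")
HAVE_H5PY = h5py is not None

print(HAVE_H5PY)
```

A guard like this only helps if no other module in the package imports the dependency unconditionally. In the traceback above, biom/parse.py runs a plain `import h5py` while biom/__init__.py is still being loaded, so the import fails before the flag in biom.util can be read.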

wasade commented 1 year ago

We made h5py a required dependency in #824, but it looks like there are lingering historical checks that we need to address. So I agree this is unexpected; however, I think the correct course of action is to remove that check in util.py and make an explicit note in the changelog.

I would greatly prefer to retain h5py as a required rather than optional dependency, as the vast majority of BIOM use is through format version 2.1. Is there a specific use case where h5py is not viable?

peterjc commented 1 year ago

I thought it would be easy to export HDF5 by default, and fall back on JSON if the dependencies were missing. Without knowing much about the dependency stack, that seemed like a plausible situation - especially on less mainstream platforms.
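
A rough sketch of that fallback idea, assuming the only decision is which output format to pick. The function names `module_available` and `pick_output_format` are hypothetical and not part of biom's API.

```python
import importlib.util


def module_available(name):
    """Report whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_output_format(is_available=module_available):
    """Prefer HDF5 when h5py can be found; otherwise fall back to JSON.

    The is_available parameter exists so the check can be stubbed out.
    """
    return "hdf5" if is_available("h5py") else "json"


print(pick_output_format())
```

Passing the availability check in as a parameter keeps the decision easy to exercise on a machine where h5py happens to be installed.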

I agree if HDF5 is now considered a hard requirement, then that code in util.py could be simplified.

wasade commented 1 year ago

That certainly makes sense :)

Okay, so I'll line up addressing that code in util.py and anything depending on the associated variables.

To be fair, the IO mechanisms for BIOM are not streamlined in my opinion, but I haven't yet found time to address that. I just realized this wasn't codified as an issue, so I opened #897.