biocore / biom-format

The Biological Observation Matrix (BIOM) Format Project
http://biom-format.org
Other
89 stars 95 forks source link

Clarification about ID type in the docs for Table.update_ids #879

Closed fedarko closed 1 year ago

fedarko commented 2 years ago

I'm not sure if non-string IDs were ever officially supported for BIOM tables, but some of the Empress test code (these lines, in particular) involved setting IDs to integers. It looks like this commit broke this behavior, and by extension this particular Empress test -- it triggered the following error:

ERROR: test_match_inputs_no_tips_in_table (python.test_tools.TestTools)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/empress/empress/tests/python/test_tools.py", line 143, in test_match_inputs_no_tips_in_table
    bad_table.update_ids({i: idx for idx, i in
  File "/usr/share/miniconda/envs/empress/lib/python3.9/site-packages/biom/table.py", line 1405, in update_ids
    str_dtype = 'U%d' % max([len(v) for v in id_map.values()])
  File "/usr/share/miniconda/envs/empress/lib/python3.9/site-packages/biom/table.py", line 1405, in <listcomp>
    str_dtype = 'U%d' % max([len(v) for v in id_map.values()])
TypeError: object of type 'int' has no len()

This PR documents explicitly that both keys and values for the id_map parameter of Table.update_ids() should be strings. I haven't added any validation code that checks this (I'm not sure if the added runtime incurred by such a check would be desirable, and this seems like it'd be a pretty weird thing for users to try to do normally...?), but hopefully this clarifies things for future devs who encounter this problem on updating to newer versions of biom-format.

This PR also includes two small grammar fixes.

Thanks!

fedarko commented 2 years ago

It looks like the build is failing for this -- since this PR only contains docs fixes, these failures are likely due to dependencies shifting. The build seems to be failing when attempting to install anndata:

Collecting anndata
  Downloading anndata-0.7.8-py3-none-any.whl (91 kB)
Requirement already satisfied: numpy>=1.16.5 in /usr/share/miniconda/envs/env_name/lib/python3.6/site-packages (from anndata) (1.19.5)
Requirement already satisfied: packaging>=20 in /usr/share/miniconda/envs/env_name/lib/python3.6/site-packages (from anndata) (21.3)
Requirement already satisfied: h5py in /usr/share/miniconda/envs/env_name/lib/python3.6/site-packages (from anndata) (2.10.0)
Requirement already satisfied: scipy>1.4 in /usr/share/miniconda/envs/env_name/lib/python3.6/site-packages (from anndata) (1.5.3)
Requirement already satisfied: pandas>=1.1.1 in /usr/share/miniconda/envs/env_name/lib/python3.6/site-packages (from anndata) (1.1.5)
Collecting natsort
  Downloading natsort-8.1.0-py3-none-any.whl (37 kB)
Requirement already satisfied: importlib_metadata>=0.7; python_version < "3.8" in /usr/share/miniconda/envs/env_name/lib/python3.6/site-packages (from anndata) (3.10.1)
Collecting xlrd<2.0
  Downloading xlrd-1.2.0-py2.py3-none-any.whl (103 kB)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/share/miniconda/envs/env_name/lib/python3.6/site-packages (from packaging>=20->anndata) (3.0.8)
Requirement already satisfied: six in /usr/share/miniconda/envs/env_name/lib/python3.6/site-packages (from h5py->anndata) (1.16.0)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/share/miniconda/envs/env_name/lib/python3.6/site-packages (from pandas>=1.1.1->anndata) (2.8.2)
Requirement already satisfied: pytz>=2017.2 in /usr/share/miniconda/envs/env_name/lib/python3.6/site-packages (from pandas>=1.1.1->anndata) (2022.1)
Requirement already satisfied: typing-extensions>=3.6.4; python_version < "3.8" in /usr/share/miniconda/envs/env_name/lib/python3.6/site-packages (from importlib_metadata>=0.7; python_version < "3.8"->anndata) (4.2.0)
ERROR: Package 'typing-extensions' requires a different Python: 3.6.15 not in '>=3.7'
Error: Process completed with exit code 1.

... I'm not sure how to fix this! It seems like anndata (or at least its dependencies) might no longer be compatible with Python 3.6?

wasade commented 2 years ago

That is unusual, would it be possible to add "anndata" to the conda install in gh actions? It's present in the bioconda channel. I wonder if pip and conda are fighting

fedarko commented 2 years ago

@wasade awesome, installing anndata via conda instead of via pip fixed the build! For now, the CI installs anndata from conda-forge instead of bioconda -- anndata is available on both of these channels (conda-forge, bioconda), but it looks like only the conda-forge version is actively maintained.

I also went ahead and updated the README a bit, similar to the changes made for the iow PR.

peterjc commented 1 year ago

Confirming anndata was at 0.6.22.post1 when it was dropped from bioconda in https://github.com/bioconda/bioconda-recipes/pull/17783 (bulk change in Oct 2019) so using conda-forge makes sense. That package lives at https://github.com/conda-forge/anndata-feedstock/

wasade commented 1 year ago

Thanks for the bump on this. Is this PR good to go? I was out all summer on leave and missed the follow up here

fedarko commented 1 year ago

The PR should be good to go. Thanks!

wasade commented 1 year ago

Thanks!

fedarko commented 1 year ago

Just as a heads up, it looks like the Python 3.6 build began failing in recent months (although the Python 3.10 build is fine). As far as I can tell, this is due to anndata 0.8.0 bumping its minimum Python version to 3.7 (source)—biom doesn't set a pin on the anndata version used.

I guess the two obvious ways to handle this issue would be 1) dropping biom's support for Python 3.6 also, or 2) pinning its anndata dependency to be < 0.8.0.

wasade commented 1 year ago

I don't know the implications of pinning anndata. Would that be bad?

fedarko commented 1 year ago

It seems to be an explicitly optional dependency that's just used in the to_anndata() function, so I don't think pinning anndata would be a big deal. But I guess it could cause problems for users down the line if, for example, later versions of anndata lose backwards compatibility with the outputs of anndata < 0.8.0.

Looking through the code some more, a better fix might just be removing the anndata installation from the Python 3.6 build—the test suite should automatically detect if it isn't installed and skip the relevant tests. If we do this (while presumably updating the docs somewhere to mention that using anndata >= 0.8.0 implies a minimum Python version of 3.7), then we get the best of both worlds: there's no anndata pin required, and we can still support Python 3.6 (albeit without anndata >= 0.8.0).

Ultimately not my call, though :)

peterjc commented 1 year ago

Python 3.6 is EOL, so at some point you'll be dropping it anyway: https://peps.python.org/pep-0494/

wasade commented 1 year ago

We can account for that on the next release, but I do not have a timeline at the moment for that