chanzuckerberg / cellxgene-census

CZ CELLxGENE Discover Census
https://chanzuckerberg.github.io/cellxgene-census/
MIT License

embeddings functions do not support same values for `organisms` as main API #1205

Open ivirshup opened 1 week ago

ivirshup commented 1 week ago

Describe the bug

While functions like `get_anndata` accept values for the `organism` argument such as "Homo sapiens" or "homo_sapiens", functions in the embedding API only accept snake_case values like "homo_sapiens".

To Reproduce

```python
import cellxgene_census, cellxgene_census.experimental

CENSUS_VERSION = "2023-12-15"
census = cellxgene_census.open_soma(census_version=CENSUS_VERSION)
```

Both of these work:

```python
cellxgene_census.get_obs(census, "homo_sapiens", coords=[1, 2, 3])
cellxgene_census.get_obs(census, "Homo sapiens", coords=[1, 2, 3])
```

For the embeddings, this works:

```python
cellxgene_census.experimental.get_embedding_metadata_by_name(
    census_version=CENSUS_VERSION,
    organism="homo_sapiens",
    embedding_name="scvi",
)
```

But this doesn't:

```python
cellxgene_census.experimental.get_embedding_metadata_by_name(
    census_version=CENSUS_VERSION,
    organism="Homo sapiens",
    embedding_name="scvi",
)
```
```
File ~/github/cellxgene-census/api/python/cellxgene_census/src/cellxgene_census/experimental/_embedding.py:207, in get_embedding_metadata_by_name(embedding_name, organism, census_version, embedding_type)
    204         embeddings.append(obj)
    206 if len(embeddings) == 0:
--> 207     raise ValueError(f"No embeddings found for {embedding_name}, {organism}, {resolved_census_version}, {embedding_type}")
    209 return sorted(embeddings, key=lambda x: x["submission_date"])[-1]

ValueError: No embeddings found for scvi, Homo sapiens, 2023-12-15, obs_embedding
```

Expected behavior

The accepted values and behavior of the `organism` argument should be consistent across functions.
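One possible way to get that consistency is to normalize the organism name before it is used for lookup. The sketch below is only an illustration: `normalize_organism` is a hypothetical helper name, not something in `cellxgene_census`, and it assumes that replacing whitespace with underscores and lowercasing is the mapping the main API effectively applies.

```python
import re


def normalize_organism(organism: str) -> str:
    """Hypothetical helper: map "Homo sapiens" or "homo_sapiens" to "homo_sapiens".

    Assumption: the embedding functions want lowercase snake_case names.
    """
    # Trim the ends, turn each run of whitespace into one underscore, lowercase.
    return re.sub(r"\s+", "_", organism.strip()).lower()


if __name__ == "__main__":
    for name in ("Homo sapiens", "homo_sapiens", "Mus musculus"):
        print(f"{name!r} -> {normalize_organism(name)!r}")
```

Until the behavior is unified, passing `organism=normalize_organism("Homo sapiens")` to `get_embedding_metadata_by_name` may serve as a workaround, since the snake_case form is the one shown to work above.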

Environment

Provide a description of your system and the software versions.

```
-----
IPython             8.24.0
anndata             0.10.7
cellxgene_census    1.14.2.dev4+gcfca649.d20240612
pandas              2.2.2
session_info        1.0.0
tiledbsoma          1.12.0
-----
aiobotocore         2.13.0
aiohttp             3.9.5
aioitertools        0.11.0
aiosignal           1.3.1
asttokens           NA
attr                23.2.0
attrs               23.2.0
botocore            1.34.106
certifi             2024.02.02
cffi                1.16.0
charset_normalizer  3.3.2
cloudpickle         2.2.1
cython_runtime      NA
dask                2024.5.2
dateutil            2.9.0.post0
decorator           5.1.1
dill                0.3.8
executing           2.0.1
frozenlist          1.4.1
fsspec              2024.3.1
h5py                3.11.0
idna                3.7
importlib_metadata  NA
jedi                0.19.1
jinja2              3.1.4
jmespath            1.0.1
llvmlite            0.42.0
markupsafe          2.1.5
multidict           6.0.5
natsort             8.4.0
numba               0.59.1
numpy               1.26.4
packaging           24.0
parso               0.8.4
prompt_toolkit      3.0.45
psutil              5.9.8
pure_eval           0.2.2
pyarrow             15.0.2
pyarrow_hotfix      NA
pycparser           2.22
pygments            2.18.0
pytz                2024.1
requests            2.32.2
s3fs                2024.3.1
scipy               1.13.1
six                 1.16.0
somacore            1.0.11
sparse              0.15.4
stack_data          0.6.3
tblib               1.7.0
tiledb              0.30.0
tlz                 0.12.1
toolz               0.12.1
torch               2.2.2+cu121
torchgen            NA
tqdm                4.66.4
traitlets           5.14.3
typing_extensions   NA
urllib3             2.2.1
wcwidth             0.2.13
wrapt               1.16.0
xxhash              NA
yaml                6.0.1
yarl                1.9.4
zipp                NA
-----
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
Linux-6.8.0-1009-aws-x86_64-with-glibc2.39
-----
Session information updated at 2024-06-24 23:50
```

Additional context

Some previous discussion: