chanzuckerberg / cellxgene-census

CZ CELLxGENE Discover Census
https://chanzuckerberg.github.io/cellxgene-census/
MIT License

Missing column in docs for cellxgene_census schema #1198

Closed: emdann closed this issue 1 week ago

emdann commented 2 weeks ago

Hi, I was trying to download cell-level metadata using cellxgene_census.get_obs. I usually refer to this for column names.

I tried filtering by tissue_type and got a missing column error.

cell_metadata = census["census_data"]["homo_sapiens"].obs.read(
    value_filter = "tissue_type == 'tissue'",
    column_names = ["assay", "tissue", "tissue_general", "suspension_type", "disease", 'dataset_id', 'is_primary_data', 
                    'donor_id', 'development_stage_ontology_term_id',
                   'cell_type_ontology_term_id','cell_type', 'raw_sum', 'nnz']
)
---------------------------------------------------------------------------
SOMAError                                 Traceback (most recent call last)
Cell In [15], line 1
----> 1 cell_metadata = census["census_data"]["homo_sapiens"].obs.read(
      2     value_filter = "tissue_type == 'tissue'",
      3     column_names = ["assay", "tissue", "tissue_general", "suspension_type", "disease", 'dataset_id', 'is_primary_data', 
      4                     'donor_id', 'development_stage_ontology_term_id',
      5                    'cell_type_ontology_term_id','cell_type', 'raw_sum', 'nnz']
      6 )

File ~/my-conda-envs/patho-signatures-2/lib/python3.9/site-packages/tiledbsoma/_dataframe.py:409, in DataFrame.read(***failed resolving arguments***)
    399 sr = clib.SOMADataFrame.open(
    400     uri=handle.uri,
    401     mode=clib.OpenMode.read,
   (...)
    405     timestamp=handle.timestamp and (0, handle.timestamp),
    406 )
    408 if value_filter is not None:
--> 409     sr.set_condition(QueryCondition(value_filter), handle.schema)
    411 self._set_reader_coords(sr, coords)
    413 # # TODO: batch_size

SOMAError: SOMAError: 'Column tissue_type does not exist in schema'

I just upgraded to v1.14.1.

Am I using the wrong docs/schema?

pablo-gar commented 2 weeks ago

@emdann thanks for reaching out.

tissue_type was added to Census relatively recently, after our latest LTS data release. If you open the Census with the defaults, you are opening the LTS version, which does not have tissue_type.

If you open the latest non-LTS version of Census, you will be able to see the field. However, only cells with a tissue_type value of "tissue" are included in Census, so at the moment the field is not very informative.

See this reproducible example:

import cellxgene_census

# Request the latest non-LTS build explicitly; the default opens the LTS release.
census = cellxgene_census.open_soma(census_version="latest")
obs = cellxgene_census.get_obs(census, "homo_sapiens", column_names=["tissue_type"])
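
Because the available obs columns depend on which Census version is opened, it may help to check the schema before reading. The following is only a minimal sketch, not part of the cellxgene_census API: the helper names `split_by_schema` and `read_obs_checked` are hypothetical, and `census` is assumed to be the object returned by `cellxgene_census.open_soma()`.

```python
from typing import Iterable, List, Tuple


def split_by_schema(
    requested: Iterable[str], available: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Partition requested column names into (present, missing), keeping order."""
    available_set = set(available)
    present: List[str] = []
    missing: List[str] = []
    for name in requested:
        (present if name in available_set else missing).append(name)
    return present, missing


def read_obs_checked(census, organism: str, column_names: List[str]):
    """Read obs for `organism`, skipping columns absent from the opened Census.

    Hypothetical helper: `census` is assumed to come from
    cellxgene_census.open_soma().
    """
    obs = census["census_data"][organism].obs
    # obs.schema is a pyarrow Schema; .names lists the column names it contains.
    present, missing = split_by_schema(column_names, obs.schema.names)
    if missing:
        print(f"Skipping columns not in this Census version: {missing}")
    # read() returns an iterator of Arrow tables; concat() merges them into one.
    return obs.read(column_names=present).concat().to_pandas()
```

With the default (LTS) Census from this thread, `read_obs_checked(census, "homo_sapiens", ["assay", "tissue_type"])` would be expected to report `tissue_type` as skipped rather than raise a SOMAError. A `value_filter` that references a missing column would still fail, so the same check would need to be applied to any column used in the filter.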