chanzuckerberg / cellxgene-census

CZ CELLxGENE Discover Census
https://chanzuckerberg.github.io/cellxgene-census/
MIT License

Add include_cell_type_descendants convenience parameter to get_anndata() #1029

Closed hthomas-czi closed 5 months ago

hthomas-czi commented 8 months ago

Add include_cell_type_descendants parameter to get_anndata().

Proposal

import cellxgene_census

# Proposed call; include_cell_type_descendants is the new parameter.
adata = cellxgene_census.get_anndata(
    census=census,
    organism="Homo sapiens",
    feature_ids=["ENSG00000161798", "ENSG00000188229"],
    cell_type=["microglial cell", "neuron"],
    include_cell_type_descendants=True,  # also return descendants of the listed cell types
    column_names={"obs": ["assay", "cell_type", "tissue", "tissue_general", "suspension_type", "disease"]},
)

This would use the get_term_descendants function in cellxgene-ontology-guide without requiring the user to use an additional API.
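
Until such a parameter exists, the expansion can be approximated by hand. The following is a minimal sketch, not the Census implementation: `TOY_CHILDREN`, `get_descendants`, and `build_cell_type_filter` are hypothetical names, and the toy hierarchy uses placeholder IDs rather than real Cell Ontology terms. It assumes the resulting string would be passed as the `obs_value_filter` argument of get_anndata().

```python
from collections import deque

# Hypothetical toy hierarchy (parent -> direct children). The IDs are
# placeholders for illustration, not real Cell Ontology terms.
TOY_CHILDREN = {
    "TOY:neuron": ["TOY:motor_neuron", "TOY:sensory_neuron"],
    "TOY:sensory_neuron": ["TOY:photoreceptor"],
}


def get_descendants(term_id, children=TOY_CHILDREN):
    """Return all descendants of term_id (excluding itself), breadth-first."""
    found = []
    queue = deque(children.get(term_id, []))
    while queue:
        term = queue.popleft()
        if term not in found:
            found.append(term)
            queue.extend(children.get(term, []))
    return found


def build_cell_type_filter(term_ids, include_descendants=False):
    """Build a value-filter string over cell_type_ontology_term_id."""
    expanded = []
    for term_id in term_ids:
        candidates = [term_id]
        if include_descendants:
            candidates += get_descendants(term_id)
        for candidate in candidates:
            # Preserve order while dropping duplicates.
            if candidate not in expanded:
                expanded.append(candidate)
    return f"cell_type_ontology_term_id in {expanded!r}"


print(build_cell_type_filter(["TOY:neuron"], include_descendants=True))
```

In practice, `get_descendants` would presumably be swapped for the get_term_descendants function in cellxgene-ontology-guide, which is what the proposed parameter would call internally.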

User Quote

I’d like a toolkit to easily navigate descendants and ascendants of cell types and tissues

pablo-gar commented 8 months ago

@hthomas-czi and @MaximilianLombardo we probably need a bit more detail about the type of queries they'd be interested in doing.

Adding @brianraymor for visibility

brianraymor commented 8 months ago

"@hthomas-czi and @MaximilianLombardo we probably need a bit more detail about the type of queries they'd be interested in doing."

Yes please. Unfortunately, the title reads like a request for a general-purpose ontology service (OLS4) or toolkit (OAK), both of which already exist outside of CELLxGENE Discover.

We're in discussion about our custom curated lists as part of the ontology service.

hthomas-czi commented 7 months ago

@pablo-gar

@MaximilianLombardo should add more context, but here is my interpretation. It's not intended to be a full-featured ontology service API, because those already exist, as Brian mentioned. The idea is to provide minimal helper functions that deliver much of the value of a third-party ontology service API without requiring users to learn a totally different API.

For example, adding an optional Boolean argument to get_anndata that would return the descendants of the specified cell type (e.g., T cell) in addition to that cell type itself. This particular example makes more sense in the context of the issue "Make convenience methods like get_anndata() have expressive arguments".

hthomas-czi commented 7 months ago

@MaximilianLombardo to follow up with the requester

MaximilianLombardo commented 7 months ago

Followed up with the user; here is the specific example they gave to illustrate the use case:

The main issue I had to deal with was inconsistencies between the level of granularity used for annotations in different studies. For example, one lung study might have annotated “fibroblasts” and another “fibroblasts of the lung”. If I am analysing these cells together I would ideally want both groups to be considered the same type. The solution is to do some sort of roll-up of fine annotations to a common ancestor. I am not aware of any implementation of this in the Census API. I have done this with a custom implementation (something looking for matches to the 3 closest ancestors in the ontology tree amongst other cell type labels), but it’s not perfect and I think it would be very useful to have a robust function to merge close terms.
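
The roll-up described above can be sketched as follows. This is a hedged illustration rather than anything in the Census API: `TOY_PARENTS` and `roll_up` are hypothetical names, the hierarchy is a small hand-written toy, and the `max_depth=3` default only mirrors the "3 closest ancestors" heuristic from the quote. Each label is mapped to the nearest term in a chosen set of coarse target terms, or left unchanged if none is reachable.

```python
from collections import deque

# Toy hierarchy (child -> direct parents), hand-written for illustration only.
TOY_PARENTS = {
    "fibroblast of lung": ["fibroblast"],
    "fibroblast": ["connective tissue cell"],
    "T cell": ["lymphocyte"],
}


def roll_up(label, targets, parents=TOY_PARENTS, max_depth=3):
    """Map label to the nearest term in targets (itself or an ancestor).

    Searches breadth-first up to max_depth levels above label and returns
    label unchanged when no target term is found.
    """
    queue = deque([(label, 0)])
    seen = {label}
    while queue:
        term, depth = queue.popleft()
        if term in targets:
            return term
        if depth < max_depth:
            for parent in parents.get(term, []):
                if parent not in seen:
                    seen.add(parent)
                    queue.append((parent, depth + 1))
    return label


coarse_terms = {"fibroblast", "lymphocyte"}
labels = ["fibroblast", "fibroblast of lung", "T cell", "neuron"]
print([roll_up(label, coarse_terms) for label in labels])
```

With this toy data, "fibroblast" and "fibroblast of lung" both resolve to "fibroblast", so cells from the two studies would fall into the same group. A real version would need the parent relationships from an ontology source and a curated set of target terms.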

hthomas-czi commented 7 months ago

Thanks, @MaximilianLombardo!

@pablo-gar This is exactly what I had in mind. I believe we could meet this user's need by:

  1. Surfacing our cell type and tissue roll-ups via the ontology service API, per Brian's comment "We're in discussion about our custom curated lists as part of the ontology service."
  2. Providing helper functions within the Census API that use those ontology service APIs under the hood

pablo-gar commented 5 months ago

Let's start with some notebooks that showcase how to use the ontology service API alongside the Census API to address this use case.