chanzuckerberg / cellxgene-census

CZ CELLxGENE Discover Census
https://chanzuckerberg.github.io/cellxgene-census/
MIT License

Make convenience methods like get_anndata() have expressive arguments #1034

Open hthomas-czi opened 6 months ago

hthomas-czi commented 6 months ago

Should the API use types, particularly for method arguments?

Advantages include:

This design would arguably be more "Pythonic", in particular:

Current

adata = cellxgene_census.get_anndata(
    census = census,
    organism = "Homo sapiens",
    var_value_filter = "feature_id in ['ENSG00000161798', 'ENSG00000188229']",
    obs_value_filter = "sex == 'female' and cell_type in ['microglial cell', 'neuron']",
    column_names = {"obs": ["assay", "cell_type", "tissue", "tissue_general", "suspension_type", "disease"]},
)

Proposed

from enum import Enum

class Sex(Enum):
    FEMALE = "PATO_0000383"
    MALE = "PATO_0000384"
    UNKNOWN = "unknown"

adata = cellxgene_census.get_anndata(
    census = census,
    organism = "Homo sapiens",
    feature_ids = ['ENSG00000161798', 'ENSG00000188229'],
    sex = Sex.FEMALE,
    cell_type = ['microglial cell', 'neuron'],
    column_names = {"obs": ["assay", "cell_type", "tissue", "tissue_general", "suspension_type", "disease"]},
)

Note: the keyword arguments that replace obs_value_filter would be optional, so users would only need to specify the obs values they want to filter on.
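One way the proposed keyword arguments could sit on top of the existing string filter is sketched below. The helper name `build_obs_value_filter` is hypothetical and not part of cellxgene_census. The enum values here are the plain labels from the "Current" example rather than the ontology IDs in the proposal, which is an assumption made so that the output matches the existing filter string.

```python
from enum import Enum
from typing import Optional, Sequence


class Sex(Enum):
    # Assumption: plain labels, as used in the "Current" filter string.
    FEMALE = "female"
    MALE = "male"
    UNKNOWN = "unknown"


def build_obs_value_filter(
    sex: Optional[Sex] = None,
    cell_type: Optional[Sequence[str]] = None,
) -> Optional[str]:
    """Hypothetical helper: turn typed keyword arguments into a filter string."""
    clauses = []
    if sex is not None:
        clauses.append(f"sex == {sex.value!r}")
    if cell_type is not None:
        clauses.append(f"cell_type in {list(cell_type)!r}")
    # Every argument is optional; with none given there is no filter.
    return " and ".join(clauses) if clauses else None


obs_filter = build_obs_value_filter(
    sex=Sex.FEMALE,
    cell_type=["microglial cell", "neuron"],
)
print(obs_filter)
```

With this approach the existing obs_value_filter argument could stay available for queries the keyword arguments cannot express.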

Questions

ivirshup commented 4 months ago

For a recent project (docs), I added some logic for specifying filters so you can do things like:

ensdb.genes(
    filter=gf.filters.GeneBioTypeFilter("lncRNA")
    & gf.filters.GeneRangesFilter("1:10000-20000")
)

We've defined &, |, ~ (and, or, not) to do more complex combinations.
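As a rough illustration of the operator approach, here is a minimal sketch of filter objects that combine with &, | and ~ and render to a string expression in the style of obs_value_filter. All class names (`Filter`, `Equals`, `IsIn`) are hypothetical and are not taken from the project mentioned above or from cellxgene_census. Whether the underlying query engine accepts every rendered operator is also an assumption.

```python
from dataclasses import dataclass


class Filter:
    """Hypothetical base class that gives every filter &, | and ~."""

    def __and__(self, other: "Filter") -> "Filter":
        return _Combined("and", self, other)

    def __or__(self, other: "Filter") -> "Filter":
        return _Combined("or", self, other)

    def __invert__(self) -> "Filter":
        return _Not(self)

    def to_expr(self) -> str:
        raise NotImplementedError


@dataclass(frozen=True)
class Equals(Filter):
    column: str
    value: str

    def to_expr(self) -> str:
        return f"{self.column} == {self.value!r}"


@dataclass(frozen=True)
class IsIn(Filter):
    column: str
    values: tuple

    def to_expr(self) -> str:
        return f"{self.column} in {list(self.values)!r}"


@dataclass(frozen=True)
class _Combined(Filter):
    op: str
    left: Filter
    right: Filter

    def to_expr(self) -> str:
        # Parenthesize both sides so nesting never changes the meaning.
        return f"({self.left.to_expr()}) {self.op} ({self.right.to_expr()})"


@dataclass(frozen=True)
class _Not(Filter):
    inner: Filter

    def to_expr(self) -> str:
        return f"not ({self.inner.to_expr()})"


combined = Equals("sex", "female") & ~IsIn("cell_type", ("neuron",))
print(combined.to_expr())
```

Because the column name is an ordinary string argument, this style does not tie the API to a fixed set of keyword names.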

How much do you want to support custom data collections? For example, I might have my own local census with extra fields. Is that meant to be supported here? If so, it would be nice to be flexible about how columns are named, which could be difficult if the columns to filter on are specified by keyword arguments.
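To make the column-naming concern concrete, the sketch below takes filters as a mapping instead of fixed keyword arguments and checks the names against whatever columns a given census exposes. The function `filter_from_mapping`, the `available_columns` parameter, and the `lab_batch` column are all hypothetical and used only for illustration.

```python
from typing import Iterable, Mapping, Sequence, Union

Value = Union[str, Sequence[str]]


def filter_from_mapping(
    filters: Mapping[str, Value],
    available_columns: Iterable[str],
) -> str:
    """Hypothetical helper: build a filter string for arbitrary column names."""
    known = set(available_columns)
    unknown = sorted(set(filters) - known)
    if unknown:
        # Fail early with a clear message rather than at query time.
        raise ValueError(f"Unknown obs columns: {unknown}")

    clauses = []
    for column, value in filters.items():
        if isinstance(value, str):
            clauses.append(f"{column} == {value!r}")
        else:
            clauses.append(f"{column} in {list(value)!r}")
    return " and ".join(clauses)


# A local census with an extra, non-standard column (assumed name).
columns = ["sex", "cell_type", "lab_batch"]
print(filter_from_mapping({"sex": "female", "lab_batch": ["b1", "b2"]}, columns))
```

The trade-off is that a mapping gives up the editor autocompletion and type checking that named keyword arguments or enums would provide.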