chanzuckerberg / cellxgene-census

CZ CELLxGENE Discover Census
https://chanzuckerberg.github.io/cellxgene-census/
MIT License

Specify the Census search space with an `obs_value_filter` parameter in `find_nearest_obs` #1305

Open pablo-gar opened 3 days ago

pablo-gar commented 3 days ago

Description

Currently, `find_nearest_obs` finds the Census cells closest to the user's data. It would be a great addition to be able to constrain the search to a user-defined subset of Census cells instead of all Census cells.

For consistency with the rest of the Census API, a simple `obs_value_filter` parameter (as used in the `get_anndata` API) could be added to limit the search to Census cells that meet the filter criteria.

Context

The `find_nearest_obs` functionality is great and works as intended. However, I am often interested in searching for the most similar cells within a specific subset of Census defined by a biological context, especially when I know my query cells come from the same or a similar biological context.

Impact

Without a way to restrict the search space, `find_nearest_obs` cannot be used to its full extent.

Ideal behavior

A parameter obs_value_filter in find_nearest_obs to limit the search to Census cells meeting the filter criteria.
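
To make the requested semantics concrete, here is a minimal sketch of a filtered nearest-neighbor search on toy data. It is only an illustration, not how `find_nearest_obs` is implemented: the helper `nearest_obs_filtered`, the toy arrays, and the use of brute-force Euclidean distance are all assumptions, and pandas `DataFrame.eval` merely stands in for the Census value-filter syntax.

```python
import numpy as np
import pandas as pd


def nearest_obs_filtered(query_emb, census_emb, census_obs, obs_value_filter=None, k=2):
    """Brute-force k-nearest-neighbor search restricted to rows matching a filter.

    Hypothetical helper for illustration only; it is not part of cellxgene_census.
    Returns (row indices into census_emb, distances), one row per query cell.
    """
    if obs_value_filter is None:
        candidates = np.arange(len(census_obs))
    else:
        # Assumption: a pandas expression plays the role of the Census filter string.
        mask = census_obs.eval(obs_value_filter, engine="python").to_numpy()
        candidates = np.flatnonzero(mask)
    if len(candidates) == 0:
        raise ValueError("obs_value_filter matched no cells")

    # Euclidean distance from every query cell to every candidate cell.
    diffs = query_emb[:, None, :] - census_emb[candidates][None, :, :]
    dists = np.linalg.norm(diffs, axis=2)

    k = min(k, len(candidates))
    order = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return candidates[order], np.take_along_axis(dists, order, axis=1)


# Toy "Census": five cells with 2-D embeddings and a little obs metadata.
census_emb = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [5.0, 5.0]])
census_obs = pd.DataFrame({"tissue": ["blood", "blood", "lung", "lung", "lung"]})
query_emb = np.array([[0.0, 0.1]])

ids_all, _ = nearest_obs_filtered(query_emb, census_emb, census_obs)
ids_lung, _ = nearest_obs_filtered(
    query_emb, census_emb, census_obs, obs_value_filter="tissue == 'lung'"
)
print(ids_all.tolist())   # neighbors drawn from all cells
print(ids_lung.tolist())  # neighbors drawn from lung cells only
```

In this toy case the unfiltered search returns the two blood cells, while the filtered search returns the two closest lung cells. An index-backed implementation would need to apply the filter inside or around the vector search rather than by brute force, which this sketch does not model.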

ivirshup commented 3 days ago

Hmm, will have to check with @cathystoli to see if we're still accepting feature requests from you 😉


I think this makes a ton of sense as a feature. I've noticed that we probably want to filter out all `is_primary_data == False` cells from queries; otherwise you just end up getting a bunch of cells with the exact same embedding.
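
As a rough illustration of that point (toy data, not real Census embeddings; the `top_k` helper is hypothetical), duplicate copies of one cell can fill every slot of a top-k result unless non-primary rows are masked out first:

```python
import numpy as np

# Toy embeddings: rows 0-2 are copies of one cell (one primary, two non-primary).
emb = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.3, 0.0], [0.0, 0.5]])
is_primary_data = np.array([True, False, False, True, True])
query = np.array([0.1, 0.0])


def top_k(query, emb, candidates, k):
    """Return the k candidate row indices closest to the query (Euclidean)."""
    dists = np.linalg.norm(emb[candidates] - query, axis=1)
    return candidates[np.argsort(dists, kind="stable")[:k]]


everything = np.arange(len(emb))
primary_only = np.flatnonzero(is_primary_data)

unfiltered = top_k(query, emb, everything, k=3)   # three copies of the same cell
filtered = top_k(query, emb, primary_only, k=3)   # three distinct cells
print(unfiltered.tolist(), filtered.tolist())
```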

I've consulted with the TileDB folks and I think this should be quite doable.

cc: @mlin