chanzuckerberg / single-cell-data-portal

The data portal supporting the submission, exploration, and management of projects and datasets to cellxgene.
MIT License

Inconsistent use of CL Classification hierarchy to drive search #5784

Closed dosumis closed 1 year ago

dosumis commented 1 year ago

Describe the bug

Inconsistent use of CL Classification hierarchy to drive search (I'm assuming this is a bug, although I guess it could be a feature)

To Reproduce

Here's the CL hierarchy under 'kidney epithelial cell' (courtesy of CellGuide):

[Screenshot: CL hierarchy under 'kidney epithelial cell', from CellGuide]

Here is the list of collections returned by a search for 'kidney epithelial cell':

[Screenshot: collections returned by searching for 'kidney epithelial cell']

The results do not include collections whose datasets are annotated with subclasses of 'kidney epithelial cell', e.g. https://cellxgene.cziscience.com/collections/bcb61471-2a44-4d00-a0af-ff085512674c

Expected behavior

Searching for 'kidney epithelial cell' should return all datasets annotated with 'kidney epithelial cell' or any of its subclasses, e.g. the Lake dataset above.
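The expected behavior amounts to expanding the query term to itself plus all of its subclasses before matching dataset annotations. A minimal sketch, assuming only `is_a` edges are followed (CL also has relations such as `part_of`, which this ignores); the term labels, dataset names, and helper functions below are hypothetical placeholders, not real CL terms, portal data, or portal code:

```python
from collections import defaultdict, deque

# Hypothetical is_a edges as (child, parent) pairs.
# These labels are placeholders for illustration, NOT real Cell Ontology terms.
IS_A_EDGES = [
    ("kidney epithelial cell", "epithelial cell"),
    ("subtype_a", "kidney epithelial cell"),
    ("subtype_b", "kidney epithelial cell"),
    ("subtype_a1", "subtype_a"),
    ("unrelated cell", "epithelial cell"),
]

# Hypothetical dataset -> set of cell type annotations.
DATASETS = {
    "dataset_1": {"kidney epithelial cell"},
    "dataset_2": {"subtype_a1"},
    "dataset_3": {"unrelated cell"},
}


def build_children(edges):
    """Invert (child, parent) edges into a parent -> children map."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    return children


def descendants_or_self(term, children):
    """Return term plus every term reachable by following is_a edges downward."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def search(term, datasets, edges):
    """Return (sorted) datasets annotated with term or any of its subclasses."""
    terms = descendants_or_self(term, build_children(edges))
    return sorted(name for name, labels in datasets.items() if labels & terms)


if __name__ == "__main__":
    print(search("kidney epithelial cell", DATASETS, IS_A_EDGES))
```

With exact-label matching the same query would return only `dataset_1`; the descendant expansion is what also picks up `dataset_2`, which is annotated only with a grandchild term.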

brianraymor commented 1 year ago

The current behavior is by design.

dosumis commented 1 year ago

If there is a manually constructed hierarchy, is there somewhere I can access it? I'm curious to see where it differs from CL (we have an automated process for this). I'd also be very interested in any record of the rationale for the current structure. Thanks.

brianraymor commented 1 year ago

See Cell Type Constants in https://github.com/chanzuckerberg/single-cell-data-portal/blob/2505cefa0408cb7b0efd5df75649fbd477fc8fa4/scripts/compute_tissue_and_cell_type_mappings.ipynb. The constants were hand-curated by @jahilton and @pablo-gar. Much of the original conversation was either face-to-face or on Slack.
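One way to find where a curated grouping differs from CL is a per-term set difference between the curated members and the ontology-derived subtree. This sketch assumes the curated constants can be reduced to a mapping from a high-level term to the set of terms grouped under it; the notebook's actual structure may differ, and `CURATED`, `IS_A_EDGES`, `compare`, and all term labels are hypothetical placeholders:

```python
from collections import defaultdict, deque

# Hypothetical is_a edges as (child, parent) pairs; placeholders, NOT real CL terms.
IS_A_EDGES = [
    ("kidney epithelial cell", "epithelial cell"),
    ("subtype_a", "kidney epithelial cell"),
    ("subtype_b", "kidney epithelial cell"),
    ("subtype_a1", "subtype_a"),
    ("unrelated cell", "epithelial cell"),
]

# Hypothetical hand-curated grouping: high-level term -> terms rolled up under it.
CURATED = {
    "kidney epithelial cell": {"kidney epithelial cell", "subtype_b", "unrelated cell"},
}


def descendants_or_self(term, edges):
    """Return term plus every term below it via is_a edges (breadth-first)."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    seen, queue = {term}, deque([term])
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def compare(curated, edges):
    """For each curated term, list where the curated grouping and the ontology subtree disagree."""
    report = {}
    for term, grouped in curated.items():
        derived = descendants_or_self(term, edges)
        report[term] = {
            # Ontology subclasses that the curated grouping leaves out.
            "missing_from_curated": sorted(derived - grouped),
            # Curated members that are not subclasses of the term in the ontology.
            "not_in_ontology_subtree": sorted(grouped - derived),
        }
    return report


if __name__ == "__main__":
    for term, diff in compare(CURATED, IS_A_EDGES).items():
        print(term, diff)
```

Both directions matter: the first list shows subclasses a search would miss, and the second shows deliberate groupings that go beyond the ontology, which are the cases where a recorded rationale would be most useful.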

brianraymor commented 1 year ago

@dosumis - did you have further comments or questions? Otherwise, I will close this issue.

brianraymor commented 1 year ago

No response. Closing.