chanzuckerberg / single-cell

A collection of documents that reflect various design decisions that have been made for the cellxgene project.
MIT License

feat(schema4): update WMG pipeline to exclude non-tissue "tissue_type" cells from the generated cubes #526

Closed: atarashansky closed this issue 1 year ago

atarashansky commented 1 year ago

With schema 4, the Data Portal will drop the "(organoid)" and "(cell culture)" suffixes from the names of organoid and cell culture tissues. Instead, we'll need to use the new tissue_type field. In addition to making this change to the WMG pipeline, we'll need to check whether any other downstream dependencies also need to be updated.
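A minimal sketch of the schema 4 filter, assuming each cell's metadata is available as a mapping with a `tissue_type` field whose value is one of "tissue", "organoid", or "cell culture". The helper name `keep_tissue_cells` and the list-of-dicts representation are hypothetical; the actual WMG pipeline works on dataset `obs` tables, not plain dicts.

```python
from typing import Iterable, List, Mapping

# Only cells with this schema 4 tissue_type value are kept for the WMG cubes
# (assumption: "organoid" and "cell culture" are the values to exclude).
ALLOWED_TISSUE_TYPE = "tissue"


def keep_tissue_cells(cells: Iterable[Mapping[str, str]]) -> List[Mapping[str, str]]:
    """Hypothetical helper: drop organoid and cell culture cells.

    Filters on the tissue_type field instead of parsing "(organoid)" or
    "(cell culture)" suffixes out of tissue names. Cells with no
    tissue_type are dropped as well.
    """
    return [c for c in cells if c.get("tissue_type") == ALLOWED_TISSUE_TYPE]


cells = [
    {"cell_id": "c1", "tissue": "lung", "tissue_type": "tissue"},
    {"cell_id": "c2", "tissue": "brain", "tissue_type": "organoid"},
    {"cell_id": "c3", "tissue": "kidney", "tissue_type": "cell culture"},
]
print([c["cell_id"] for c in keep_tissue_cells(cells)])  # ['c1']
```

Filtering on the explicit field avoids string matching on display names, which is exactly what stops working once the suffixes are removed.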

In addition to excluding organoids and cell cultures, there is also a product ask to exclude "system" tissues such as "immune system" or "respiratory system". These are cells that we tried to map to more specific organs but couldn't, so they fell back on a less granular term (e.g., "respiratory system" instead of "lung"). We've decided that these terms aren't useful in Gene Expression, so we'll remove them from WMG as well. Since there isn't a "system" tissue type, we'll have to exclude a list of system UBERON terms instead.
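One way to combine both exclusions is a single eligibility check per cell, sketched below under stated assumptions: cells are plain dicts with `tissue_type` and `tissue_ontology_term_id` fields, the function name `is_wmg_eligible` is hypothetical, and the exclusion set is an illustrative, incomplete list. The two UBERON IDs shown are meant to be "immune system" and "respiratory system" and should be checked against the ontology before use; the real list would come from the product requirements.

```python
from typing import FrozenSet, List, Mapping

# Illustrative and incomplete: the real exclusion list would be defined by
# the product requirements. Verify IDs against UBERON before relying on them.
SYSTEM_TISSUE_TERM_IDS: FrozenSet[str] = frozenset({
    "UBERON:0002405",  # immune system
    "UBERON:0001004",  # respiratory system
})


def is_wmg_eligible(cell: Mapping[str, str],
                    excluded_terms: FrozenSet[str] = SYSTEM_TISSUE_TERM_IDS) -> bool:
    """Hypothetical predicate: keep real tissue cells mapped to a specific organ.

    Excludes organoid/cell culture cells via tissue_type, and excludes cells
    whose tissue term is a coarse "system" term via the exclusion set.
    """
    return (
        cell.get("tissue_type") == "tissue"
        and cell.get("tissue_ontology_term_id") not in excluded_terms
    )


cells = [
    {"cell_id": "c1", "tissue_ontology_term_id": "UBERON:0002048", "tissue_type": "tissue"},  # lung
    {"cell_id": "c2", "tissue_ontology_term_id": "UBERON:0002405", "tissue_type": "tissue"},  # immune system
    {"cell_id": "c3", "tissue_ontology_term_id": "UBERON:0002048", "tissue_type": "organoid"},
]
kept: List[str] = [c["cell_id"] for c in cells if is_wmg_eligible(c)]
print(kept)  # ['c1']
```

Keeping the exclusion set as a module-level constant makes it easy to review and extend as more system terms are identified.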

See the product requirements in this ticket. The product ask to remove systems from WMG is not related to schema 4, but it is being rolled into this ticket because all of these changes require updating the WMG pipeline and would benefit from being implemented and tested together.

Slack context thread

nayib-jose-gloria commented 1 year ago

@atarashansky Not urgent, but does WMG already do this? That is, is this ticket a new feature for 4.0.0, or does it translate an old feature to use the new tissue_type field?

joyceyan commented 1 year ago

@nayib-jose-gloria WMG doesn't already do this. This would be a new feature that I believe was requested as early as August of this year, but the schema 4 tissue_type field makes it a bit easier to implement.