chanzuckerberg / single-cell-data-portal

The data portal supporting the submission, exploration, and management of projects and datasets to cellxgene.
MIT License

bug(filter): set of orphan tissues is stale #6227

Closed MillenniumFalconMechanic closed 6 months ago

MillenniumFalconMechanic commented 11 months ago

Describe the Bug

The set of orphan tissues (orphan_tissues) defined in compute_tissue_and_cell_type_mappings is out of date. This set is required to correctly build the tissue graph of the corpus, which in turn is used to generate the ancestor and descendant mappings that drive the FE filter.

Note: the orphan_cell_types set is most likely also stale, but further investigation is required to confirm this.

To Reproduce

Run compute_orphan_tissues.

Expected Behavior

The set of tissues identified as orphans matches the set in compute_tissue_and_cell_type_mappings:

"UBERON_0001013",  # adipose tissue
"UBERON_0009472",  # axilla
"UBERON_0018707",  # bladder organ
"UBERON_0000310",  # breast
"UBERON_0001348",  # brown adipose tissue
"UBERON_0007106",  # chorionic villus
"UBERON_0000030",  # lamina propria
"UBERON_0015143",  # mesenteric fat pad
"UBERON_0000344",  # mucosa
"UBERON_0003688",  # omentum
"UBERON_0001264",  # pancreas
"UBERON_0000175",  # pleural effusion
"UBERON_0000403",  # scalp
"UBERON_0001836",  # saliva
"UBERON_0001416",  # skin of abdomen
"UBERON_0002097",  # skin of body
"UBERON_0001868",  # skin of chest
"UBERON_0001511",  # skin of leg
"UBERON_0002190",  # subcutaneous adipose tissue
"UBERON_0002100",  # trunk
"UBERON_0035328",  # upper outer quadrant of breast
"UBERON_0001040",  # yolk sac
"UBERON_0000014",  # zone of skin

Actual Behavior

The following tissues were identified as orphans:

"UBERON_0000916", # abdomen
"UBERON_0003697", # abdominal wall
"UBERON_0001013", # adipose tissue
"UBERON_0007795", # ascitic fluid
"UBERON_0009472", # axilla
"UBERON_0018707", # bladder organ
"UBERON_0000310", # breast
"UBERON_0001348", # brown adipose tissue
"UBERON_0002067", # dermis
"UBERON_0000016", # endocrine pancreas
"UBERON_0007650", # esophagogastric junction
"UBERON_0000030", # lamina propria
"UBERON_0015143", # mesenteric fat pad
"UBERON_0000344", # mucosa
"UBERON_0003688", # omentum
"UBERON_0001264", # pancreas
"UBERON_0035210", # paracolic gutter
"UBERON_0001366", # parietal peritoneum
"UBERON_0005406", # perirenal fat
"UBERON_0002358", # peritoneum
"UBERON_0000175", # pleural effusion
"UBERON_0001836", # saliva
"UBERON_0001416", # skin of abdomen
"UBERON_0002097", # skin of body
"UBERON_0001868", # skin of chest
"UBERON_0001511", # skin of leg
"UBERON_0001003", # skin epidermis
"UBERON_8300000", # skin of scalp
"UBERON_0001085", # skin of trunk
"UBERON_0014455", # subcutaneous abdominal adipose tissue
"UBERON_0002190", # subcutaneous adipose tissue
"UBERON_0035328", # upper outer quadrant of breast
"UBERON_0014454", # visceral abdominal adipose tissue
"UBERON_0001040", # yolk sac
"UBERON_0000014", # zone of skin
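To make the drift easier to see, here is a minimal sketch that diffs the two lists above using plain Python sets. The IDs and labels are copied from this issue; the variable names (hardcoded_orphans, computed_orphans, only_in_computed, only_in_hardcoded) are made up for illustration and are not taken from the repository.

```python
# Hardcoded set, copied from the "Expected Behavior" list in this issue.
hardcoded_orphans = {
    "UBERON_0001013": "adipose tissue",
    "UBERON_0009472": "axilla",
    "UBERON_0018707": "bladder organ",
    "UBERON_0000310": "breast",
    "UBERON_0001348": "brown adipose",
    "UBERON_0007106": "chorionic villus",
    "UBERON_0000030": "lamina propria",
    "UBERON_0015143": "mesenteric fat pad",
    "UBERON_0000344": "mucosa",
    "UBERON_0003688": "omentum",
    "UBERON_0001264": "pancreas",
    "UBERON_0000175": "pleural effusion",
    "UBERON_0000403": "scalp",
    "UBERON_0001836": "saliva",
    "UBERON_0001416": "skin of abdomen",
    "UBERON_0002097": "skin of body",
    "UBERON_0001868": "skin of chest",
    "UBERON_0001511": "skin of leg",
    "UBERON_0002190": "subcutaneous adipose tissue",
    "UBERON_0002100": "trunk",
    "UBERON_0035328": "upper outer quadrant of breast",
    "UBERON_0001040": "yolk sac",
    "UBERON_0000014": "zone of skin",
}

# Output of compute_orphan_tissues, copied from the "Actual Behavior" list.
computed_orphans = {
    "UBERON_0000916": "abdomen",
    "UBERON_0003697": "abdominal wall",
    "UBERON_0001013": "adipose tissue",
    "UBERON_0007795": "ascitic fluid",
    "UBERON_0009472": "axilla",
    "UBERON_0018707": "bladder organ",
    "UBERON_0000310": "breast",
    "UBERON_0001348": "brown adipose tissue",
    "UBERON_0002067": "dermis",
    "UBERON_0000016": "endocrine pancreas",
    "UBERON_0007650": "esophagogastric junction",
    "UBERON_0000030": "lamina propria",
    "UBERON_0015143": "mesenteric fat pad",
    "UBERON_0000344": "mucosa",
    "UBERON_0003688": "omentum",
    "UBERON_0001264": "pancreas",
    "UBERON_0035210": "paracolic gutter",
    "UBERON_0001366": "parietal peritoneum",
    "UBERON_0005406": "perirenal fat",
    "UBERON_0002358": "peritoneum",
    "UBERON_0000175": "pleural effusion",
    "UBERON_0001836": "saliva",
    "UBERON_0001416": "skin of abdomen",
    "UBERON_0002097": "skin of body",
    "UBERON_0001868": "skin of chest",
    "UBERON_0001511": "skin of leg",
    "UBERON_0001003": "skin epidermis",
    "UBERON_8300000": "skin of scalp",
    "UBERON_0001085": "skin of trunk",
    "UBERON_0014455": "subcutaneous abdominal adipose tissue",
    "UBERON_0002190": "subcutaneous adipose tissue",
    "UBERON_0035328": "upper outer quadrant of breast",
    "UBERON_0014454": "visceral abdominal adipose tissue",
    "UBERON_0001040": "yolk sac",
    "UBERON_0000014": "zone of skin",
}

hardcoded_ids = set(hardcoded_orphans)
computed_ids = set(computed_orphans)

# Orphans found by the script that the hardcoded set does not list.
only_in_computed = computed_ids - hardcoded_ids
# Hardcoded entries that the script no longer reports as orphans.
only_in_hardcoded = hardcoded_ids - computed_ids

print(f"Only in computed set ({len(only_in_computed)}):")
for term_id in sorted(only_in_computed):
    print(f"  {term_id}  # {computed_orphans[term_id]}")

print(f"Only in hardcoded set ({len(only_in_hardcoded)}):")
for term_id in sorted(only_in_hardcoded):
    print(f"  {term_id}  # {hardcoded_orphans[term_id]}")
```

For the lists as posted, this reports 15 IDs that appear only in the computed set and 3 that appear only in the hardcoded set (chorionic villus, scalp, trunk); the remaining 20 IDs are shared.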

Additional context

Going forward, compute_orphan_tissues could be incorporated into compute_tissue_and_cell_type_mappings to ensure that orphan_tissues is always current.

nayib-jose-gloria commented 6 months ago

Addressed as part of the integration with CellXGene-Ontology-Guide, which will dynamically determine "orphan" types in the corpus and include them in the ancestor/descendant mappings. The backend will also pull these mappings dynamically. The compute scripts and hardcoded orphan lists will be deprecated.