chanzuckerberg / single-cell-curation

Code and documentation for the curation of cellxgene datasets
MIT License

ontology updates w/ 4.0.0 #605

Closed jahilton closed 11 months ago

jahilton commented 1 year ago

Private Collection: Human Neck Adipose Tissue
Dataset: adipose from neck - all nuclei
donor_id: CREEK003 should be self_reported_ethnicity: HANCESTRO:0005,HANCESTRO:0013 (currently 'multiethnic')

jahilton commented 1 year ago

Collection: 1ca90a2d-2943-483d-b678-b809bf464c30
All Datasets
donor_id: H20.33.018 should be self_reported_ethnicity: HANCESTRO:0013,HANCESTRO:0014
donor_id: H20.33.034 should be self_reported_ethnicity: HANCESTRO:0005
donor_id: H21.33.037 should be self_reported_ethnicity: HANCESTRO:0005

All are currently 'admixed ancestry'/'multiethnic' (the latter two checked 'Other' in addition to 'White' but didn't offer further information)

jahilton commented 1 year ago

Collection: 6f6d381a-7701-4781-935c-db10d30de293
Dataset: 9f222629-9e39-47d0-b83f-e08d610c7479
donor_id: homosapiens_None_2023_None_sikkemalisa_002_d10_1101_2022_03_10_4837472020-3173-NC004 should be self_reported_ethnicity: HANCESTRO:0005,HANCESTRO:0008
donor_id: homosapiens_None_2023_None_sikkemalisa_002_d10_1101_2022_03_10_483747290B should be self_reported_ethnicity: unknown (submitter has "mixed" and "south american")
donor_id: homosapiens_None_2023_None_sikkemalisa_002_d10_1101_2022_03_10_483747NP19 should be self_reported_ethnicity: unknown (submitter only has "mixed" with no further information)

Collection: 6f6d381a-7701-4781-935c-db10d30de293
Dataset: 066943a2-fdac-4b29-b348-40cede398e4e
donor_id: homosapiens_None_2023_None_sikkemalisa_001_d10_1101_2022_03_10_4837472020-3173-NC004 should be self_reported_ethnicity: HANCESTRO:0005,HANCESTRO:0008
donor_id: homosapiens_None_2023_None_sikkemalisa_001_d10_1101_2022_03_10_483747290B should be self_reported_ethnicity: unknown (submitter has "mixed" and "south american")

jahilton commented 1 year ago

Collection: 7d7cabfd-1d1f-40af-96b7-26a0825a306d
Only 1 Dataset
donor_id in ['Rep_C_1012','Rep_C_1017','Rep_C_1033','Rep_C_1036','Rep_C_1037','Rep_C_1039','Rep_C_1050','Rep_C_1053','Rep_C_1055','Rep_C_1059','Rep_C_1060','Rep_C_1064','Rep_C_1076','Rep_C_1078','Rep_C_1094','Rep_C_1095','Rep_C_1107','Rep_C_1143','Rep_C_1151','Rep_C_1154','Rep_C_1161'] should be self_reported_ethnicity: HANCESTRO:0014 (submitter has "Other/Multiple Races" & "Hispanic/Latino")
donor_id in ['Rep_C_1002','Rep_C_1003','Rep_C_1021','Rep_C_1051','Rep_C_1066','Rep_C_1072'] should be self_reported_ethnicity: unknown (submitter has "Other/Multiple Races" & "Non Hispanic/Latino")
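For illustration, the donor-based revision above could be sketched in plain Python as follows. This is only a sketch under stated assumptions: the two donor lists and target values are copied from the comment, while the column name `self_reported_ethnicity_ontology_term_id`, the per-cell row layout, and the helper names `revised_ethnicity` and `revise_rows` are hypothetical and not taken from the migration scripts.

```python
# Donor groups and target values copied from the comment above.
HANCESTRO_0014_DONORS = {
    "Rep_C_1012", "Rep_C_1017", "Rep_C_1033", "Rep_C_1036", "Rep_C_1037",
    "Rep_C_1039", "Rep_C_1050", "Rep_C_1053", "Rep_C_1055", "Rep_C_1059",
    "Rep_C_1060", "Rep_C_1064", "Rep_C_1076", "Rep_C_1078", "Rep_C_1094",
    "Rep_C_1095", "Rep_C_1107", "Rep_C_1143", "Rep_C_1151", "Rep_C_1154",
    "Rep_C_1161",
}
UNKNOWN_DONORS = {
    "Rep_C_1002", "Rep_C_1003", "Rep_C_1021",
    "Rep_C_1051", "Rep_C_1066", "Rep_C_1072",
}

# Assumed column name; adjust to match the dataset being revised.
ETHNICITY_COLUMN = "self_reported_ethnicity_ontology_term_id"


def revised_ethnicity(donor_id, current_value):
    """Return the revised value for a listed donor, else the current value."""
    if donor_id in HANCESTRO_0014_DONORS:
        return "HANCESTRO:0014"
    if donor_id in UNKNOWN_DONORS:
        return "unknown"
    return current_value


def revise_rows(rows):
    """Return copies of per-cell rows (dicts) with the ethnicity column revised."""
    return [
        {**row, ETHNICITY_COLUMN: revised_ethnicity(row["donor_id"], row[ETHNICITY_COLUMN])}
        for row in rows
    ]
```

Donors that appear in neither list keep their current value, so the same function can be run over every cell in the dataset without touching unrelated donors.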

jychien commented 1 year ago

Collection: 4d74781b-8186-4c9a-b659-ff4dc4601d91
Dataset: b07fb54c-d7ad-4995-8bb0-8f3d8611cabe
cell_type_ontology_term_id: CL:0000234 (phagocyte) should be CL:0000113 (mononuclear phagocyte)

EMRutherford commented 1 year ago

Private Collection: Neuron type-specific effects of human aging and sex on DNA methylation and transcription
Dataset: all (only one dataset)
author_cell_type: L4-5IT_RORB_TSHZ2 should be CL:4030062 (L4/5 intratelencephalic projecting glutamatergic neuron)
author_cell_type: L2-4IT_CUX2 should be CL:4030059 (L2/3 intratelencephalic projecting glutamatergic neuron)
author_cell_type: L4-5IT_RORB_LRRK1 should be CL:4030062 (L4/5 intratelencephalic projecting glutamatergic neuron)
author_cell_type: L4-5IT_RORB_ARHGAP15 should be CL:4030062 (L4/5 intratelencephalic projecting glutamatergic neuron)
author_cell_type: L6IT_THEMIS_LINC00343 should be CL:4030065 (L6 intratelencephalic projecting glutamatergic neuron)
author_cell_type: L6IT_THEMIS_CUX1 should be CL:4030065 (L6 intratelencephalic projecting glutamatergic neuron)
author_cell_type: L3-5IT_RORB_PLCH1 should be CL:4030061 (L3 intratelencephalic projecting glutamatergic neuron)
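The author_cell_type revisions listed above amount to a lookup table. A minimal sketch, assuming per-cell rows represented as dicts with `author_cell_type` and `cell_type_ontology_term_id` keys (the row layout and the helper name `remap_cell_types` are hypothetical; the label-to-term pairs are copied from the comment):

```python
# author_cell_type label -> requested CL term, copied from the comment above.
AUTHOR_TO_CL = {
    "L4-5IT_RORB_TSHZ2": "CL:4030062",
    "L2-4IT_CUX2": "CL:4030059",
    "L4-5IT_RORB_LRRK1": "CL:4030062",
    "L4-5IT_RORB_ARHGAP15": "CL:4030062",
    "L6IT_THEMIS_LINC00343": "CL:4030065",
    "L6IT_THEMIS_CUX1": "CL:4030065",
    "L3-5IT_RORB_PLCH1": "CL:4030061",
}


def remap_cell_types(rows):
    """Return copies of the rows with the CL term revised for listed labels.

    Rows whose author_cell_type is not in AUTHOR_TO_CL keep their current term.
    """
    revised = []
    for row in rows:
        new_term = AUTHOR_TO_CL.get(
            row["author_cell_type"], row["cell_type_ontology_term_id"]
        )
        revised.append({**row, "cell_type_ontology_term_id": new_term})
    return revised
```

Keying on the author label rather than on the existing CL term means cells that currently share one broad term can be split across the more specific terms.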

brianraymor commented 1 year ago

@jahilton - clarifying - were these addressed after the 3.1.0 migration? If so, can this be closed?

jahilton commented 1 year ago

No, these are for the next migration (post-3.1.0 means whatever schema version is after 3.1.0, which we now know will be 4.0.0). I'll update the title

Jchaffer787 commented 11 months ago

Private Collection: A single-cell transcriptional timelapse of mouse embryonic development, from gastrula to pup

Dataset: Major cell Cluster: CNS neurons
CL:0003001 (bistratified retinal ganglion cell) should be CL:4033053 (small bistratified retinal ganglion cell)

Dataset: Major cell Cluster: Epithelial cells
CL:0000068 (duct epithelial cell) should be CL:4030066 (ureteric bud cell)

Dataset: Whole dataset: Raw counts only
CL:0003001 (bistratified retinal ganglion cell) should be CL:4033053 (small bistratified retinal ganglion cell)
CL:0000068 (duct epithelial cell) should be CL:4030066 (ureteric bud cell)

jahilton commented 11 months ago

Migration is complete. The updates that weren't made because the terms weren't in the schema 4.0.0 ontologies have been moved to the next ontology update issue, #710.

@Jchaffer787 your comment came after the migration had been set. FYI - CL:4033053 is accepted by the current schema, but CL:4030066 is not. If you make the CL:4033053 revision, please update the comment in the new issue.
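As a rough illustration of the partial update described here, the sketch below applies a requested replacement only when the new term is in a caller-supplied set of accepted terms, and reports the rest as deferred. The function name `apply_replacements` and the `accepted_terms` input are hypothetical; how the accepted set would be derived from the schema's pinned ontologies is not shown and is left as an assumption.

```python
# Requested replacements from the comments above: old CL term -> new CL term.
REPLACEMENTS = {
    "CL:0003001": "CL:4033053",
    "CL:0000068": "CL:4030066",
}


def apply_replacements(term_ids, replacements, accepted_terms):
    """Replace each term whose replacement is accepted; defer the others.

    Returns (revised_term_ids, deferred_old_terms). A deferred term is left
    unchanged so it can be revisited in a later ontology update.
    """
    revised = []
    deferred = set()
    for term in term_ids:
        new_term = replacements.get(term)
        if new_term is None:
            revised.append(term)       # no replacement requested
        elif new_term in accepted_terms:
            revised.append(new_term)   # replacement accepted by the schema
        else:
            revised.append(term)       # keep the old term for now
            deferred.add(term)
    return revised, deferred


# Per the comment above, only CL:4033053 is accepted at this point.
revised, deferred = apply_replacements(
    ["CL:0003001", "CL:0000068", "CL:0000540"],
    REPLACEMENTS,
    accepted_terms={"CL:4033053"},
)
```

Returning the deferred terms separately makes it straightforward to carry them over to a follow-up issue.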