chanzuckerberg / single-cell-curation

Code and documentation for the curation of cellxgene datasets
MIT License

Update tabula muris brain datasets to better reflect differences #40

Closed: pablo-gar closed this issue 2 years ago

pablo-gar commented 3 years ago

In the Tabula Muris Senis collection, there are two different brain datasets with identical titles (stored in AnnData.uns['title']).

When the feature "Revision of public collections" is ready, these datasets should be updated to reflect their differences.


Current dataset title: "Brain — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse"

Updated titles:

"Brain myeloid cells — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse"
https://cellxgene.cziscience.com/e/c08f8441-4a10-4748-872a-e70c0bcccdba.cxg/

"Brain non-myeloid cells — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse"
https://cellxgene.cziscience.com/e/66ff82b4-9380-469c-bc4b-cfa08eacd325.cxg/

jahilton commented 3 years ago

There are many more duplicated titles in that Collection. These all seem to have a 10x dataset and a Smart-seq2 dataset, so the assay should be appended to those titles accordingly:

All — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
Lung — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
Heart — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
Liver — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
Kidney — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
Spleen — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
Thymus — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
Tongue — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
Trachea — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
Pancreas — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
Bone marrow — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
Limb muscle — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
Skin of body — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
Bladder lumen — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
Mammary gland — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
Large intestine — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
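As a rough sketch of the disambiguation proposed here, the Python function below appends the assay label to any title shared by more than one dataset in a collection. It is hypothetical throughout: each dataset is modeled as a plain dict with "title" (standing in for AnnData.uns['title']) and "assay" keys, each dataset is assumed to use a single assay, and the "<title> (<assay>)" format is an illustrative choice, not a cellxgene convention. If the assay alone does not separate two datasets, it raises instead of returning duplicates, since those cases need a different label (as with the myeloid/non-myeloid brain datasets).

```python
from collections import Counter


def disambiguate_titles(datasets):
    """Return one title per dataset, appending the assay to shared titles.

    Hypothetical input: a list of dicts with "title" and "assay" keys,
    one dict per dataset in the same collection.
    """
    counts = Counter(d["title"] for d in datasets)
    new_titles = []
    for d in datasets:
        if counts[d["title"]] > 1:
            # Illustrative format only: "<title> (<assay>)"
            new_titles.append(f'{d["title"]} ({d["assay"]})')
        else:
            # Unique titles are left untouched.
            new_titles.append(d["title"])
    # The assay must actually separate the datasets; otherwise a
    # different distinguishing label is needed.
    still_duplicated = [t for t, n in Counter(new_titles).items() if n > 1]
    if still_duplicated:
        raise ValueError(f"assay does not disambiguate: {still_duplicated}")
    return new_titles
```

For example, two "Lung — A single-cell transcriptomic atlas characterizes ageing tissues in the mouse" datasets with assays "10x" and "Smart-seq2" would come back with " (10x)" and " (Smart-seq2)" appended, while a title that appears only once is returned unchanged.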

pablo-gar commented 3 years ago

We were relying on the "assay" column to disambiguate them. Please feel free to change the titles if that makes sense to you. If you find any more duplicated data (like the marrow example), please let me know and I'll send you the curated versions.

jahilton commented 3 years ago

OK, I had misinterpreted the initial task as 'cxg should avoid duplicated titles'. If they're not a problem, then we'll leave them.

brianraymor commented 3 years ago

I found that when I used this as an example for gene sets, it was confusing to others, even when I told them that the assays disambiguated the datasets.

jahilton commented 3 years ago

I think it's reasonable, perhaps even prudent, to require unique dataset titles (either globally or within a Collection).

brianraymor commented 3 years ago

For the moment, we are relying on curators. If we schedule the "Curators want to rename datasets in a collection without uploading again" feature in the future, then uniqueness would be simple to enforce in the portal (either globally or within a Collection).

jahilton commented 3 years ago

> For the moment, we are relying on curators.

So you are currently anticipating non-redundant dataset titles? I definitely understand the difficulty of enforcing this in the portal under the existing framework, and I'm happy to add it to our checks and to the required updates for the migration. I'd recommend adding something like "Titles SHOULD be unique within a collection" to the title section of the schema: https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/2.0.0/corpora_schema.md#title
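A curator-side check for the proposed "Titles SHOULD be unique within a collection" rule might look like the following Python sketch. The function name and the dict keys ("collection_id", "dataset_id", "title") are hypothetical, not part of the cellxgene schema or portal. The scope parameter covers both options raised in the thread (within a Collection or globally), and duplicates are reported as warnings rather than errors because SHOULD is a recommendation, not a requirement.

```python
from collections import defaultdict


def check_title_uniqueness(datasets, scope="collection"):
    """Return warning strings for dataset titles that are not unique.

    Hypothetical input: a list of dicts with "collection_id", "dataset_id"
    and "title" keys. scope="collection" compares titles only within each
    collection; scope="global" compares them across all datasets.
    """
    if scope not in ("collection", "global"):
        raise ValueError(f"unknown scope: {scope!r}")
    seen = defaultdict(list)
    for d in datasets:
        # Group by (collection, title) or by title alone, depending on scope.
        key = (d["collection_id"], d["title"]) if scope == "collection" else d["title"]
        seen[key].append(d["dataset_id"])
    warnings = []
    for key, ids in seen.items():
        if len(ids) > 1:
            title = key[1] if scope == "collection" else key
            warnings.append(f"WARNING: title {title!r} is shared by datasets {sorted(ids)}")
    return warnings
```

Returning warnings keeps the check non-blocking; a MUST-level rule would instead fail validation whenever the returned list is non-empty.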

brianraymor commented 3 years ago

I will add a STRONGLY RECOMMENDED rule the next time I touch the schema.

jahilton commented 2 years ago

Done during migration.