GSE121638 - Mapping the immune environment in clear cell renal carcinoma by single-cell genomics (ImmuneRenalCarcinoma)

Wkt8 commented 3 years ago

Primary Wrangler: Wei
Secondary Wrangler: Enrique

Associated files:
Google Drive: https://drive.google.com/drive/folders/1gTWohslenXt_hKb7J7PMBUQrKhWHcDEx?usp=sharing
Project: https://contribute.data.humancellatlas.org/projects/detail?uuid=955dfc2c-a8c6-4d04-aa4d-907610545d11

Published study links
Paper: https://www.nature.com/articles/s42003-020-01625-6
Accessioned data: GSE121638

Key Events

[x] convert published metadata to HCA spreadsheet
[x] manually curate dataset to meet HCA metadata standard
[x] collect any matrix and cell-type annotation files
[x] Upload sheet to validate metadata
[x] Check linking using ingest graph validator
[x] Transfer raw files to ingest to validate data files
[x] Ask the Secondary Wrangler for an end-to-end review of the project. Ask the Expertise Wrangler to review specific tabs if needed
[x] Submit dataset to Production
[ ] Convert project data to SCEA format following the SCEA conversion SOP if appropriate

Wkt8 commented 3 years ago

Ran the spreadsheet through ingest-graph-validator and uploaded it to staging: https://staging.contribute.data.humancellatlas.org/submissions/detail?id=607edf745ba51701d5bac1a0&project=6c86381e-3e3e-45dc-a51c-138e74130949

The spreadsheet is also in the Google Drive folder.

Waiting for secondary review. Note to whoever does the secondary review: I believe the cell suspensions in GEO have been modelled incorrectly, so I've modelled them following the protocols in the paper.

The key difference is that GEO lists multiple samples (T cell from renal cancer tissue, CD45+ cells from renal cancer tissue) which I believe are actually libraries. This is because the protocols followed have no specific step to sort for T cells, apart from the library preparation protocol (which uses the 10X VDJ T cell enrichment kit).

ESapenaVentura commented 3 years ago

Hi @Wkt8 ! I have reviewed the dataset and I have a couple of notes:

Project - Contributors

Corrected institute/lab/address/country fields

Project - Funders

Corrected a small typo in the first grant: "(HHS )" to "(HHS)"

Donor

Alive at collection - I’d argue they were very much alive at the time of collection (I don’t know whether blood can be donated after death or whether it coagulates too fast), but it’s not explicit, so it’s probably better to keep it as unknown
Diseases - Added ontology/ontology label and corrected text for diseases
Developmental stage - Added missing ontology/ontology label

Specimen

Blood specimens - Corrected the organ (it was set to kidney instead of blood)
Tumor specimens - Kidney was capitalised

Enrichment protocol

Blood enrichment - Corrected typo in description (started with space)

Cell suspension

Selected cell types - Based on the FACS enrichment protocol, it seems that they enriched for leukocytes and myeloid cells, so I have added them

Sequencing protocol

Ontologies - I have added the ontology/ontology label for instrument and sequencing method

Sequence file

I have added the ontology/ontology label for content

Schemas tab

I have deleted the schemas tab (it was pointing to the staging schemas!!)

Overall the experiment design LGTM, I think you did a great job modelling VDJ! I have uploaded the updated spreadsheet with my corrections to the folder.

Please check further for possible missing ontologies. I have triple-checked the fields, but something might have escaped my eyes. This type of missing info won't cause trouble in ingest (since ontology and ontology_label are not required) but may cause problems downstream!
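A rough way to automate that check, as a minimal sketch only: it assumes each spreadsheet tab has already been loaded as a list of row dictionaries keyed by column name, and that ontologised fields follow a `<field>.text` / `<field>.ontology` / `<field>.ontology_label` column naming pattern. The function name `find_missing_ontologies`, the sample rows, and the `EXAMPLE:0000001` term are hypothetical and not part of any ingest tooling.

```python
def find_missing_ontologies(rows):
    """List filled '.text' fields whose ontology companions are empty.

    Returns (row_index, text_column, missing_column) tuples.
    Assumes the '<field>.text' / '.ontology' / '.ontology_label' naming pattern.
    """
    missing = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            # Only filled-in free-text fields need an ontology annotation.
            if not column.endswith(".text") or not str(value or "").strip():
                continue
            base = column[: -len(".text")]
            for suffix in (".ontology", ".ontology_label"):
                companion = base + suffix
                if not str(row.get(companion) or "").strip():
                    missing.append((index, column, companion))
    return missing


# Hypothetical rows standing in for two spreadsheet tabs.
rows = [
    {
        "donor_organism.diseases.text": "clear cell renal carcinoma",
        "donor_organism.diseases.ontology": "",
        "donor_organism.diseases.ontology_label": "",
    },
    {
        "specimen_from_organism.organ.text": "blood",
        "specimen_from_organism.organ.ontology": "EXAMPLE:0000001",  # placeholder term
        "specimen_from_organism.organ.ontology_label": "blood",
    },
]

for row_index, text_column, missing_column in find_missing_ontologies(rows):
    print(f"row {row_index}: {text_column} is filled but {missing_column} is empty")
```

Anything this reports would still need a manual look, since an empty ontology can be legitimate when no suitable term exists.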

Wkt8 commented 3 years ago

Thanks very much @ESapenaVentura!! Will check further for the ontologies.

ofanobilbao commented 3 years ago

@Wkt8 I moved this to Finished in this board, as it looks done from the DCP perspective. Amend it if I did not get it right. Thanks!

ami-day commented 2 years ago

I have pre-converted the MAGE-TAB files and put them here: https://drive.google.com/drive/folders/1n96Q3Ftws3h2ZxmqJr3zpxSCY_VtWTqF
They require checking and manual curation.

I assigned them E-HCAD-53.

The files are missing the 10X TCR samples and data. I am not yet sure whether the technology is eligible for SCEA; I need to ask the SCEA team or find an example on the SCEA portal. Either way, I think the dataset would need to be split by technology type, so new files would need to be generated for the TCR data if it is eligible.

ami-day commented 1 year ago

This has already been started by Wei. SCEA GitLab branch ID: E-HCAD-44 https://gitlab.ebi.ac.uk/ebi-gene-expression/scxa-metadata/-/merge_requests/225

ami-day commented 1 year ago

Made corrections to E-HCAD-44 in GitLab. Waiting for Silvie's review.

ami-day commented 1 year ago

Handed over to the SCEA team (GitLab). Review required.

ebi-ait / hca-ebi-wrangler-central

GSE121638 - Mapping the immune environment in clear cell renal carcinoma by single-cell genomics (ImmuneRenalCarcinoma) #305