HumanCellAtlas / metadata-schema

This repo is for the metadata schemas associated with the HCA
Apache License 2.0

Review of GEO datasets GSE114156 and GSE109564 metadata #1214

Closed. ami-day closed this issue 4 years ago

ami-day commented 4 years ago

Hi, I completed the metadata fields for GEO datasets GSE114156 and GSE109564 which are both associated with the following publication by Humphreys et al.: "Single-Cell Transcriptomics of a Human Kidney Allograft Biopsy Specimen Defines a Diverse Inflammatory Response".

It would be great if this could be reviewed, @mshadbolt @zperova @ESapenaVentura. Here is the file location: https://drive.google.com/drive/folders/118kh4wiHmn4Oz9n1-WZueaxm-8XuCMkA.

There was already a filled-in sheet for GSE109564 in finished projects, so I copied that information over into the combined sheet.

Originally posted by @ami-day in https://github.com/HumanCellAtlas/metadata-schema/issues/1210#issuecomment-578778628

ami-day commented 4 years ago

I didn't add a milestone; I guess we can discuss this in our next stand-up tomorrow.

ESapenaVentura commented 4 years ago

Hi @ami-day, I have reviewed the spreadsheet and I have the following comments:

General

Project

Project - Contributors

Project - Publications

Project - Funding source(s)

Specimen from organism

Cell suspension

Sequence files

Dissociation protocol

Library preparation protocol

Supplementary file

Happy to go through any doubts you have tomorrow :)

ami-day commented 4 years ago

Hi @ESapenaVentura,

I have finished making all the review changes we discussed, and your 'get ontology' script was super helpful.

Would it be possible to do a final review on the updated version (same file name and location)?

@mshadbolt and @zperova: Enrique and I were unsure about the end bias and tag bias options in the 'Library Prep Protocol' tab and the 'Sequencing protocol' tab; it would be great to know your thoughts on this.

The completed metadata sheet is located here: https://drive.google.com/drive/folders/1sA4mDAzvAkCAv8e8LYZPW7qkpT_4pRo8/COMPLETED Humphreys et al - Single-Cell Transcriptomics of a Human Kidney Allograft Biopsy Specimen.xlsx

Thank you

ESapenaVentura commented 4 years ago

Tested the spreadsheet in staging and there are no validation errors.

A couple of notes, though:

Project short name: Needs to be "computer-readable" (no spaces, no special characters).
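A minimal sketch of how a short name could be checked before upload. It assumes "computer-readable" means ASCII letters, digits, and underscores only; the rule actually enforced by the schema may differ, and both helper names are hypothetical.

```python
import re

# Assumed rule (not taken from the schema): letters, digits and underscores only.
SHORT_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_computer_readable(short_name: str) -> bool:
    """Return True if the name contains no spaces or special characters."""
    return SHORT_NAME_PATTERN.fullmatch(short_name) is not None


def suggest_short_name(title: str) -> str:
    """Replace each run of disallowed characters with a single underscore."""
    return re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")
```

For example, `suggest_short_name` could turn a free-text title into a candidate name, which should still be reviewed by hand before it is used as a file name.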

21-year-old donor: There are 2 diseases in the text field but only one in ontology/ontology label. This won't fail validation, but it will result in a length-2 array with an ontology only for the first item. The same applies to the specimen from organism derived from this donor. Example here: [screenshot, 2020-03-02 at 10:20:50]

Collection protocols: Looks like both collection protocols are the same but just applied to different donors?

Selected cell types: There are 5 types of cells listed in text but only 1 in ontology/label
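The mismatches noted above for diseases and selected cell types pass validation, so they are easy to miss. Below is a rough sketch of a pre-upload check that compares how many values each of the three related cells holds. It assumes multi-value cells are delimited with `||`; the function names and the placeholder values in the usage note are hypothetical.

```python
from typing import List, Optional

SEPARATOR = "||"  # assumed delimiter for multi-value spreadsheet cells


def split_values(cell: str) -> List[str]:
    """Split a multi-value cell into its non-empty, trimmed entries."""
    return [part.strip() for part in cell.split(SEPARATOR) if part.strip()]


def find_mismatch(text: str, ontology: str, label: str) -> Optional[str]:
    """Return a summary of the counts if they differ, otherwise None."""
    counts = {
        "text": len(split_values(text)),
        "ontology": len(split_values(ontology)),
        "ontology_label": len(split_values(label)),
    }
    if len(set(counts.values())) > 1:
        return ", ".join(f"{name}={n}" for name, n in counts.items())
    return None
```

With placeholder values, `find_mismatch("disease A||disease B", "TERM:0000001", "disease A")` reports that the text cell has two entries while the ontology cells have one, which is the situation described for the 21-year-old donor.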

Sequence files:

SRR6506830_2.fastq.gz
SRR6506831_1.fastq.gz
SRR6506831_2.fastq.gz
SRR6506832_1.fastq.gz
SRR6506832_2.fastq.gz
SRR6506833_1.fastq.gz
SRR6506833_2.fastq.gz

These should all have the same process_id (they all come from tube 4).
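A small sketch of how this could be checked across the sequence file rows. It assumes each row is available as a dictionary; the column name `cell_suspension_id` and the values `tube_4`, `process_1`, and `process_2` are hypothetical stand-ins for whatever the spreadsheet actually uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def inconsistent_groups(
    rows: Iterable[Mapping[str, str]],
    group_key: str = "cell_suspension_id",  # hypothetical column name
) -> Dict[str, List[str]]:
    """Return groups whose files carry more than one distinct process_id."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[group_key]].add(row["process_id"])
    return {group: sorted(ids) for group, ids in seen.items() if len(ids) > 1}


# Illustrative rows only; the IDs are made up for this example.
rows = [
    {"file_name": "SRR6506831_1.fastq.gz", "cell_suspension_id": "tube_4", "process_id": "process_1"},
    {"file_name": "SRR6506831_2.fastq.gz", "cell_suspension_id": "tube_4", "process_id": "process_2"},
]
problems = inconsistent_groups(rows)
```

An empty result means every group of files shares a single process_id; any entry in `problems` lists the conflicting IDs for that group.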

Library_prep protocol: Input nucleic acid molecule should be "polyA RNA extract" instead of mRNA. Change the ontology and ontology label as well.

I don't know anything about inDrops, but please check the end bias. Other inDrops projects have been ingested with "3 prime tag" instead of "3 prime end bias".

ami-day commented 4 years ago

Hey @ESapenaVentura, I made the above changes, added ontologies using fill_ontologies.py and re-uploaded the file using the new project short name as the file name.

Could we put this through validation again to ensure I didn't break anything?

ESapenaVentura commented 4 years ago

Where is the spreadsheet? I have looked everywhere, but I am not sure which one is the most up-to-date.

ami-day commented 4 years ago

@ESapenaVentura Here it is. I changed the file name to the project short name: https://drive.google.com/drive/folders/1sA4mDAzvAkCAv8e8LYZPW7qkpT_4pRo8

ami-day commented 4 years ago

This is ready to validate and ingest, so I am closing the issue now.