Closed (mshadbolt closed this issue 1 year ago)
I was able to almost fully wrangle this dataset. The issues are:
I am currently transferring files from ENA into an hca-util upload area. The spreadsheet passed ingest and graph validation. @ami-day, are you able to do a secondary review during the upcoming sprint?
@rays22, would you also be able to review this one? It is pretty small and simple.
HCA metadata spreadsheet GSE130973.xlsx passed validation by the ingest-graph-validator. However, the Ingest submission (Aug 4, 2020, 11:08:32 AM; e984c0a8-7a8d-4a23-ac2a-596b3ab2b128) failed to load into a database via the ingest-graph-validator. This could be due to a bug in the tool, my having used the wrong API endpoint, or something else.
The experimental design in the spreadsheet looks OK to me. I have uploaded the experimental design graph GSE130973_graph2.svg to the Google Drive folder of this dataset. The graph is shown without any protocols to make it clearer.
The data files look valid in Ingest UI.
Some ontology terms that could be filled in are missing from the metadata spreadsheet,
e.g.: LIBRARY CONSTRUCTION METHOD ONTOLOGY ID, LIBRARY CONSTRUCTION METHOD ONTOLOGY LABEL, ONTOLOGY ID
@mshadbolt, please let me know if you would like me to fill in the missing ontology terms.
Thanks @rays22, I forgot to run it through the ontology filler script.
Would you agree that the inguinal region they took their skin samples from would be best described by the ontology term skin of pelvis? https://ontology.staging.archive.data.humancellatlas.org/ontologies/hcao/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FUBERON_0001415
For 'the sun-protected inguinoiliac region' in the paper, the term inguinal part of abdomen (UBERON:0008337) looks correct to me. I think the pelvis region would be more lateral to the inguinal region, but it is not very far off.
I guess you would like the most specific term that is part of zone of skin, but the term skin of the inguinal part of abdomen does not exist in the ontology yet.
In that case, the more generic term abdominal segment skin (UBERON:0003836) might be another option?
I have run the spreadsheet through the ontology filler, and it added these terms/labels:

Library preparation protocol tab:
Found exact match for term polyA RNA (Ontology: OBI:0000869)
Found exact match for term 10x 3' v2 sequencing (Ontology: EFO:0009899)

Sequencing protocol tab:
Found exact match for term tag based single cell RNA sequencing (Ontology: EFO:0008440)
Found exact match for term tag based single cell RNA sequencing (Ontology: EFO:0008440)

I see that you have already run the ontology filler too, so I am not going to upload my version of the spreadsheet.
This is ready for export once we are able to export again. It should also be suitable for SCEA.
Hi, just want to note that this submission was affected by the production data files deletion incident, which can be tracked in this ticket. The data files for this dataset need to be re-uploaded before we can submit it.
I am moving this ticket to Done as it is tagged with GDPR. We can't curate to SCEA without paths to FASTQ files, BAM files, or SRA objects.
Oh wait, it looks like the data files might be missing in Ingest prod?
- Asked Algeria about the deleted files. Are they still missing?
- Asked Tony about potential data privacy issues: can we upload the FASTQ files to the DCP?
@ami-day, unfortunately the files here haven't been restored yet. :( Let me know if I can help with re-uploading the files. I am not sure whether the files are in the hca-util upload area; you may have to download them again or ask the contributor for them again. Apologies for the inconvenience.
I verified that the submission upload area is empty:
aws s3 ls s3://org-hca-data-archive-upload-prod/482fe66b-3bfe-423b-96dd-bf14144bc18c/
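The check above can also be wrapped in a small reusable function. This is a minimal sketch, assuming the AWS CLI is installed and configured with read access to the bucket; the function name upload_area_is_empty is hypothetical and is not part of hca-util or Ingest. Adding --recursive makes aws s3 ls list every object under the prefix rather than only the top-level entries.

```shell
#!/usr/bin/env bash
# Minimal sketch: report whether an S3 upload area (bucket prefix) contains any objects.
# Assumes the AWS CLI is installed and configured with read access to the bucket.
# upload_area_is_empty is a hypothetical helper name, not part of hca-util or Ingest.

upload_area_is_empty() {
  local prefix_uri="$1"
  local listing
  # --recursive lists every object under the prefix, not just top-level "directories".
  # Caveat: if the CLI call itself fails (e.g. missing credentials), the listing is
  # also empty, so check the error output on stderr before trusting the result.
  listing="$(aws s3 ls "$prefix_uri" --recursive)"
  # Succeed (exit status 0) only when no objects were listed under the prefix.
  [ -z "$listing" ]
}
```

For example: upload_area_is_empty s3://org-hca-data-archive-upload-prod/482fe66b-3bfe-423b-96dd-bf14144bc18c/ && echo "upload area is empty"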
Can we not submit it with matrices but no fastq files if it is subject to GDPR?
Yep, at the time we couldn't submit matrices only, but now we can, so I'll have a go at this.
Updated the sheet to submit matrix files instead of sequence files and have submitted the project: https://contribute.data.humancellatlas.org/submissions/detail?id=60cb1648e259f076612626a3. It is currently exporting.
Exported and submitted the import form.
I'm reopening the ticket because this is a Skin atlas dataset and I've noticed that fastq files are available but not included in the DCP project.
Can we add them with an update?
It should be technically possible to add the fastq files with an update, but this task is low priority for two reasons:
- Requires checking if fastq files can be made available for living donors
Re-opening for investigation as per the Ops meeting last week
@idazucchi, can you check whether the FASTQ files can be shared, given that the dataset has the GDPR label? If not, we should close this ticket. If yes, let's move it to Needs Update and proceed.
@idazucchi, have you managed to investigate whether it's possible to add the FASTQ files given the GDPR label? That way we can either close the ticket or move it to the Needs Update column.
Hi @ofanobilbao :) We can add fastq files to the project
I'll work on this before starting a new project, but it will be blocked at the file validation stage like the rest of the datasets.
Files added to the browser!
Project label: scAgingHumanMaleSkin
Primary Wrangler: Marion & Ami
Secondary Wrangler: Ray
Associated files:
Google Drive: https://docs.google.com/spreadsheets/d/1qhGETqKS5PPg-AVluaUiac6kaivNsLsk/edit#gid=758716052
Published study links
Paper: https://www.nature.com/articles/s42003-020-0922-4 https://www.biorxiv.org/content/10.1101/633131v1.full
Accessioned data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE130973
Ingest
Key Events