ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

GSE130973 - Aging Human skin #107

Closed mshadbolt closed 1 year ago

mshadbolt commented 4 years ago

Project label scAgingHumanMaleSkin

Primary Wrangler: Marion & Ami

Secondary Wrangler: Ray

Associated files:

Google Drive: https://docs.google.com/spreadsheets/d/1qhGETqKS5PPg-AVluaUiac6kaivNsLsk/edit#gid=758716052

Published study links

Paper: https://www.nature.com/articles/s42003-020-0922-4 https://www.biorxiv.org/content/10.1101/633131v1.full

Accessioned data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE130973

Ingest

Key Events

mshadbolt commented 4 years ago

I was able to almost fully wrangle this dataset. The issues are:

I am currently transferring files from ENA into an hca-util upload area.

The spreadsheet passed ingest and graph validation.

@ami-day are you able to do a secondary review during the upcoming sprint?

mshadbolt commented 4 years ago

@rays22 would you also be able to review this one? It is pretty small and simple.

rays22 commented 4 years ago

Secondary review

@mshadbolt, please let me know if you would like me to fill in the missing ontology terms.

mshadbolt commented 4 years ago

Thanks @rays22, I forgot to run it through the ontology filler script.

Would you agree that the inguinal region they took their skin samples from would be best described by the ontology term skin of pelvis? https://ontology.staging.archive.data.humancellatlas.org/ontologies/hcao/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FUBERON_0001415

rays22 commented 4 years ago

For 'the sun-protected inguinoiliac region' in the paper the term inguinal part of abdomen (UBERON:0008337) looks correct to me. I think the pelvis region would be more lateral relative to the inguinal region, but not very far off. I guess you would like the most specific term to be part of zone of skin, but the term skin of the inguinal part of abdomen does not exist in the ontology yet. In that case, the more generic abdominal segment skin UBERON:0003836 might be another option?

I have run the spreadsheet through the ontology filler, and it added these terms/labels:

I see that you have already run the ontology filler too, so I am not going to upload my version of the spreadsheet.

mshadbolt commented 4 years ago

This is ready for export when we are able to export again. Should also be suitable for SCEA

aaclan-ebi commented 3 years ago

Hi, just want to note that this submission was affected by the production data files deletion incident, which can be tracked in this ticket. The data files for this dataset need to be reuploaded before we can submit it.

ami-day commented 3 years ago

I am moving this ticket to done as it is tagged with gdpr. We can't curate to SCEA without paths to fastq or bam files or sra objects.

ami-day commented 3 years ago

Oh wait, looks like the data files might be missing in ingest prod?

ami-day commented 3 years ago

- Asked Algeria about deleted files. Are they still missing?
- Asked Tony about potential data privacy issues - can we upload the fastq to dcp

aaclan-ebi commented 3 years ago

@ami-day unfortunately, the files here haven't been restored yet. :( Let me know if I can help with reuploading the files. I am not sure if the files are in the hca-util upload area. You may have to download the files again or ask the contributor for them. Apologies for the inconvenience.

I verified that the submission upload area is empty:

aws s3 ls s3://org-hca-data-archive-upload-prod/482fe66b-3bfe-423b-96dd-bf14144bc18c/
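
The same emptiness check could be scripted. This is only a minimal sketch, assuming a boto3 S3 client with valid credentials; the helper name `upload_area_is_empty` is hypothetical and not part of any HCA tooling. It asks S3 for at most one key under the prefix and reports whether any came back.

```python
def upload_area_is_empty(s3_client, bucket, prefix):
    """Return True if no objects exist under `prefix` in `bucket`.

    `s3_client` is assumed to be a boto3 S3 client (or any object that
    exposes a compatible `list_objects_v2` method).
    """
    # MaxKeys=1 is enough: we only need to know whether anything exists.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    # KeyCount is the number of keys returned in this response.
    return response.get("KeyCount", 0) == 0


# Example usage (requires boto3 and AWS credentials, so left commented out):
# import boto3
# client = boto3.client("s3")
# print(upload_area_is_empty(
#     client,
#     "org-hca-data-archive-upload-prod",
#     "482fe66b-3bfe-423b-96dd-bf14144bc18c/",
# ))
```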

clairerye commented 3 years ago

Can we not submit it with matrices but no fastq files if it is subject to GDPR?

ami-day commented 3 years ago

> Can we not submit it with matrices but no fastq files if it is subject to GDPR?

Yep, at the time we couldn't submit matrices only, but now we can, so I'll have a go at this.

ami-day commented 3 years ago

Updated the sheet to submit matrix files instead of sequence files and have submitted the project: https://contribute.data.humancellatlas.org/submissions/detail?id=60cb1648e259f076612626a3. It is currently exporting.

ami-day commented 3 years ago

Exported and submitted the import form.

idazucchi commented 2 years ago

I'm reopening the ticket because this is a Skin atlas dataset and I've noticed that fastq files are available but not included in the DCP project.

Can we add them with an update?

idazucchi commented 2 years ago

It should be technically possible to add the fastq files with an update, but this task is low priority for two reasons:

  1. the skin network is not particularly interested in fastq files
  2. we are giving priority to the first 6 draft atlases - our help is most needed for gut, brain and eyes

ami-day commented 2 years ago

Requires checking if fastq files can be made available for living donors

ofanobilbao commented 2 years ago

Re-opening for investigation as per the Ops meeting last week

ofanobilbao commented 1 year ago

@idazucchi can you check if the FASTQ can be shared considering it has the GDPR label? If not, we should close this ticket. If yes, let's move it to Needs Update and proceed

ofanobilbao commented 1 year ago

@idazucchi have you managed to investigate if it's possible to add the FASTQ considering it has the GDPR label? So that we can either close the ticket or move it to the Needs Update column?

idazucchi commented 1 year ago

Hi @ofanobilbao :) We can add fastq files to the project

  1. When living donor datasets already have sequence data publicly available we are allowed to publish the same data in the DCP
  2. a previous comment from Ami says "-Asked Tony about potential data privacy issues - can we upload the fastq to dcp"

I'll work on this before starting a new project, but it will be blocked at the file validation stage like the rest of the datasets.

idazucchi commented 1 year ago

Files added to the browser!