ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

EGAS00001004388 - DevelopmentalOriginsNeuroblastoma #830

Closed ofanobilbao closed 1 year ago

ofanobilbao commented 2 years ago

Project short name:

DevelopmentalOriginsNeuroblastoma

Primary Wrangler:

Ami

Secondary Wrangler:

Ida

Associated files

https://drive.google.com/drive/folders/1Yu5AoZz0GdLfX0RBrqe2ObeKtxW6gPAC

With primary data (MAIN VERSION TO BE SUBMITTED): https://docs.google.com/spreadsheets/d/1B2gP1FmAXd7smUDULK6Iq4o-I3jrtp0CkoYclTGHoyw/edit#gid=1720350199

Without primary data: https://docs.google.com/spreadsheets/d/1wJX75dTushLSDebCyQ5kwRuch_Ez3880483_qMEly4E/edit#gid=1227909870

Link to Ingest

Published study links

Notes:

The data availability statement is confusing. It says the counts files are for: "31 single-cell transcriptomes of neuroblastomas and normal human developing adrenal glands at various stages of embryonic and fetal development". I assume this means (1) neuroblastoma samples from the donors listed in Supplementary 8 (<18 months or >18 months old) and (2) embryonic and fetal adrenal samples. There is no metadata available at all for the embryonic and fetal donors or samples, so I will not include these donors in the curated spreadsheet. It is also unclear which of the processed counts files correspond to which neuroblastoma patient IDs. I have contacted the authors to ask about this.

Key Events

ami-day commented 1 year ago

Moving this to stalled. There is too much missing metadata to curate this to the HCA standard. I have contacted the authors; if they reply, I will move forward with the dataset. If not, it should be marked as ineligible.

ami-day commented 1 year ago

Response from the authors:

"In our study, we have two data sets. The neuroblastoma data set is derived from neuroblastoma patients aged >0 months and does not contain any fetal or embryonic donors. The adrenal gland data set originates from healthy embryonic and fetal donors."

Because the publication does not provide any metadata or information about the healthy human embryonic and fetal donors (healthy adrenal gland data), I have excluded this dataset from the metadata spreadsheet. This is because we cannot ask for additional metadata about living European donors (privacy restrictions). I have, however, downloaded the gene expression counts for the neuroblastoma patients and the linked patient IDs.

Also: the authors say FACS was used to sort single cells (i.e. not doublets or multiplets). No antibodies against cell surface markers or live staining dyes were used.

ami-day commented 1 year ago

000b810a-f3d5-48c1-9e63-63154df9ccfb
ab64067a-45e0-42b5-92e4-7834d995aeb1
a36ec3a4-ea00-4143-bf6c-2a62ee222ce7
df46964c-7654-4d00-9ad8-70376a950708

ami-day commented 1 year ago

Creating a copy - primary data will be removed, since we cannot ask for additional metadata OR data files from a contributor if it is not already public (living European donors).

ami-day commented 1 year ago

Graph validating

Wkt8 commented 1 year ago

Discuss the 'organ' term later on Slack, or set up a meeting with the wranglers.

ami-day commented 1 year ago

Graph valid

idazucchi commented 1 year ago

Hi Ami, I'm done with the secondary review! I think that the adrenal gland data is eligible. The donor metadata is in supplementary table 1, and the cell barcodes are the same as in supplementary table 3, so the cell count matrix can be analysed without problems.

Donor

Specimen

Cell line

Cell suspension

Enrichment protocol

Sequencing

Sequence file

Analysis file

ami-day commented 1 year ago

Thanks @idazucchi, I have made most of the changes and will submit it today or tomorrow.

ami-day commented 1 year ago

Getting an error importing the updated submission in ingest prod.

MightyAx commented 1 year ago

Some notes from testing this on staging: You may want to either add the following files to the project metadata or delete them from the upload area: 000b810a-f3d5-48c1-9e63-63154df9ccfb

Otherwise they will be added to the project as draft files that you have to delete, and possibly exported to the terra staging area.
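
One way to spot such files before importing is to compare the contents of the upload area against the file names listed in the spreadsheet. The following is only a minimal sketch, assuming both lists have already been obtained as plain strings; `find_unlisted_files` and the sample file names are hypothetical and are not part of ingest.

```python
def find_unlisted_files(upload_area_files, metadata_files):
    """Return upload-area files that the project metadata does not mention."""
    listed = set(metadata_files)
    # Keep the upload-area order so the result is easy to read back.
    return [name for name in upload_area_files if name not in listed]


# Hypothetical example values, for illustration only.
upload_area = ["cells_1.fastq.gz", "cells_2.fastq.gz", "stray_file.txt"]
in_spreadsheet = ["cells_1.fastq.gz", "cells_2.fastq.gz"]
unlisted = find_unlisted_files(upload_area, in_spreadsheet)
print(unlisted)
```

Any name returned here would need to be either added to the project metadata or deleted from the upload area.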

ami-day commented 1 year ago

Thanks @MightyAx for looking into this. I think in the updated submission the issue you mention above will be resolved. I will check once it has been successfully imported.

MightyAx commented 1 year ago

If you make a new project (and delete the existing one), you will be able to import the spreadsheet. I tested this on staging yesterday with your project.

MightyAx commented 1 year ago

Production has been updated. @ami-day, you should now be able to import successfully.

ami-day commented 1 year ago

Reimport was successful.

ami-day commented 1 year ago

Exported and submitted the import form.

ESapenaVentura commented 1 year ago

@MightyAx to investigate if this has actually been exported

ESapenaVentura commented 1 year ago

Has not been exported

gsutil ls gs://broad-dsp-monster-hca-prod-ebi-storage/prod/c8e6c5d9-fcde-4845-bead-ff96999e3051/
CommandException: One or more URLs matched no objects.
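
The same check can be scripted. This is a rough sketch that assumes `gsutil` is installed and authenticated; `export_exists` is a hypothetical helper name, and the `run` parameter exists only so that the call can be swapped out in tests.

```python
import subprocess


def export_exists(gcs_prefix, run=subprocess.run):
    """Return True if `gsutil ls` lists at least one object under gcs_prefix."""
    result = run(["gsutil", "ls", gcs_prefix], capture_output=True, text=True)
    # gsutil exits with a non-zero status when the URL matches no objects,
    # as in the CommandException shown above.
    return result.returncode == 0 and bool(result.stdout.strip())
```

Calling `export_exists("gs://broad-dsp-monster-hca-prod-ebi-storage/prod/c8e6c5d9-fcde-4845-bead-ff96999e3051/")` would be expected to return `False` while the listing fails as above.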

MightyAx commented 1 year ago

Waiting on issue: ebi-ait/hca-ebi-wrangler-central#926

MightyAx commented 1 year ago

Issue resolved, Exported.

ami-day commented 1 year ago

Looks good.