Closed ami-day closed 2 years ago
Emma has sent the ArrayExpress login details for the scRNA-seq and scVDJ-seq datasets. The 10X visium log-in will be sent soon.
Moving this to stalled and out of release 14 because we need additional info from the authors, and they have not yet made their data public (I have a private ArrayExpress log-in).
Emma emailed to say they are updating the data and adding new metadata. I said we can aim to get it submitted for release 17 (May release).
@ami-day why is this in stalled? Does not have any labels to explain. If so, are we still aiming at Release 17?
They aren't ready to submit it yet. They are generating more data.
Hey @ami-day , @ofanobilbao - can we check whether these people are ready now? We just saw this advertised in the HCA opening slides. Then we can maybe prioritise for this release.
@gabsie I've moved it to Wrangling for @ami-day to prioritise
I had an email exchange with Emma (author); we decided it would be best to wait until they have the ArrayExpress accessions and those datasets are published, so that I can use that metadata as a template. When we spoke they had some data ready but not the full project. When they get back to me to confirm, I will move forward with this ticket.
Thanks, @ami-day - it might be nice to remind them. :) Say we have seen the slide at the HCA meeting, in case they have forgotten about us.
The new data is now ready to curate (Emma emailed about it).
@idazucchi to secondary review
@idazucchi is almost ready with the secondary review
Hi Ami, I'm done with the secondary review! Let me know if you want to discuss anything
The end of the eighth week marks the end of the "embryonic period" and the beginning of the "fetal period."
- `FCA_gut8090115`, `Human_colon_16S8157851`, `Human_colon_16S8157867` should be mesenteric lymph node, not colon
- `FCA_gut8090111`, `FCA_gut8090112`, `FCA_gut8090113`, `FCA_gut8090114`, `FCA_gut8090115`, `FCA_gut8090116`, `FCA_gut8090117`, `FCA_gut8090118` and `FCA_gut809011` should be gut
- `collection_protocol_fetal_tissue` since it's used for visium specimens as well, which are not dissociated
- `ncbi_taxon_id` should be human + mouse for all organoids
- `enrichment_protocol_facs_ATOs_weekX` can be removed from the organoid tab and applied at the cell suspension level
- `ncbi_taxon_id` should be human + mouse
- `enrichment_protocol_cell_size_ATOs` enrichment protocol
- `6180STDY9448808_cells` should have enrichment protocol `enrichment_protocol_facs_ATOs_week3` and a CD45+ enrichment protocol
- `FCAImmP7851896_cells` should have an enrichment protocol CD137+ not CD45+
- `FCAImmP7803020` and `FCAImmP7803021` have the same enrichment and the same donor + tissue.
- `FCA_gut8090111`, `FCA_gut8090112` and `FCA_gut8090113`
- `FCAImmP7528290` from scRNA and `FCAImmP7607593` from vdj, they come from the same donor, sample and have the same enrichment step
- `FCAImmP7851889_cells` should have `enrichment_protocol_MAIT` enrichment protocol
- microscope setup description
- `umi offset` should be 16 for both `library_preparation_TCR` and `library_preparation_Ig`
- `umi barcode length` should be 12
- `sequencing_protocol_visium` should use tag based single cell RNA sequencing
- `PAN.A01.v01.raw_count.20210429.PFI.embedding.h5ad` and `PAN.A01.v01.entire_data_normalised_log.20210429.full_obs.annotated.clean.csv` are linked to cell suspensions for organoids, Visium and VDJ, but the files are only for scRNAseq
Thank you @idazucchi, I have made most of your suggested updates. However, some things I decided to keep the same, so I am adding my comments on those below.
General
You mentioned a few times disagreeing with the modelling of the specimens and cell suspensions. I believe in this particular study, there are unique samples derived from the same organ type and donor. If they are processed in the same way, they are still unique samples. This especially makes sense in light of 10X Visium samples taken at different spatial locations but can apply to scRNA-Seq too. I would prefer to keep all the sample IDs linked to biosample accessions as they were initially curated and linked in the SCEA MAGE-TAB files.
Donor
would it be possible to obtain the HDBR accessions for the donors?
I had a look and I don't see these in the donor supplementary material, or the HDBR website. Have you been able to find the HDBR accessions for a dataset in the past?
Specimen
the tissue for specimens `FCA_gut8090111`, `FCA_gut8090112`, `FCA_gut8090113`, `FCA_gut8090114`, `FCA_gut8090115`, `FCA_gut8090116`, `FCA_gut8090117`, `FCA_gut8090118` and `FCA_gut809011` should be gut
Selecting the appropriate organ ontology is difficult here: as I understand it, "gut" is not an organ but a system of organs. However, the metadata has been annotated at the level of the gut, so in this case I think you're right to select gut. I updated it with "gastrointestinal system".
Cell line
please add the induction protocol for the iPSC
In this case, I think it is not necessary (I remember having a conversation about this with someone previously). The hIPSC project cell lines are well known and well established, the associated publication is referenced, and the authors obtained the cell lines in the iPSC state (as opposed to running the iPSC induction protocol themselves).
you could fill out the cell type / tissue type
I think it doesn't make sense to do this for iPSC cell lines. They do not reflect a differentiated tissue or cell type other than iPSC.
Sequencing protocol
`sequencing_protocol_visium` should use tag based single cell RNA sequencing
I thought 10X visium is typically applied at the bulk level to a small set of cells in a specific spatial location. Which bit of information did you find that suggested the 10X visium was at the level of single cell?
Analysis file
- is there a particular reason you didn’t include the h5ad files available here?
I'm not sure what you mean here. The PAN.A01.v01.raw_count.20210429.PFI.embedding.h5ad and 'PAN.A01.v01.raw_count.20210429.PFI.embedding.h5ad' files were downloaded from https://developmental.cellatlas.io/fetal-immune. Do you mean, why didn't I download the spatial and VDJ h5ad files? I'm not sure why I didn't do this; I will add them now.
ecf1dc81-0ff3-4f81-b927-99decb910c5a
Graph validating.
Syncing 2 missing files then will re-graph validate
Submitted.
This export is failing; investigation notes follow:
The data file sync was completed (subject to manual verification), but the exporter job that was waiting for its completion timed out and never recorded in ingest that the data file process was finished.
All other pods have been waiting for ingest to be updated with the state of the data file synchronisation before starting any metadata synchronisation.
If we started the pods now, they would still wait, because ingest will never be updated with the success of the data file synchronisation job.
Expected files: 603
➜ gsutil ls -r "gs://broad-dsp-monster-hca-prod-ebi-storage/prod/fcaa53cd-ba57-4bfe-af9c-eaa958f95c1a/data/" | sed -e 's/.*\.//' | sort | uniq -c
1 csv
579 gz
9 h5ad
3 sh
15 tiff
1 txt
csv + gz + h5ad + tiff = 604
There's an extra h5ad, some sh files and a txt file in the payload of the data file sync that aren't in the project. I think either these need to be added to the project, or they are extra to the project and we can continue with exporting. Files that are in the data export but not in the project (which is probably fine):
# Probably a duplicate of Visium10X_data_LI.h5ad
gs://broad-dsp-monster-hca-prod-ebi-storage/prod/fcaa53cd-ba57-4bfe-af9c-eaa958f95c1a/data/Visium10X_data_LI (1).h5ad
gs://broad-dsp-monster-hca-prod-ebi-storage/prod/fcaa53cd-ba57-4bfe-af9c-eaa958f95c1a/data/download.sh
gs://broad-dsp-monster-hca-prod-ebi-storage/prod/fcaa53cd-ba57-4bfe-af9c-eaa958f95c1a/data/download2.sh
gs://broad-dsp-monster-hca-prod-ebi-storage/prod/fcaa53cd-ba57-4bfe-af9c-eaa958f95c1a/data/download3.sh
gs://broad-dsp-monster-hca-prod-ebi-storage/prod/fcaa53cd-ba57-4bfe-af9c-eaa958f95c1a/data/tmp.txt
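The same comparison can be scripted rather than eyeballed. The following is only a rough sketch: the function names, the `gs://example-bucket` prefix and the one-entry project file list are hypothetical, and in practice the listing would come from `gsutil ls -r` and the expected names from the project's file metadata. It reproduces the arithmetic above and lists bucket objects whose names are not in the project.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(paths):
    """Count paths by the text after the last dot (empty string if no dot)."""
    return Counter(PurePosixPath(p).suffix.lstrip(".") for p in paths)


def files_not_in_project(bucket_paths, project_file_names):
    """Return file names present in the bucket listing but not in the project."""
    expected = set(project_file_names)
    names = (PurePosixPath(p).name for p in bucket_paths)
    return sorted(n for n in names if n not in expected)


# Counts reported by the gsutil listing in this thread.
observed = {"csv": 1, "gz": 579, "h5ad": 9, "sh": 3, "tiff": 15, "txt": 1}
expected_total = 603
project_like = sum(observed[ext] for ext in ("csv", "gz", "h5ad", "tiff"))
print(project_like - expected_total)  # 1 surplus file among the project file types

# Hypothetical listing and project file list, for illustration only.
listing = [
    "gs://example-bucket/data/Visium10X_data_LI.h5ad",
    "gs://example-bucket/data/Visium10X_data_LI (1).h5ad",
    "gs://example-bucket/data/download.sh",
    "gs://example-bucket/data/tmp.txt",
]
print(count_by_extension(listing))
print(files_not_in_project(listing, ["Visium10X_data_LI.h5ad"]))
```

Comparing by file name rather than by count makes it clear which objects are surplus, which a per-extension tally alone cannot show.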
PATCH /exportJobs/62e268915cc3b03957ee94f3 HTTP/1.1
Authorization: Bearer <snipped_for_security>
Content-Type: application/json
User-Agent: PostmanRuntime/7.29.2
Accept: */*
Cache-Control: no-cache
Postman-Token: bc4f6dcf-e408-497d-8430-1e0abf16221a
Host: api.ingest.archive.data.humancellatlas.org
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Content-Length: 97
{
"context": {
"totalAssayCount": 217,
"isDataTransferComplete": true
}
}
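For anyone who needs to repeat this outside Postman, here is a minimal sketch of the same PATCH built with Python's standard library. The `https` scheme, the function name and the placeholder token are assumptions, not taken from the request above; the path, host and JSON fields are copied from it. The sketch only constructs the request and does not send it.

```python
import json
import urllib.request

# Host copied from the request above; the https scheme is an assumption.
INGEST_API = "https://api.ingest.archive.data.humancellatlas.org"


def build_export_job_patch(job_id, total_assay_count, token):
    """Build (but do not send) a PATCH marking an export job's data transfer complete."""
    body = json.dumps(
        {
            "context": {
                "totalAssayCount": total_assay_count,
                "isDataTransferComplete": True,
            }
        }
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{INGEST_API}/exportJobs/{job_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",  # placeholder token, supply a real one
            "Content-Type": "application/json",
        },
    )


request = build_export_job_patch("62e268915cc3b03957ee94f3", 217, token="<token>")
print(request.get_method(), request.full_url)
# Sending is left out on purpose; urllib.request.urlopen(request) would perform the call.
```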
I've put the 190 export jobs back on the ingest.terra.experiments.new queue, which is now busy with other exports (the queue is 450 experiments long).
Theoretically, no manual intervention will be required: when all the messages are processed, the submission should be updated with the export process as you would expect.
This one was just stuck behind a project that is actually stuck. Freeing up the export queue and moving just this project's exports across has made this export succeed.
Submitted import form.
project short name
DevelopingImmuneSystem
Primary wrangler
Ami
Secondary wrangler
Ida
Ingest
https://contribute.data.humancellatlas.org/projects/detail?uuid=fcaa53cd-ba57-4bfe-af9c-eaa958f95c1a&tab=project
submission https://contribute.data.humancellatlas.org/submissions/detail?uuid=8c987aea-34b5-4f17-ba4d-cc6726730638&project=fcaa53cd-ba57-4bfe-af9c-eaa958f95c1a
Publication:
preprint Mapping the developing human immune system across organs
Data
Google Sheet:
Latest (De-duped samples): https://docs.google.com/spreadsheets/d/1lSEiH_ZS-H8xtTOxJ9NTq8WhXIiqI8pKvFUcZhakq84/edit
1st version https://docs.google.com/spreadsheets/d/1GJbl-UOWNXvkcV7Qsx0em7x-DBaqR8pxPJ9p6v0oOi4/edit#gid=1259194338