ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

SARS-CoV-2 Receptor ACE2 Is an Interferon-Stimulated Gene in Human Airway Epithelial Cells and Is Detected in Specific Cell Subsets across Tissues #716

Closed ipediez closed 1 year ago

ipediez commented 2 years ago

Project short name:

CovidCellTypes

Primary Wrangler:

Ami

Secondary Wrangler:

Irene

Associated files

Google Drive https://docs.google.com/spreadsheets/d/1WiPlkBzBZBrCu4SbOt2uGCe0vIxE-bGHvB5WKrI_Rdo/edit#gid=817397388

Google Sheet https://docs.google.com/spreadsheets/d/1WiPlkBzBZBrCu4SbOt2uGCe0vIxE-bGHvB5WKrI_Rdo/edit#gid=1646211853

NOTE: in the paper supplementary files, they note the analysis of already published datasets from a previous study. We didn't have that study in ingest, so I have added it as a separate project using the paper doi:

Ingest: https://contribute.data.humancellatlas.org/projects/detail?uuid=326b36bd-0975-475f-983b-56ddb8f73a4d&tab=project

Paper: https://www.nature.com/articles/s41586-018-0449-8#Sec2

Key Events

ipediez commented 2 years ago

Taking this dataset for secondary review

gabsie commented 2 years ago

@ipediez will review today

ipediez commented 2 years ago

Project

Donor

Specimen from organism

ami-day commented 2 years ago

Thanks @ipediez, really good points! The last one was a mistake; I would usually model it as you said, re: the cell line and specimen IDs in the cell suspension tab.

ami-day commented 2 years ago

I checked the cell count estimates by comparing to the GEO matrix cell counts. The values pretty much correspond, with some discrepancy as expected because of quality filtering of the data post cell sorting and library preparation.

The cell counts you were unable to find make sense: those are bulk samples that were sequenced, so there are no single-cell estimates.

Here is a table of the cell counts, just FYI. I have entered 33,844 as the total cell count in the project tab.


GEO matrix cell counts

| file name | GEO matrix cell count |
| -- | -- |
| GSM4487943_Mouse_Nasal_IFNa_NS12_dge.txt.gz | 1603 |
| GSM4487938_Mouse_Nasal_IFNa_NH11_dge.txt.gz | 1799 |
| GSM4487941_Mouse_Nasal_IFNa_NH22_dge.txt.gz | 1467 |
| GSM4487940_Mouse_Nasal_IFNa_NH21_dge.txt.gz | 1321 |
| GSM4487939_Mouse_Nasal_IFNa_NH12_dge.txt.gz | 1744 |
| GSM4487942_Mouse_Nasal_IFNa_NS11_dge.txt.gz | 1696 |
| GSM4487944_Mouse_Nasal_IFNa_NS21_dge.txt.gz | 1453 |
| GSM4487945_Mouse_Nasal_IFNa_NS22_dge.txt.gz | 1590 |
| GSE148829_BEAS_Basal_Pops_TPM.txt.gz (human) | bulk |
| GSE148829_Human1_Basal_Pops_TPM.txt.gz | bulk |
| GSE148829_Human2_Basal_Pops_TPM.txt.gz | bulk |
| Human Nasal ISSS | 8313 |
| GSE148829_Human_Ileum_absorptiveAndCryptentero_dge.csv.gz | 11856 |
| GSE148829_Human_lung_epithelial_cell_raw_counts.txt.gz | 1002 |
| Mouse basal stim | bulk |

Summary: publication versus GEO cell counts

| from supplementary table & publication (Irene's estimate) | from GEO matrix cell count |
| -- | -- |
| Human Adult Inferior Turbinate Scraping (SeqWell1 and 3): 10,111 cells | 8313 |
| Mouse nasal mucosa (seqWell 3): 11,738 cells | 12673 |
| Human ileal small intestine (10x v2 3'): 22,220 cells | 11856 |
| Human lung (seqWell 3): 1,637 cells | 1002 |
| SUM | 33844 |
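As a quick sanity check on the table above, the per-file counts can be added up to confirm the mouse nasal subtotal and the project total. This is only an arithmetic sketch; the dictionary keys are shortened labels chosen here for illustration, and the numbers are copied from the table.

```python
# Per-file GEO matrix counts for the eight mouse nasal samples (from the table above).
mouse_nasal = [1603, 1799, 1467, 1321, 1744, 1696, 1453, 1590]

# Single-cell counts per tissue; bulk samples are left out because they
# have no single-cell estimate. Keys are illustrative labels only.
geo_counts = {
    "human_nasal": 8313,
    "mouse_nasal": sum(mouse_nasal),
    "human_ileum": 11856,
    "human_lung": 1002,
}

total = sum(geo_counts.values())
print(geo_counts["mouse_nasal"], total)
```

The mouse nasal subtotal and the overall sum should match the 12673 and 33844 shown in the summary table.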

ami-day commented 2 years ago

Going forward, I have decided that when the matrices are available, I will always use the matrix cell count (I now have a script that calculates this quickly, directly from a gzip file). This is because I recently overestimated a project count by a wide margin after misreading individual sample counts in the paper. When various technologies are used, that is easy to do. For me at least, the matrix column count will be less error prone!
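The script itself is not attached to this ticket, so the following is only a minimal sketch of the idea, not the actual script. It assumes a gene-by-cell matrix whose first row is a header and whose first column holds gene names, so every remaining header column is one cell. The function name `count_cells` is made up for this example. If a file's header omits the label for the gene column (as some R-written tables do), the result would be off by one.

```python
import csv
import gzip


def count_cells(path, delimiter=None):
    """Count cell columns in a gzipped gene-by-cell matrix.

    Assumption: the first row is a header and the first column is the
    gene name, so the cell count is (number of header columns - 1).
    """
    # Only the header line is read, so large matrices stay cheap to inspect.
    with gzip.open(path, "rt", newline="") as handle:
        header = handle.readline().rstrip("\r\n")
    if delimiter is None:
        # Guess tab for .txt-style matrices, comma otherwise.
        delimiter = "\t" if "\t" in header else ","
    columns = next(csv.reader([header], delimiter=delimiter))
    return max(len(columns) - 1, 0)
```

Calling `count_cells("GSM4487943_Mouse_Nasal_IFNa_NS12_dge.txt.gz")` on a downloaded file would then give the per-sample count to compare against the paper.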

ami-day commented 2 years ago

Emailed the authors about cell types annotation.

ami-day commented 2 years ago

PRJNA627454 files are downloading in a tmux session, in the data scripts folder on EC2.

ami-day commented 2 years ago

ed164273-b241-4cd7-9263-eae87cbd33b1

ami-day commented 2 years ago

synced the data

ami-day commented 2 years ago

graph validating

ami-day commented 2 years ago

Submitted to HCA DCP.

ami-day commented 2 years ago

exported to hca dcp.

MightyAx commented 2 years ago

R16 ReExport Successful

ESapenaVentura commented 2 years ago
ami-day commented 2 years ago

Ok done, the graph is re-validating.

ami-day commented 2 years ago

Re-submitted.

ami-day commented 2 years ago

Exporting.

ami-day commented 2 years ago

It seems to have got stuck in exporting: https://contribute.data.humancellatlas.org/projects/detail?uuid=0b299140-25b5-4861-a69f-7651ff3f46cf&tab=upload

Wkt8 commented 2 years ago

Check whether the dataset has been fully exported; if it has, set it back to valid to allow for further updates. This needs looking into, and a bug ticket should be created for why the state tracker is failing to properly capture the states of datasets.

ESapenaVentura commented 2 years ago

SOP here

Wkt8 commented 2 years ago

There's not a further update that needs to be done for this dataset - if it is fully exported, set the state to exported, and then this can be moved to done.

ami-day commented 2 years ago

Already created the ticket, it is in the Operations board

ESapenaVentura commented 2 years ago

One export job was aborted due to a connection error:

2022-07-25 12:39:15 
{"log":"2022-07-25 11:39:15,689 - exporter.terra.terra_listener - ERROR - submission_uuid:302c1f08-dc2d-4653-8a43-e8ff5ac5d9f0 - export_job_id:62de808a5cc3b03957ee841d - ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))\n","stream":"stderr","time":"2022-07-25T11:39:15.693895467Z"}
2022-07-25 12:39:15 
{"log":"2022-07-25 11:39:15,686 - exporter.terra.terra_listener - ERROR - submission_uuid:302c1f08-dc2d-4653-8a43-e8ff5ac5d9f0 - export_job_id:62de808a5cc3b03957ee841d - Rejecting export experiment: ExperimentMessage(process_id='626122b94770e410d1f2089a', process_uuid='9f502e93-ba47-44dc-ba90-224d753677e3', submission_uuid='302c1f08-dc2d-4653-8a43-e8ff5ac5d9f0', experiment_index=73, total=82, job_id='62de808a5cc3b03957ee841d') due to error: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))\n","stream":"stderr","time":"2022-07-25T11:39:15.686507103Z"}
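For anyone searching similar logs, here is a rough sketch of pulling the submission UUID, export job ID and error text out of lines shaped like the two above. It assumes each line is a JSON object with `log`, `stream` and `time` keys, as shown here; the regular expression and the function name `parse_exporter_line` are written for this example and are not part of the exporter.

```python
import json
import re

# Matches the "- LEVEL - submission_uuid:... - export_job_id:... - message"
# tail of the exporter log text seen in this thread.
LINE_PATTERN = re.compile(
    r"- (?P<level>[A-Z]+) - submission_uuid:(?P<submission_uuid>[0-9a-f-]+)"
    r" - export_job_id:(?P<export_job_id>[0-9a-f]+) - (?P<message>.*)",
    re.DOTALL,
)


def parse_exporter_line(raw):
    """Return the key fields of one JSON log line, or None if it does not match."""
    record = json.loads(raw)
    match = LINE_PATTERN.search(record["log"])
    if match is None:
        return None
    fields = match.groupdict()
    fields["message"] = fields["message"].strip()
    fields["time"] = record["time"]
    return fields
```

Filtering the parsed lines on `level == "ERROR"` and a given `export_job_id` would narrow a long log down to the rejected messages for one export job.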
MightyAx commented 2 years ago

the project link above no longer exists

ami-day commented 2 years ago

It is still stuck in exporting: https://contribute.data.humancellatlas.org/projects/detail?uuid=0b299140-25b5-4861-a69f-7651ff3f46cf&tab=upload

MightyAx commented 2 years ago

This was really old, so it was a bit harder to find, but we just needed to retry this message:

Rejecting export experiment: ExperimentMessage(process_id='626122b94770e410d1f2089a', process_uuid='9f502e93-ba47-44dc-ba90-224d753677e3', submission_uuid='302c1f08-dc2d-4653-8a43-e8ff5ac5d9f0', experiment_index=73, total=82, job_id='62de808a5cc3b03957ee841d')

I published this message again using the RabbitMQ admin interface; the export succeeded and the project has been set to exported (even after a month of waiting):

{"exportJobId":"62de808a5cc3b03957ee841d","documentId":"626122b94770e410d1f2089a","documentUuid":"9f502e93-ba47-44dc-ba90-224d753677e3","callbackLink":"/processes/626122b94770e410d1f2089a","documentType":"process","envelopeId":"626122334770e410d1f2034c","envelopeUuid":"302c1f08-dc2d-4653-8a43-e8ff5ac5d9f0","projectId":"626122464770e410d1f2034e","projectUuid":"0b299140-25b5-4861-a69f-7651ff3f46cf","index":73,"total":82,"context":null}
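Comparing the rejected `ExperimentMessage` with the republished JSON above suggests how the fields line up. The sketch below rebuilds that payload from the log text. The mapping is inferred only from the two messages in this thread, not from the exporter code, and the function names are made up here. `envelopeId`, `projectId` and `projectUuid` do not appear in the log line, so they have to be supplied separately. Nothing is published; the sketch only builds the dictionary.

```python
import re


def parse_experiment_message(text):
    """Extract key=value pairs from an ExperimentMessage(...) repr in a log line."""
    body = re.search(r"ExperimentMessage\((.*?)\)", text).group(1)
    fields = {}
    for key, quoted, number in re.findall(r"(\w+)=(?:'([^']*)'|(\d+))", body):
        # Quoted values stay strings; bare digits become integers.
        fields[key] = quoted if number == "" else int(number)
    return fields


def build_retry_payload(msg, envelope_id, project_id, project_uuid):
    """Assemble a payload shaped like the republished message (inferred mapping)."""
    return {
        "exportJobId": msg["job_id"],
        "documentId": msg["process_id"],
        "documentUuid": msg["process_uuid"],
        "callbackLink": f"/processes/{msg['process_id']}",
        "documentType": "process",
        "envelopeId": envelope_id,          # not in the log line
        "envelopeUuid": msg["submission_uuid"],
        "projectId": project_id,            # not in the log line
        "projectUuid": project_uuid,        # not in the log line
        "index": msg["experiment_index"],
        "total": msg["total"],
        "context": None,
    }
```

Serialising the result with `json.dumps` gives text that can be pasted into the admin interface, after checking it against a known-good message.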
ami-day commented 2 years ago

Ok, submitted the import form for the update.

ami-day commented 1 year ago

looks ok