ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

Adult_Nonmobilised_PB Laurenti submission: track progress #47

Closed by ami-day 3 years ago

ami-day commented 4 years ago

Primary Wrangler: Ami

Secondary Wrangler: Enrique

Associated files: https://docs.google.com/spreadsheets/d/1j1UZ7nc6U-BgVz0xZSHipkxHE9t-vbdXUVEkr2SVQ3A/edit#gid=900688680

Key Events

Please track the below as well as the key events:

  1. Track the dates the first spreadsheet was received and the final spreadsheet was sent by editing the ticket to include the date next to each event.
  2. Track spreadsheet iterations by placing asterisks next to the "received spreadsheet" event.
  3. Track any metadata issues/tickets made for the dataset as a bulleted list of links under the "received spreadsheet" event. Links should point to the ticket in the metadata repo.

ami-day commented 4 years ago

@ESapenaVentura this spreadsheet is ready for review. I made changes to the original and filled in the ontologies but have not added it to ingest yet; I will once we have reviewed and updated it. Some comments: could you please check the selected cell types and the dissociation methods in particular? There are multiple values, so I separated them with "||", but the fill_ontologies.py script is adding just one ontology label and term. Thank you!
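For illustration only, here is a minimal sketch of how a "||"-separated cell could be split so that each value gets its own ontology term. It does not reflect how fill_ontologies.py actually works; `split_multivalue`, `annotate_cell`, the lookup table, and the `EXAMPLE:` identifiers are all hypothetical placeholders.

```python
from typing import Dict, List, Tuple

SEPARATOR = "||"  # multi-value delimiter used in the spreadsheet cells


def split_multivalue(cell: str, sep: str = SEPARATOR) -> List[str]:
    """Split a cell into trimmed, non-empty values."""
    return [part.strip() for part in cell.split(sep) if part.strip()]


def annotate_cell(cell: str, lookup: Dict[str, str]) -> Tuple[str, str]:
    """Return (text, ontology) strings with one ontology ID per value.

    `lookup` is a hypothetical label -> ontology ID table; values that
    are not found get an empty ID so the two columns stay aligned.
    """
    values = split_multivalue(cell)
    ids = [lookup.get(value.lower(), "") for value in values]
    return SEPARATOR.join(values), SEPARATOR.join(ids)


# Placeholder IDs, not real ontology terms.
example_lookup = {"enzymatic": "EXAMPLE:0001", "mechanical": "EXAMPLE:0002"}
text, ontology = annotate_cell("enzymatic || mechanical", example_lookup)
print(text)
print(ontology)
```

Keeping the text and ontology columns the same length makes it easy to spot a value that did not resolve to a term.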

ESapenaVentura commented 4 years ago

I have reviewed the spreadsheet and saved the result as <spreadsheet_name>_Enrique_review.

Donor organism

ami-day commented 4 years ago

Thank you @ESapenaVentura!

I had the same thought about this: 'Maybe we should request to add apheresis to the ontology terms under blood draw'. I actually added that term then ran your fill_ontologies script, and there is not an appropriate ontology term for it, so I changed it back to blood draw. It is still kind of a blood draw though so I think it is ok, given they describe the process in the collection protocol description.

The other changes sound good.

I agree with your queries; I'll respond to them today with some questions about these and also send them the HCA questionnaire. I'll reply to Hugo separately about the bam files so I can respond to Elisa/the group.

ami-day commented 4 years ago

Could you please paste the link to the questionnaire here? Is it possible to change the template dataset issue tickets to include the HCA T&Cs link and the questionnaire link?

ESapenaVentura commented 4 years ago

http://tinyurl.com/HCA-Terms-Conditions

Sure, if you feel like it will be better there you can create a ticket and, in a new branch, suggest changes to the templates, and the rest of the wranglers can review the PR :)

Also, there is an error in my comment, sorry: For the dissociation protocol, I deleted it*

ami-day commented 4 years ago

@ESapenaVentura that link is for the terms and conditions form! What is the link to the questionnaire?

ESapenaVentura commented 4 years ago

Completely misread that, sorry!

http://tinyurl.com/HCA-Project-Questionnaire

ESapenaVentura commented 4 years ago

Changelog 12th June 2020:

lauraclarke commented 4 years ago

Does this ticket need to continue to exist, or do we need a ticket to track the datasets that need to be exported to DCP2? If I understand correctly, wrangling is now done.

lauraclarke commented 4 years ago

Actually, I remembered the GDPR question: does this dataset contain living donors? If it does, we can't push it to the DCP.

ESapenaVentura commented 4 years ago

@lauraclarke yes, this dataset does contain living donors ~(As well as the previous Laurenti's dataset)~ Not sure about the previous one.

lauraclarke commented 4 years ago

I thought the previous one was deceased donors rather than living ones?

ESapenaVentura commented 4 years ago

This dataset is already wrangled, right?

Can we close it?

ami-day commented 3 years ago

This project is stalled due to it involving samples from living donors.

ami-day commented 3 years ago

Hugo has new donors and samples to add so this project is being updated. @aaclan-ebi is working on some ingest fixes and then will start working on archiving this.

aaclan-ebi commented 3 years ago

Thanks @ami-day for providing me the spreadsheet.

I used the spreadsheet here: https://docs.google.com/spreadsheets/d/1HMfdXl2LSYDDb4Etjgg1jJZpMe3oYFYG/edit#gid=668357357 (I renamed it to 2020118_SUBS10_PB_extra_donors_hca_spreadsheet_edit.xlsx)

Here's the new submission in prod https://contribute.data.humancellatlas.org/submissions/detail?id=6005bd74c9762f5f0de9f6c9&project=fc55bd4a-8694-4a28-9a35-59a685bda323

Project's page: https://contribute.data.humancellatlas.org/projects/detail?uuid=fc55bd4a-8694-4a28-9a35-59a685bda323

If it looks good, would you be able to upload the files for this submission? Thanks!

ami-day commented 3 years ago

We also need to update the project description to include the new cell and sample count. But we need to wait for other changes/fixes first.

ami-day commented 3 years ago

Have uploaded the data files to the submission.

aaclan-ebi commented 3 years ago

We've fixed some linking issues in the dataset.

New spreadsheet: https://docs.google.com/spreadsheets/d/1-O9tnP_AdO7RejP33Yy9iKUe-Z27AI9a/edit#gid=1768425982

New submission: https://contribute.data.humancellatlas.org/submissions/detail?id=600715e4c9762f5f0de9fa67&project=fc55bd4a-8694-4a28-9a35-59a685bda323

Awaiting file upload @ami-day. Thanks!

aaclan-ebi commented 3 years ago

Current DSP Submission: https://submission.ebi.ac.uk/api/submissions/74d16604-6a2c-497e-b1ca-1e552c30cfcf

Summary of entities to be archived: 6 samples (2 donors, 2 specimens, 2 cell suspensions), 2 sequencing experiments, and 16 sequencing runs.

See details: https://api.ingest.archive.data.humancellatlas.org/archiveSubmissions/6008584cc9762f5f0de9ff3c/entities?size=24

The file archiver jobs are currently running in the EBI cluster. We'll complete the submission once all jobs are done and the submission is valid.

aaclan-ebi commented 3 years ago

Hi @ami-day

We've completed the DSP submission and here are the accessions:

SAMEA8072986
SAMEA8072985
SAMEA8072987
SAMEA8072988
SAMEA8072984
SAMEA8072983
ERX4972373
ERX4972372
ERR5167451
ERR5167449
ERR5167445
ERR5167442
ERR5167447
ERR5167446
ERR5167448
ERR5167450
ERR5167444
ERR5167452
ERR5167441
ERR5167439
ERR5167443
ERR5167440
ERR5167438
ERR5167437

You can see more details on the entities here: https://api.ingest.archive.data.humancellatlas.org/archiveSubmissions/6008584cc9762f5f0de9ff3c/entities?size=24
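As a quick sanity check, the accessions listed in this comment can be grouped by prefix and compared with the summary of entities to be archived given earlier in the thread (6 samples, 2 sequencing experiments, 16 sequencing runs). This is only a sketch: the prefix mapping assumes the usual convention (SAMEA for BioSamples samples, ERX for ENA experiments, ERR for ENA runs), and the helper names are made up for this example.

```python
import re
from collections import Counter

# Accessions copied from the comment above.
ACCESSIONS = """
SAMEA8072986 SAMEA8072985 SAMEA8072987 SAMEA8072988 SAMEA8072984 SAMEA8072983
ERX4972373 ERX4972372
ERR5167451 ERR5167449 ERR5167445 ERR5167442 ERR5167447 ERR5167446
ERR5167448 ERR5167450 ERR5167444 ERR5167452 ERR5167441 ERR5167439
ERR5167443 ERR5167440 ERR5167438 ERR5167437
""".split()

# Assumed prefix convention for each entity type.
PREFIX_TO_ENTITY = {
    "SAMEA": "sample",
    "ERX": "sequencing experiment",
    "ERR": "sequencing run",
}

# Counts taken from the archiving summary earlier in this thread.
EXPECTED = {"sample": 6, "sequencing experiment": 2, "sequencing run": 16}


def entity_type(accession: str) -> str:
    """Map an accession to its entity type based on its letter prefix."""
    match = re.fullmatch(r"([A-Z]+)(\d+)", accession)
    if match is None or match.group(1) not in PREFIX_TO_ENTITY:
        raise ValueError(f"Unrecognised accession: {accession}")
    return PREFIX_TO_ENTITY[match.group(1)]


counts = Counter(entity_type(acc) for acc in ACCESSIONS)
has_duplicates = len(set(ACCESSIONS)) != len(ACCESSIONS)

for entity, expected in EXPECTED.items():
    status = "OK" if counts[entity] == expected else "MISMATCH"
    print(f"{entity}: {counts[entity]} (expected {expected}) {status}")
print("duplicates found" if has_duplicates else "no duplicates")
```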

ami-day commented 3 years ago

This is great @aaclan-ebi :) Can we still edit the project description to reflect the additional cells & samples added since the 1st submission?

aaclan-ebi commented 3 years ago

Yup, I'll look into that today.

aaclan-ebi commented 3 years ago

@ami-day could you paste the exact update to the project description here?

current value:

Haematopoietic stem and progenitor cells (HSPCs), the precursors of all blood cells, reside predominantly in the bone marrow. Yet, a  small proportion (<1%) of phenotypic HSPCs circulates through peripheral blood (PB) at any given time. To date, the detailed characterization of steady-state circulating HSPCs in adult humans remains very poor. Here, we analyse the single-cell composition of the adult human HSPC pool within non-mobilised PB from four healthy donors. 10x scRNA-seq of 22000 HSPCs from all four donors was paired with single-cell functional analysis using most immature haematopoietic stem cells and multipotent progenitors (HSC/MPPs). We find that long-term functional HSC/MPPs are very rare in non-mobilised PB, and that a large fraction of circulating HSPCs is biased towards the erythroid lineage. In particular, we detect the enrichment of a subset of exclusively erythroid/megakaryocyte-primed quiescent HSC-like cells within the phenotypic PB HSC/MPP compartment.

After a call with Karoly (DSP), we decided to do it manually via the BioStudies and ENA UIs (pending approval from Melanie) to avoid any issues with updating via DSP. It's just 2 simple updates to the project & study, so it's not painful to do manually. DSP is going to be deprecated soon, so I'd say the discrepancy between DSP and the archives is acceptable. We just need to make sure it's noted, in case we ever need to make more updates while using DSP.

ami-day commented 3 years ago

New value:

Haematopoietic stem and progenitor cells (HSPCs), the precursors of all blood cells, reside predominantly in the bone marrow. Yet, a  small proportion (<1%) of phenotypic HSPCs circulates through peripheral blood (PB) at any given time. To date, the detailed characterization of steady-state circulating HSPCs in adult humans remains very poor. Here, we analyse the single-cell composition of the adult human HSPC pool within non-mobilised PB from four healthy donors. 10x scRNA-seq of 51,000 HSPCs from all six donors was paired with single-cell functional analysis using most immature haematopoietic stem cells and multipotent progenitors (HSC/MPPs). We find that long-term functional HSC/MPPs are very rare in non-mobilised PB, and that a large fraction of circulating HSPCs is biased towards the erythroid lineage. In particular, we detect the enrichment of a subset of exclusively erythroid/megakaryocyte-primed quiescent HSC-like cells within the phenotypic PB HSC/MPP compartment.

Thanks @aaclan-ebi !

aaclan-ebi commented 3 years ago

Thanks @ami-day

Karoly (DSP) and I couldn’t use the DSP username and password for the BioStudies UI; it seems to use a different email and password. I emailed biostudies@ebi.ac.uk to request the update: biostudies #475915 (cc'ed you). Let's just keep an eye on it.

aaclan-ebi commented 3 years ago

Karoly is also looking into updating it in the ENA Webin portal, but their UI is very slow and he couldn't proceed. He's checking with some ENA folks for help.

aaclan-ebi commented 3 years ago

Filed an issue with ENA regarding the problem with their Webin UI.

aaclan-ebi commented 3 years ago

@ami-day the ENA study has now been updated: https://www.ebi.ac.uk/ena/browser/view/PRJEB38994


aaclan-ebi commented 3 years ago

@ami-day the BioStudies project has now been updated: https://www.ebi.ac.uk/biostudies/studies/S-SUBS10?query=S-SUBS10


ofanobilbao commented 3 years ago

@ami-day I have added the SCEA brokering label, assuming this needs to be brokered to SCEA. Please remove it if that's not the case. I have moved it to Finished on the wrangling board because, from what I can see, it looks Done from a DCP point of view. Is that correct? Please let me know or move it as you deem appropriate. Thanks!