ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

Wrangling of new dataset: GSE121267 (pre-converted by geo-to-hca.py) #103

Status: Closed (MightyAx closed 3 years ago)

MightyAx commented 4 years ago

Dataset/group this task is for: GSE121267

Wrangler responsible for this dataset/lab: Ami

Google sheet: https://docs.google.com/spreadsheets/d/13ABCY3Nzesr0BlCBVBvTtetth9N3Jr9O/edit#gid=877068780

Paper: Molecular and functional heterogeneity of IL-10-producing CD4+ T cells

Description of the task:

mshadbolt commented 4 years ago

This project has 2 cell suspensions of immune cells, so it is not a very high priority for adding value to the HCA. I have put it in the icebox.

ami-day commented 3 years ago

Stalled: need to ask the authors some questions about the dataset

ami-day commented 3 years ago

The authors do not have information about the donors, including whether they are living EU donors, due to anonymous data/privacy rules. They got back with the 10X version. Will submit the metadata and expression matrices (not the raw fastq).

Wkt8 commented 3 years ago

Assigning self as secondary reviewer!

Wkt8 commented 3 years ago

This paper has both mouse and human 10x data. It is oddly sparse on details relating to the human 10x protocols. Please see the note on Analysis_File and the File_Source tab!

Mouse Data: There is mouse 10x data also included in this dataset. Is there a reason we aren't including it together with the human data?

Dissociation Protocol: I might add: After being cleaned with PBS, the tissue was digested in 5 ml full media containing collagenase IV (100 U, Sigma-Aldrich) for 20 min at 37 °C.

Information on Library Preparation and Sequencing Protocol didn't come from the publication (I assume from contributor communication?)

Analysis Protocol: I would add 'Unknown' as the data normalization method as they do not specify it.

Analysis File: (Important!) Pipelines use the file_source field for some indexing. I would add 'GEO' as the File_Source

Additionally, the matrix.tar file only includes Donor 1 and Donor 2 cells, so I would set matrix_cell_count to 4438 cells, as stated in the paper:

"After quality control, cell cycle filtering and normalization, we analyzed 4438 IL-10-producing T cells pooled from donor 1 and donor 2."
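
A minimal sketch of how the two Analysis_File values suggested above could be checked before submission. This is not ingest or pipeline code: the row layout (a plain dict per spreadsheet row) and the helper name `check_analysis_file_row` are assumptions for illustration. Only the field names `file_source` and `matrix_cell_count`, the value 'GEO', and the count 4438 come from this review.

```python
# Hypothetical pre-submission check; not part of ingest or the indexing pipelines.
EXPECTED_FILE_SOURCE = "GEO"   # value suggested in this review
EXPECTED_CELL_COUNT = 4438     # donor 1 + donor 2 cells, as stated in the paper


def check_analysis_file_row(row):
    """Return a list of problems found in one Analysis_File row (a dict)."""
    problems = []
    source = row.get("file_source")
    if not source:
        problems.append("file_source is empty")
    elif source != EXPECTED_FILE_SOURCE:
        problems.append(
            f"file_source is {source!r}, expected {EXPECTED_FILE_SOURCE!r}"
        )
    count = row.get("matrix_cell_count")
    if count != EXPECTED_CELL_COUNT:
        problems.append(
            f"matrix_cell_count is {count!r}, expected {EXPECTED_CELL_COUNT}"
        )
    return problems


# Example row as it might look before the suggested changes are made.
row = {"file_name": "matrix.tar", "file_source": "", "matrix_cell_count": None}
for problem in check_analysis_file_row(row):
    print(problem)
```

An empty list means both fields match the values proposed here; the real allowed values for `file_source` are defined by the metadata schema, not by this sketch.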

Apart from that it looks great!

ami-day commented 3 years ago

Great, thanks @Wkt8. As discussed earlier, I won't include the mouse data for various reasons. About the dissociation protocol: the sequencing data is from the PBMCs only; I think the intestinal tissue that gets dissociated was used for other experiment types. Please let me know if you disagree!! It is a bit unclear. I just remember that "buffy coat" is obtained directly from blood in a tube after it is centrifuged (have had to do this in a lab, yuck!). I made the other changes you suggested. And yes, I got the library prep and sequencing method info from the authors directly and from GEO :)

Wkt8 commented 3 years ago

@ami-day does this not need to be wrangled to SCEA?

Wkt8 commented 3 years ago

This also needs an update to file_source in order to be indexed and displayed as having matrix files in the Data Portal (other components use file_source as the field to do this).

ami-day commented 3 years ago

> @ami-day does this not need to be wrangled to SCEA?

I think this is unsuitable for SCEA because only 2 samples are sequenced and they are both IL-10+ CD4+ T-cells from healthy PBMCs. The SCEA guidelines say they expect at least 3 replicate samples, although some exceptions can be made, e.g. for rare tissue samples or other novel sample types. Because the cells are from healthy PBMCs and blood is a commonly used biomaterial for scRNA-Seq, I think it does not meet their criteria.

ami-day commented 3 years ago

> This also needs an update to file_source in order to be indexed and displayed as having matrix files in the Data Portal (other components use file_source as the field to do this).

I will try this now! I had added the file source, but when I added it using an allowed value, the project metadata would not validate, which I believe is a problem with validation in ingest. So I removed it in order to validate the project.

ami-day commented 3 years ago

@Wkt8 I have made the update and it says it is valid now, so I will re-submit the metadata only.