ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

GSE130977, GSE130646, GSE138707 - Skeletal Muscle #105

Closed mshadbolt closed 3 years ago

mshadbolt commented 4 years ago

Primary Wrangler: Marion Shadbolt

Secondary Wrangler: Ray

Associated files:

Google Drive: https://drive.google.com/open?id=1x0wh-JqX2AOI9Iq6bpIFwMmsbgPtYhmp

Published study links

Paper: Single-cell transcriptional profiles in human skeletal muscle

Accessioned data:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE130977

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE130646

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE138707

Key Events

mshadbolt commented 4 years ago

I converted and combined the three GEO series into one spreadsheet. There are 4 10x human cell suspensions derived from one donor, 2 mouse cell suspensions and 18 SMARTer cell suspensions. I am not sure whether the SMARTer data can be analysed the same way as SS2.

There is very little information about the donors recorded in the metadata, so I have tried approaching the authors.

mshadbolt commented 4 years ago

Emailed the authors (Stuart Sealfon and Gregory Smith) to see if they can provide any further metadata.

mshadbolt commented 3 years ago

I requested more info from the authors again. I will move this to stalled, and if there is no response by the end of the sprint I don't think this dataset is worth pursuing further.

mshadbolt commented 3 years ago

Received donor metadata, so I will work on incorporating it into the spreadsheet. It will then be ready for secondary review.

mshadbolt commented 3 years ago

I have run the graph validator once. I believe I then fixed some issues, but I could not run it again because the Python install on the EC2 seems broken. I will need to check again once it is back up and running.

rays22 commented 3 years ago

Secondary review:

I have reviewed the metadata. I have found only one minor issue:

enrichment_protocol.protocol_core.protocol_description : "cells from the five mouse diaphragms"

Ingest graph validation

I have run the ingest graph validator tests on my laptop. I used the latest spreadsheet: GSE130646-GSE130977-GSE138707_combined_ontologies_sra_files.xlsx. I have uploaded the test results to the gdrive.

I am listing the failed tests; however, all the failed tests appear to be false errors.

Summary

I have not found any major issues.

mshadbolt commented 3 years ago

Asked the contributor for cell type annotations/cluster information for the matrices, as only 2D cell/gene matrices are available on GEO.

mshadbolt commented 3 years ago

Uploaded the matrix files to the Google bucket and filled in the sheet.

mshadbolt commented 3 years ago

I have asked the SCEA folks whether they would like this project in SCEA/GXA.

mshadbolt commented 3 years ago

SCEA said they were interested in the single cell stuff (GSE138707 & GSE130646). This consists of 2 cell suspensions from mouse and 4 biopsies taken from a single human donor.

My attempt at converting this part of the experiment, for IDs E-HCAD-33 and E-HCAD-34, is here: https://drive.google.com/open?id=1u0084DN_u0ApwyF6CLpz_Ld8HOHz8OGq

I have done the initial conversion, but I wasn't sure what information to put for the SRA/fastq files. The factor-value stuff kinda confused me too. So some manual curation is required before these can progress to submission to SCEA.

ami-day commented 3 years ago

Taking this on

ofanobilbao commented 3 years ago

@ami-day I have moved this to Finished in the Wrangling board, as I believe that's the case on the DCP journey and that the only thing remaining is conversion to SCEA, which will be tracked on the SCEA board. Please feel free to correct me if I did not get that right. Thanks!