ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

Benchmarking single-cell RNA-sequencing protocols for cell atlas projects #718

Closed by ipediez 1 year ago

ipediez commented 2 years ago

Project short name:

BenchmarkingSingleCellProtocols

Primary Wrangler:

Ami

Secondary Wrangler:

Ida

Associated files

Published study links

Key Events

ami-day commented 2 years ago

Requires a new library preparation method ontology term. Request is here: https://github.com/HumanCellAtlas/ontology/issues/107

Requires library preparation method information from the authors (various methods). I have emailed the authors to ask for this.

ofanobilbao commented 2 years ago

Moving this back to Wrangling while Ami is away. I am not sure whether this is ready for secondary review: I have not seen any posts asking for review, and not all the boxes in the acceptance criteria are ticked.

idazucchi commented 2 years ago

@gabsie and @ami-day to discuss whether this will be included in the DCP

Wkt8 commented 2 years ago

@gabsie what is the status of this?

ofanobilbao commented 2 years ago

@gabsie to investigate

gabsie commented 2 years ago

OK, on this project - let's just make a decision, this has been sitting here forever. The pipelines are no longer an issue. Let's publish the project with both raw and matrices. Any last issues from the rest of the wranglers? @ESapenaVentura @idazucchi @Wkt8

gabsie commented 2 years ago

OK - let's do it - both raw and matrices go in and we publish it. @ami-day - please proceed

idazucchi commented 2 years ago

Postponed to R21 @ami-day

ami-day commented 2 years ago

Requested data via NCBI Cloud Delivery

ami-day commented 1 year ago

064f96e3-2af9-43b0-967e-b86cdd876e79 63a86149-cfa5-417f-8e2f-90229a5a1fca 1af5c8e3-9c8f-4722-afee-f084f6c59f86 6c0f012d-7592-4345-827e-f9ed0d706013

gabsie commented 1 year ago

The majority of files have been downloaded. Ida will finish secondary review. This will probably not make it to release 21, but good to finalise it. @ami-day

ami-day commented 1 year ago

Synced data files. Graph validating

ami-day commented 1 year ago

Waiting on validation of metadata schema test data (cell suspension -> cell suspension as experiment design)

idazucchi commented 1 year ago

Hi Ami, I'm done with the review. This is a large dataset with a lot of libraries and cell suspensions. I think we can simplify the way we model the experimental design and still be accurate by:

Project

Donor

Collection

Specimens

Cell line

Enrichment

Cell suspension

Library prep

Sequencing protocol

Sequence files

Analysis protocol

Analysis file

ami-day commented 1 year ago

Updated dataset based on review comments. Need to convert SRR9621775 to fastq. Waiting on the schema test data validation.

ami-day commented 1 year ago

Waiting on this PR: https://github.com/HumanCellAtlas/schema-test-data/pull/28

Wkt8 commented 1 year ago

Unblocked by cell-to-cell-suspension test data.

idazucchi commented 1 year ago

Lower priority than kidney datasets - will tackle this in release 26

gabsie commented 1 year ago

needs a look from another wrangler on the modelling - maybe @anu-shiva and @Wkt8

idazucchi commented 1 year ago

two additional issues:

idazucchi commented 1 year ago

edit smartseq2_has_2-3_files

COUNT(f.`read_index`) as num_files, COUNT(DISTINCT f.`read_index`) as read_set
AND NOT num_files = read_set
RETURN p, "Smart-Seq2 protocols should contain between 2 and 3 files", labels(p)

Question: does f match all files? I need just the ones linked to the process.
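To make the intent of the check concrete, here is a minimal sketch of the per-process grouping in plain Python rather than Cypher. It is an illustration only, not the actual graph validation rule: the function name, the `(process_id, read_index)` pair format, and the combination of the 2-3 file range with the duplicate `read_index` test are all assumptions based on the fragment and message above.

```python
from collections import defaultdict


def flag_smartseq2_processes(file_links):
    """Return the ids of processes that fail the (assumed) Smart-Seq2 file check.

    file_links is a hypothetical input: an iterable of
    (process_id, read_index) pairs, one pair per sequence file.
    """
    # Group read indexes by the process each file is linked to, so the
    # counts below are per process and not over all files in the project.
    by_process = defaultdict(list)
    for process_id, read_index in file_links:
        by_process[process_id].append(read_index)

    flagged = []
    for process_id, read_indexes in by_process.items():
        num_files = len(read_indexes)      # like COUNT(f.read_index)
        read_set = len(set(read_indexes))  # like COUNT(DISTINCT f.read_index)
        # Assumed rule: 2 to 3 files per process, each with a distinct read_index.
        if not 2 <= num_files <= 3 or num_files != read_set:
            flagged.append(process_id)
    return sorted(flagged)


# Made-up example data, for illustration only.
links = [
    ("process_a", "read1"), ("process_a", "read2"),  # passes
    ("process_b", "read1"), ("process_b", "read1"),  # duplicate read_index
    ("process_c", "read1"),                          # only one file
]
print(flag_smartseq2_processes(links))  # ['process_b', 'process_c']
```

The key point is the grouping step: counting only makes sense once the files are partitioned by the process they are linked to, which is what the question above is asking the query to do.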

gabsie commented 1 year ago

blocked by the graph validation issue

gabsie commented 1 year ago

Import form is filled