Closed by ipediez 1 year ago
Requires a new library preparation method ontology term. Request is here: https://github.com/HumanCellAtlas/ontology/issues/107
Requires library preparation method information from the authors (various methods). I have emailed the authors to ask for this.
Moving this back to Wrangling while Ami is away. I am not sure whether this is ready for secondary review: I have not seen any posts asking for review, and not all the boxes in the acceptance criteria are ticked.
@gabsie and @ami-day to discuss whether this will be included in the DCP
@gabsie what is the status of this?
@gabsie to investigate
OK, on this project - let's just make a decision, this has been sitting here forever. The pipelines are no longer an issue. Let's publish the project with both raw and matrices. Any last issues from the rest of the wranglers? @ESapenaVentura @idazucchi @Wkt8
OK - let's do it - both raw and matrices go in and we publish it. @ami-day - please proceed
Postponed to R21 @ami-day
Requested data via NCBI Cloud Delivery
064f96e3-2af9-43b0-967e-b86cdd876e79
63a86149-cfa5-417f-8e2f-90229a5a1fca
1af5c8e3-9c8f-4722-afee-f084f6c59f86
6c0f012d-7592-4345-827e-f9ed0d706013
The majority of files have been downloaded. Ida will finish secondary review. This will probably not make it to release 21, but it would be good to finalise it. @ami-day
Synced data files. Graph validating
Waiting on validation of metadata schema test data (cell suspension -> cell suspension as experiment design)
Hi Ami, I'm done with the review. This is a large dataset with a lot of libraries and cell suspensions. I think we can simplify the way we model the experimental design and still be accurate by using reference_sample as input for all the libraries and files, to highlight that they are all using the same cell mix.
human adult stage: can't be used for non-human donors.
MDCK_donor: should be female, species Canis lupus familiaris.
reference_sample_processing_protocol: could be fluorescence-activated cell sorting for DAPI- cells. You can use it to connect together all the cells pooled for reference_sample, and add "During FACS isolation, DAPI-positive cells were excluded to remove dead and damaged cells." to the description.
dissociation_protocol_cell_line: for the dissociation protocol.
reference_sample: can have all the colon, PBMC and cell line suspensions as input, and reference_sample_processing_protocol for the enrichment protocol.
library_preparation_protocol_C1: according to the supplementary material this technique is barcoded. The GEO accession suggests R1 = BC (1-6), UMI (7-11), but I have not found anything to confirm that. The end bias should be 3' and the strand might be first.
library_preparation_protocol_ICELL8: barcode lengths according to GEO: R1 = BC (1-11), UMI (12-22); R2 = cDNA.
library_preparation_protocol_MARSseq: barcode lengths according to GEO: R1 = cDNA; R2 = BC (1-6), UMI (7-14).
library_preparation_protocol_QuartzSeq2: barcode lengths according to GEO: R1 = BC (1-15), UMI (16-23); R2 = cDNA.
library_preparation_protocol_gmSCRBseq: barcode lengths according to GEO: R1 = BC (1-6), UMI (7-16); R2 = cDNA.
library_preparation_protocol_ddSEQ: barcode lengths according to GEO: R1 = BC (8-14, 25-31, 42-48), UMI (1-7); R2 = cDNA.
sequencing_protocol_C1: should also be tag-based.
sequencing_protocol_CELSeq2: should have HiSeq 3000 according to GEO.
sequencing_protocol_ICELL8: should have HiSeq 4000 according to GEO.
sequencing_protocol_gmSCRBseq: should have HiSeq 2000 according to GEO.
sequencing_protocol_inDrop: should have HiSeq 4000 according to GEO.
analysis_protocol_processed: says the data was normalised, but the counts in the file don't look like they have been normalised. Is this the protocol applied to the data available from GEO? If not, it can be deleted.
analysis_protocol_raw
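For anyone filling in the barcode fields from the review comments above, here is a minimal sketch of how the GEO position ranges could be turned into lengths and checked against a read. It assumes the ranges are 1-based and inclusive, which is how they read in GEO. The names LAYOUTS, segment_length and extract_segment are made up for illustration and are not part of any HCA tooling. C1 is left out because its layout is unconfirmed.

```python
# Read layouts as 1-based, inclusive position ranges, copied from the GEO-derived
# values in the review. The structure and names here are illustrative assumptions.
LAYOUTS = {
    "ICELL8":     {"read": "R1", "barcode": [(1, 11)], "umi": [(12, 22)]},
    "MARSseq":    {"read": "R2", "barcode": [(1, 6)],  "umi": [(7, 14)]},
    "QuartzSeq2": {"read": "R1", "barcode": [(1, 15)], "umi": [(16, 23)]},
    "gmSCRBseq":  {"read": "R1", "barcode": [(1, 6)],  "umi": [(7, 16)]},
    "ddSEQ":      {"read": "R1", "barcode": [(8, 14), (25, 31), (42, 48)], "umi": [(1, 7)]},
}


def segment_length(ranges):
    """Total number of bases covered by a list of 1-based inclusive ranges."""
    return sum(end - start + 1 for start, end in ranges)


def extract_segment(sequence, ranges):
    """Concatenate the bases of `sequence` that fall in the given ranges."""
    return "".join(sequence[start - 1:end] for start, end in ranges)


if __name__ == "__main__":
    # Print the read, barcode length and UMI length for each method.
    for name, layout in LAYOUTS.items():
        print(name, layout["read"],
              "BC length:", segment_length(layout["barcode"]),
              "UMI length:", segment_length(layout["umi"]))
```

The 0-based offset of a segment is its start position minus one, so a UMI at positions 7-16 has offset 6 and length 10. The split ddSEQ barcode cannot be described by a single offset and length, so it may need special handling in the metadata.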
Updated dataset based on review comments. Need to convert SRR9621775 to fastq. Waiting on test data test.
Waiting on this PR: https://github.com/HumanCellAtlas/schema-test-data/pull/28
Unblocked by cell-to-cell-suspension test data.
Lower priority than kidney datasets - will tackle this in release 26
needs a look from another wrangler on the modelling - maybe @anu-shiva and @Wkt8
two additional issues:
edit smartseq2_has_2-3_files
COUNT(f.`read_index`) as num_files, COUNT(DISTINCT f.`read_index`) as read_set
AND NOT num_files = read_set
RETURN p, "Smart-Seq2 protocols should contain between 2 and 3 files", labels(p)
Question: does f match all files? I need just the ones linked to the process.
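To make the intended per-process grouping concrete, here is a rough Python sketch of one reading of the rule fragment above. It is not the graph validator's implementation: the input shape (pairs of process id and read_index, one per linked file), the function name check_smartseq2_files, and the duplicate-index message are all assumptions for illustration.

```python
from collections import defaultdict


def check_smartseq2_files(links):
    """Return {process_id: problem} for processes that break the file rule.

    `links` is an iterable of (process_id, read_index) pairs, one pair per
    file linked to that process (an assumed input shape).
    """
    # Group files by the process they are linked to, so counts are per process
    # rather than over all files in the graph.
    by_process = defaultdict(list)
    for process_id, read_index in links:
        by_process[process_id].append(read_index)

    problems = {}
    for process_id, read_indices in by_process.items():
        num_files = len(read_indices)      # analogous to COUNT(f.`read_index`)
        read_set = len(set(read_indices))  # analogous to COUNT(DISTINCT f.`read_index`)
        if not 2 <= num_files <= 3:
            problems[process_id] = "Smart-Seq2 protocols should contain between 2 and 3 files"
        elif num_files != read_set:
            # Hypothetical message: the same read_index appears more than once.
            problems[process_id] = "duplicate read_index among files linked to this process"
    return problems


if __name__ == "__main__":
    example = [("p1", "read1"), ("p1", "read2"), ("p2", "read1"), ("p2", "read1")]
    print(check_smartseq2_files(example))
```

The grouping step is the part that matters for the question: the counts only make sense once files are partitioned by process.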
blocked by the graph validation issue
Import form is filled
Project short name:
BenchmarkingSingleCellProtocols
Primary Wrangler:
Ami
Secondary Wrangler:
Ida
Associated files
Google Drive: https://drive.google.com/drive/folders/10XD1BCC9ohFHVcNFGGTmUtJoIhoBuvw3
Google Sheet: https://docs.google.com/spreadsheets/d/1mhL-LOvfG67GfEdf1fBf0ch-fbqQ5s5hwqjyb1DEyaY/edit#gid=1503327377
Ingest: https://contribute.data.humancellatlas.org/projects/detail?uuid=6e177195-0ac0-468b-99a2-87de96dc9db4
Published study links
Paper: https://doi.org/10.1038/s41587-020-0469-4
Accessioned data: GSE133549
Key Events