ebi-ait / dcp-ingest-central

Central point of access for the Ingestion Service of the HCA DCP
Apache License 2.0

Fastq validation tool #746

Open ami-day opened 2 years ago

ami-day commented 2 years ago

Goal

Develop a stand-alone fastq validation tool that is flexible enough to be applied to different single-cell technology types and checks for the expected cell barcode. It should also run a general fastq quality check applicable to all single-cell fastq files.
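As a rough illustration of what the general fastq check could cover, here is a minimal sketch that only tests record structure (four lines per record, `@` header, `+` separator, matching sequence and quality lengths, expected base characters). The function name `validate_fastq` and the exact set of checks are assumptions for illustration, not an agreed specification.

```python
import io

# Assumption: only unambiguous bases plus N are expected in the reads.
VALID_BASES = set("ACGTN")


def validate_fastq(handle):
    """Yield (record_number, message) for every structural problem found."""
    record = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        record += 1
        if not header.startswith("@"):
            yield record, "header does not start with '@'"
        if not plus.startswith("+"):
            yield record, "separator line does not start with '+'"
        if len(seq) != len(qual):
            yield record, "sequence and quality lengths differ"
        if not set(seq.upper()) <= VALID_BASES:
            yield record, "sequence contains characters other than A, C, G, T, N"


# Usage with an in-memory file; a real run would pass an open fastq handle.
example = io.StringIO("@read1\nACGT\n+\nIIII\n")
print(list(validate_fastq(example)))
```

Yielding problems one at a time, rather than failing on the first, lets a caller report every malformed record in a single pass.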

The first iteration will focus on identifying whether the expected cell barcode for the relevant 10X single-cell chemistry version is present in the reads, along with other general fastq quality checks. This will hopefully be extended to other single-cell technology types for which we can obtain a reliable cell barcode pattern/regex.
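One way the 10X barcode check might work is to take the leading bases of each read and count how many appear in the expected whitelist. The sketch below assumes the cell barcode is the first 16 bases of the barcode read and that the whitelist has already been loaded into a set; the function names and the `barcode_length` default are illustrative assumptions, and any pass/fail threshold would need to be chosen from real data.

```python
import gzip
from itertools import islice


def read_sequences(path, limit=None):
    """Yield the sequence line of each fastq record, optionally only the first `limit`."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # In a well-formed fastq file, lines 1, 5, 9, ... (0-based) are sequences.
        sequences = islice(handle, 1, None, 4)
        for seq in islice(sequences, limit):
            yield seq.strip()


def barcode_match_fraction(sequences, whitelist, barcode_length=16):
    """Return the fraction of reads whose leading bases are in the whitelist."""
    total = matched = 0
    for seq in sequences:
        total += 1
        if seq[:barcode_length] in whitelist:
            matched += 1
    return matched / total if total else 0.0


# Usage with toy data; the barcodes here are made up for the example.
toy_whitelist = {"AAACCCAAGAAACACT"}
toy_reads = ["AAACCCAAGAAACACTGGGGGGGGGGGG", "TTTTTTTTTTTTTTTTGGGGGGGGGGGG"]
print(barcode_match_fraction(toy_reads, toy_whitelist))
```

Sampling only the first few thousand reads via `limit` would keep the check fast on large files, at the cost of missing problems further into the file.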

The tool should be stand-alone, so it can be integrated and used in HCA ingest or by ENA when 10X data is submitted.

Initial steps

Google Slides https://docs.google.com/presentation/d/14Vg0BXMVPKB679_sETIv_xvaZ4zWxHde4Otf_EYVAhU/edit#slide=id.p1

ami-day commented 2 years ago

Found and tested the following tool: https://readthedocs.org/projects/umi-tools/ It works well for different 10X chemistry versions, using the expected cell barcode whitelists downloaded from the latest 10X CellRanger software.

The creators of the tool also provide an inDrop regex which can be used. I have yet to test this on inDrop fastq data.
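For reference, a regex-based check for a technology like inDrop could look roughly like the sketch below. The pattern follows the general shape of the inDrop example in the UMI-tools documentation (two cell barcode parts separated by a fixed adapter, then a UMI), but the exact adapter sequence and group lengths here are reproduced from memory and should be verified against that documentation before use. This version uses the standard-library `re` module, so it matches the adapter exactly and does not tolerate sequencing errors in it.

```python
import re

# Assumed inDrop layout: 8-12 bp cell barcode part 1, fixed adapter,
# 8 bp cell barcode part 2, 6 bp UMI, then a poly-T stretch.
INDROP_PATTERN = re.compile(
    r"(?P<cell_1>.{8,12})"
    r"(?P<discard_1>GAGTGATTGCTTGTGACGCCTT)"
    r"(?P<cell_2>.{8})"
    r"(?P<umi_1>.{6})"
    r"T{3}.*"
)


def extract_barcodes(read):
    """Return (cell_barcode, umi) if the read fits the pattern, else None."""
    match = INDROP_PATTERN.match(read)
    if match is None:
        return None
    cell_barcode = match.group("cell_1") + match.group("cell_2")
    return cell_barcode, match.group("umi_1")


# Usage with a synthetic read built to fit the assumed layout.
synthetic = "AAAAAAAA" + "GAGTGATTGCTTGTGACGCCTT" + "CCCCCCCC" + "GGGGGG" + "TTT" + "ACGT"
print(extract_barcodes(synthetic))
```

The fraction of reads for which `extract_barcodes` returns a result could then serve as the validation signal, in the same way as the whitelist match rate for 10X data.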

I have emailed the creators to ask if they also have a list of cell barcode regexes/patterns for other single-cell technology types:

· CEL-Seq2 / CEL-Seq
· Quartz-Seq2 / Quartz-Seq
· ddSEQ
· MARS-seq
· SCRB-seq

gabsie commented 2 years ago

Ami has made some progress here.

ami-day commented 2 years ago

Have found the relevant cell barcode and UMI patterns in publications and documentation for the following methods:

· inDrop
· Drop-Seq
· CEL-Seq2
· Quartz-Seq2
· MARS-seq
· SCRB-seq
· Seq-Well
· Visium