fls-bioinformatics-core / auto_process_ngs

Scripts and utilities for automatic processing & management of Illumina NGS sequencing data.

Add 'read ranges' to QC protocols for 10x Flex data #834

Closed pjbriggs closed 1 year ago

pjbriggs commented 1 year ago

Add a new feature to the QC pipeline which enables 'read ranges' to be specified alongside the sequence data reads within QC protocols. A range defines the subsequence to be extracted from each read in the specified Fastqs, so that the relevant QC metrics (coverage, screens, strandedness) are computed only on that subsequence.

This is specifically requested for 10x Genomics "Flex" data (single cell fixed RNA profiling), where the requirement is to restrict the metrics to the first 50bp of R2. However, the implementation should be generally applicable.
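
To illustrate the idea, here is a minimal sketch of how a read range could be parsed and applied to Fastq records. It is not the code from the PR: the `r2:1-50` notation, the function names `parse_read_spec` and `trim_fastq_records`, and the 1-based inclusive convention are all assumptions made for this example.

```python
import re


def parse_read_spec(spec):
    """Parse a read spec such as 'r2:1-50' into (read, start, end).

    The notation is hypothetical. Either bound may be omitted
    ('r2:-50', 'r2:10-'), and a bare 'r1' means the whole read.
    """
    m = re.fullmatch(r"(r\d+|i\d+)(?::(\d*)-(\d*))?", spec.strip().lower())
    if m is None:
        raise ValueError(f"bad read spec: {spec!r}")
    read, start, end = m.groups()
    return (read, int(start) if start else None, int(end) if end else None)


def trim_fastq_records(lines, start=None, end=None):
    """Yield Fastq lines with sequence and quality cut to the range.

    'start' and 'end' are 1-based and inclusive (assumed convention).
    Header and '+' separator lines are passed through unchanged.
    """
    lo = (start - 1) if start else 0
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # In a 4-line Fastq record, line 1 is the sequence and
        # line 3 is the quality string; both must be cut identically
        if i % 4 in (1, 3):
            line = line[lo:end]
        yield line
```

For the Flex case described above, `parse_read_spec("r2:1-50")` gives `("r2", 1, 50)`, and passing those bounds to `trim_fastq_records` keeps only the first 50 bases (and matching quality scores) of each R2 read before the metrics are computed.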

As part of the PR the QC metadata has also been updated to add a new "protocol summary" field, which is used within the QC pipeline to record an automatically generated text summary of the reads (and ranges) used by the protocol. This summary is then included in the HTML reports.