fls-bioinformatics-core / auto_process_ngs

Scripts and utilities for automatic processing & management of Illumina NGS sequencing data.

Implement functionality for per-lane QC in QC pipeline #916

Closed: pjbriggs closed this 7 months ago

pjbriggs commented 7 months ago

Implements new functionality that enables the QC pipeline to run on a per-lane basis, even if the input Fastqs have been produced using --no-lane-splitting.
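As a rough illustration of what per-lane splitting involves: when Fastqs are produced with --no-lane-splitting, reads from all lanes end up in one file, but each read header still records its lane. The sketch below is a generic example, not the pipeline's actual task. It assumes the Illumina CASAVA 1.8+ header layout, in which the lane is the fourth colon-separated field of the read identifier. `lane_from_header` and `split_records_by_lane` are hypothetical names.

```python
from collections import defaultdict


def lane_from_header(header):
    """Return the lane number from an Illumina (CASAVA 1.8+) read header.

    Assumed format:
    @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
    """
    read_id = header.split()[0]        # keep only the part before the space
    return int(read_id.split(":")[3])  # lane is the 4th colon-separated field


def split_records_by_lane(lines):
    """Group FASTQ lines into per-lane collections of 4-line records.

    'lines' is any iterable of FASTQ lines without trailing newlines.
    Returns {lane: [line, line, ...]}; a trailing incomplete record is ignored.
    """
    by_lane = defaultdict(list)
    it = iter(lines)
    # zip over the same iterator four times yields consecutive 4-line records
    for record in zip(it, it, it, it):
        by_lane[lane_from_header(record[0])].extend(record)
    return dict(by_lane)
```

For a real gzipped Fastq the lines could come from `(line.rstrip("\n") for line in gzip.open(path, "rt"))`. For large files it would be better to stream each record straight to an open per-lane output file rather than hold everything in memory as this sketch does.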

Updates include adding a task within the pipeline to split Fastqs by lane, and allowing the QC to be verified against an arbitrary list of Fastqs (this is needed because splitting produces a set of derived Fastqs which differ from those in the project).
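To make the "derived Fastqs" point concrete, here is a sketch of how the split Fastq names might relate to the originals. It assumes the bcl2fastq naming convention `<sample>_S<n>[_L00<lane>]_R<read>_001.fastq.gz`, where the lane component is omitted under --no-lane-splitting. The helpers `has_lane` and `derived_fastq_names` are hypothetical; the actual pipeline may name its split Fastqs differently.

```python
import re

# Assumed bcl2fastq-style names, e.g. PJB1_S1_R1_001.fastq.gz (no lane)
# or PJB1_S1_L001_R1_001.fastq.gz (lane 1)
NAME_RE = re.compile(
    r"^(?P<sample>.+)_(?P<sn>S\d+)(?:_L(?P<lane>\d{3}))?"
    r"_(?P<rest>[RI]\d_\d{3}\.fastq(?:\.gz)?)$"
)


def has_lane(fastq_name):
    """True if the name contains an explicit _L00N lane component."""
    m = NAME_RE.match(fastq_name)
    return m is not None and m.group("lane") is not None


def derived_fastq_names(fastq_name, lanes):
    """Return the per-lane names that splitting 'fastq_name' would produce."""
    m = NAME_RE.match(fastq_name)
    if m is None:
        raise ValueError(f"Unrecognised Fastq name: {fastq_name}")
    if m.group("lane") is not None:
        return [fastq_name]  # already lane-specific, nothing to split
    return [
        f"{m.group('sample')}_{m.group('sn')}_L{lane:03d}_{m.group('rest')}"
        for lane in lanes
    ]
```

A list built this way, rather than the contents of the project's fastqs directory, is the kind of arbitrary Fastq list that the QC verification would then need to check against.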

For the auto_process.py run_qc command, there is a new configuration setting (qc.split_undetermined_fastqs): if this is set to True (the default) then Fastqs in the undetermined project will be split if there is no explicit lane number in their names. Setting the option to False disables this behaviour.
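A minimal sketch of the decision described above, assuming the setting is read from a `[qc]` section of an INI-style settings file and that an explicit lane shows up in a Fastq name as an `_L00N_` component. `split_setting_from_ini` and `should_split_fastq` are hypothetical helpers, not functions from auto_process_ngs.

```python
import configparser
import re

# Assumed form of an explicit lane number in a Fastq name, e.g. "_L001_"
LANE_RE = re.compile(r"_L\d{3}_")


def split_setting_from_ini(text):
    """Read qc.split_undetermined_fastqs from INI text, defaulting to True."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return config.getboolean("qc", "split_undetermined_fastqs", fallback=True)


def should_split_fastq(project_name, fastq_name, split_undetermined_fastqs=True):
    """Decide whether a Fastq should be split by lane before running QC."""
    if not split_undetermined_fastqs:
        return False  # behaviour disabled by the setting
    if project_name != "undetermined":
        return False  # only the undetermined project is affected
    return LANE_RE.search(fastq_name) is None  # split only if no lane in name
```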

For the run_qc.py utility, the functionality is exposed via a new --split-fastqs-by-lane command line option (the qc.split_undetermined_fastqs setting is ignored by the utility).
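For illustration, a boolean option like this would typically be declared with `argparse` as shown below. Only the `--split-fastqs-by-lane` option name comes from the description; the positional `project` argument and the help text are assumptions, and the real run_qc.py parser may be set up differently.

```python
import argparse


def build_parser():
    """Hypothetical parser showing a --split-fastqs-by-lane flag."""
    p = argparse.ArgumentParser(prog="run_qc.py")
    # Hypothetical positional argument standing in for the real inputs
    p.add_argument("project", help="project directory to run the QC on")
    # store_true: the flag is False unless given on the command line
    p.add_argument(
        "--split-fastqs-by-lane",
        action="store_true",
        help="split Fastqs by lane before running the QC",
    )
    return p
```

argparse converts the option name to the attribute `split_fastqs_by_lane` on the parsed namespace.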

The QC directory metadata now contains two new items: a list of the Fastq names that the pipeline was run against (which will be used in preference to the list from the project's fastqs directory), and a flag indicating whether the pipeline was run with the Fastqs split by lane (if the flag is set, a note is added to the QC report to say so).
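A sketch of how the two metadata items might be consumed, treating the metadata as a plain dictionary. The key names `fastqs` and `fastqs_split_by_lane`, the note wording, and both helper functions are hypothetical; the source only states that the stored list takes precedence and that the flag triggers a report note.

```python
def fastqs_for_qc(qc_metadata, project_fastqs):
    """Return the Fastq names to verify the QC against.

    Prefers the list stored in the QC metadata (hypothetical key 'fastqs');
    falls back to the Fastqs found in the project's fastqs directory.
    """
    fastqs = qc_metadata.get("fastqs")
    if fastqs is not None:
        return list(fastqs)
    return sorted(project_fastqs)


def report_notes(qc_metadata):
    """Return notes for the QC report (hypothetical flag 'fastqs_split_by_lane')."""
    notes = []
    if qc_metadata.get("fastqs_split_by_lane"):
        notes.append("Fastqs were split by lane before running the QC")
    return notes
```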