fls-bioinformatics-core / auto_process_ngs

Scripts and utilities for automatic processing & management of Illumina NGS sequencing data.

Add integrity check on gzipped Fastqs in QC pipeline #854

Closed pjbriggs closed 10 months ago

pjbriggs commented 1 year ago

I recently encountered a set of compressed Fastqs where some were corrupted: they failed when running gzip -t ... with messages about "invalid compressed data--crc error" and "invalid compressed data--length error". These files subsequently generated errors in the QC pipeline when run using run_qc.py. It seems like it might be a useful diagnostic to add an optional task to the QC pipeline that tests the integrity of compressed Fastqs.

The task could be turned on for run_qc.py (where the provenance of the Fastqs might not be known) but left off for auto_process.py run_qc (where the Fastqs have come from earlier processing steps).

pjbriggs commented 1 year ago

Possible example code for implementing such a check in Python: https://stackoverflow.com/a/41998710
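
For reference, a minimal sketch of what such a check might look like using only the Python standard library. This is not necessarily the approach taken in the linked answer, and the function name gzip_is_intact and the chunk_size parameter are hypothetical (they are not part of auto_process_ngs). The idea is to decompress the whole file in chunks and discard the data, so that the CRC and length stored at the end of the gzip stream get verified, much as gzip -t does:

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1024 * 1024):
    """
    Return True if the gzipped file at 'path' decompresses cleanly.

    Hypothetical helper: the name and signature are illustrative only.
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read through to the end of the file, discarding the data;
            # the stored CRC and length are only checked once the end
            # of the compressed stream has been reached
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers "not a gzipped file" and CRC/length mismatches
        # (gzip.BadGzipFile is a subclass of OSError in Python 3.8+),
        # EOFError covers truncated files, and zlib.error covers
        # corrupted compressed data
        return False
    return True
```

A QC task could then call this on each .fastq.gz file and report any that return False. Reading in chunks keeps memory use flat for large Fastqs, at the cost of decompressing every file in full.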