MontgomeryLab / tinyRNA

tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0

AlignmentReader and tiny-config: produce more helpful errors for empty sample files #334

Closed AlexTate closed 5 months ago

AlexTate commented 5 months ago

Currently, if empty alignment files are provided to tiny-count, Pysam will throw a StopIteration exception when the AlignmentReader class attempts to read the first alignment. The exception doesn't have an accompanying explanation so the cause isn't immediately clear.

Traceback (most recent call last):
    [...]
    File "[...]/tiny/rna/counter/hts_parsing.py", line 97, in _gather_metadata
        first_aln = next(reader.head(1))
StopIteration

This can happen when fastp doesn’t retain sufficient reads or when bowtie doesn’t generate any alignments.
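For illustration, here is a minimal sketch of how a bare `next()` call on an empty iterator could be turned into a descriptive error. The function name `first_alignment` and the error message are hypothetical and are not taken from the tinyRNA codebase; the sketch only assumes that the reader yields alignments through a standard Python iterator.

```python
_MISSING = object()  # sentinel so that any real value, even None, counts as an alignment


def first_alignment(alignments, filename):
    """Return the first item from an iterator of alignments.

    Hypothetical helper: raises a ValueError that names the file
    instead of letting a bare StopIteration propagate.
    """
    first = next(iter(alignments), _MISSING)
    if first is _MISSING:
        raise ValueError(
            f"No alignments could be read from {filename}. "
            "The file may be empty or contain only header lines."
        )
    return first
```

Passing a default to `next()` avoids the unexplained `StopIteration`, and the raised error can then point the user at the offending file.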

The SamplesSheet class will be updated to check for empty SAM/BAM files during both pipeline and standalone runs of tiny-count, and for empty FASTQ files at pipeline startup. SAM/BAM files are considered valid only if they contain at least one parseable alignment; checking file size is insufficient because these files might contain header data but no alignments. For FASTQ files, a file size check is sufficient. Placing this validation step in the SamplesSheet class allows all problem files to be reported at once rather than one per run. For due diligence, the AlignmentReader class will also be updated.
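The validation described above might look roughly like the following sketch. It is not the SamplesSheet implementation: all function names are hypothetical, it handles only plain-text SAM files (BAM would need a proper parser such as Pysam), and it uses "a non-header line with at least 11 tab-separated fields" as a rough stand-in for "parseable alignment".

```python
import os


def sam_has_alignment(path):
    """Rough check: True if a text SAM file has at least one record line.

    Header lines start with '@'. A record line is assumed to have the
    11 mandatory tab-separated SAM fields; this is only a proxy for
    actually parsing the alignment.
    """
    with open(path) as f:
        for line in f:
            if line.startswith("@") or not line.strip():
                continue
            if len(line.rstrip("\n").split("\t")) >= 11:
                return True
    return False


def fastq_is_empty(path):
    """A size check is enough for FASTQ since there is no header section."""
    return os.path.getsize(path) == 0


def find_empty_files(sam_files=(), fastq_files=()):
    """Collect every problem file so they can all be reported at once."""
    problems = [p for p in sam_files if not sam_has_alignment(p)]
    problems += [p for p in fastq_files if fastq_is_empty(p)]
    return problems
```

Returning the full list, instead of raising on the first empty file, mirrors the goal of reporting all problem files in a single pass.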