tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0
AlignmentReader and tiny-config: produce more helpful errors for empty sample files #334
Currently, if empty alignment files are provided to tiny-count, Pysam raises a StopIteration exception when the AlignmentReader class attempts to read the first alignment. The exception has no accompanying explanation, so the cause isn't immediately clear.
Traceback (most recent call last):
  [...]
  File "[...]/tiny/rna/counter/hts_parsing.py", line 97, in _gather_metadata
    first_aln = next(reader.head(1))
StopIteration
This can happen when fastp doesn’t retain sufficient reads or when bowtie doesn’t generate any alignments.
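The bare StopIteration comes from calling `next()` on an iterator that has nothing to yield. As a minimal sketch of how that could be turned into a descriptive error (not the actual tinyRNA change), the helper below wraps the call. The name `first_alignment`, the choice of `ValueError`, and the message text are all hypothetical; it accepts any iterator of alignments, such as the one passed to `next()` in the traceback above.

```python
_MISSING = object()  # sentinel so that a legitimate falsy item is not mistaken for "empty"


def first_alignment(alignments, source):
    """Return the first item from an iterator of alignments.

    Hypothetical helper: raises a descriptive ValueError instead of
    letting a bare StopIteration escape when the iterator is empty.
    """
    first = next(alignments, _MISSING)
    if first is _MISSING:
        raise ValueError(
            f"No alignments found in {source}. The file may be empty "
            "or may contain only header data."
        )
    return first
```

With this in place, an empty input produces a message that names the offending file rather than an unexplained StopIteration.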
The SamplesSheet class will be updated to check for empty SAM/BAM files during both pipeline and standalone runs of tiny-count, and for empty FASTQ files at pipeline startup. SAM/BAM files are only considered valid if they have at least one parseable alignment; checking file size is insufficient because these files might contain header data but no alignments. For FASTQ files, a file size check is sufficient. Placing this validation step in the SamplesSheet class allows all problem files to be reported at once. For due diligence, the AlignmentReader class will also be updated.
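The validation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the SamplesSheet implementation: the function names are hypothetical, FASTQ files are assumed to be uncompressed, and only plain-text SAM is handled using the standard library (BAM would need a parser such as Pysam). A SAM line is treated as a parseable alignment if it is not an `@` header line and has at least the 11 mandatory tab-separated fields.

```python
import os


def fastq_is_empty(path):
    """Size check only; assumes an uncompressed FASTQ file."""
    return os.path.getsize(path) == 0


def sam_has_alignment(path):
    """True if a plain-text SAM file has at least one alignment record.

    Header lines start with '@'. An alignment record has at least
    11 tab-separated mandatory fields.
    """
    with open(path) as sam:
        for line in sam:
            if line.startswith("@") or not line.strip():
                continue
            if len(line.rstrip("\n").split("\t")) >= 11:
                return True
    return False


def find_empty_files(fastq_paths=(), sam_paths=()):
    """Check every file and return all problem paths, rather than stopping at the first."""
    problems = [p for p in fastq_paths if fastq_is_empty(p)]
    problems += [p for p in sam_paths if not sam_has_alignment(p)]
    return problems


def validate_or_raise(fastq_paths=(), sam_paths=()):
    """Raise one error that lists every empty file (hypothetical entry point)."""
    problems = find_empty_files(fastq_paths, sam_paths)
    if problems:
        raise ValueError(
            "The following sample files contain no reads or alignments:\n"
            + "\n".join(f"  {p}" for p in problems)
        )
```

Collecting the problem paths before raising is what lets a single run report every empty file, so the user does not have to fix them one at a time.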