tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
This PR introduces a new selector for tiny-count: Mismatches. It is used for placing constraints on the edit distance between an alignment and the reference, and it is evaluated in Stage 2 after the Overlap selector. Users can specify ranges, lists, wildcards, and single values in this column.
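As a rough illustration of how a Mismatches value might be interpreted, here is a minimal sketch. It assumes a hyphen for ranges, commas for lists, and `*` or `all` as wildcard spellings. The function names and the exact syntax are hypothetical and are not taken from tiny-count's parser.

```python
def parse_mismatch_selector(definition: str):
    """Return None for a wildcard, otherwise the set of allowed edit distances.

    Assumed (hypothetical) syntax: "2" single value, "0-2" inclusive range,
    "0,3" list, "*" or "all" wildcard. Ranges and lists may be combined.
    """
    text = definition.strip().lower()
    if text in ("*", "all"):  # assumed wildcard spellings
        return None

    allowed = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            allowed.update(range(int(low), int(high) + 1))
        else:
            allowed.add(int(part))
    return allowed


def passes_mismatch_selector(allowed, edit_distance: int) -> bool:
    """A wildcard (None) accepts every alignment; otherwise test membership."""
    return allowed is None or edit_distance in allowed
```

Representing the wildcard as `None` keeps the common "no constraint" case to a single comparison per alignment.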
Edit distance is determined from:
The NM tag, if present
The CIGAR string (I, D, and X operations) if the NM tag is not present in the first alignment
If the NM tag is present in the first alignment but missing from a subsequent alignment, that alignment's edit distance defaults to zero
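The rules above can be sketched in plain Python. This is an illustration only: alignments are modeled as dictionaries with a `cigar` string and an optional `NM` value, which is an assumption made for the example and not the structure used by the Cython class. It also assumes that when the first alignment lacks NM, the CIGAR string is used for every alignment.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_edit_distance(cigar: str) -> int:
    """Sum the lengths of I, D, and X operations.

    Note: M operations cover both matches and mismatches, so mismatches
    hidden inside M are not visible to this calculation.
    """
    return sum(int(length) for length, op in CIGAR_OP.findall(cigar) if op in "IDX")


def edit_distances(alignments):
    """Return one edit distance per alignment (hypothetical dict records)."""
    if not alignments:
        return []

    # The source of edit distance is decided by the first alignment.
    if "NM" in alignments[0]:
        # A later alignment without NM falls back to zero.
        return [aln.get("NM", 0) for aln in alignments]

    # Assumption: with no NM in the first alignment, CIGAR is used throughout.
    return [cigar_edit_distance(aln["cigar"]) for aln in alignments]
```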
The former function for producing alignment dictionaries, SAM_reader._parse_alignments(), has been converted to a standalone Cython class which utilizes pysam's Cython API. As a result, runtime appears to be only slightly affected (~4-5% slower), compared with the 20-30% slowdown measured while using pysam's Python API. This dedicated class is also responsible for accumulating alignments for decollapsed outputs, but delegates all other decollapsing responsibility to the Python-space SAM_reader class. I've made an effort to minimize the Cython surface area because Cython code is harder to debug.
Additionally:
The first sequence in each alignment file is checked to make sure that it contains the read sequence (error otherwise)
If the user elects to have decollapsed outputs produced, alignments are accumulated as pysam's AlignedSegment objects rather than alignment dictionaries because they have a significantly smaller memory footprint
The column order of the Features Sheet (specifically the Overlap column) has been updated to match the order shown in the selection diagram
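The first-sequence check in the list above could look roughly like the following sketch. It works on raw SAM text lines and treats a SEQ field of `*` (the SAM convention for "sequence not stored") as the error case. The function name and error messages are hypothetical, and the actual check in this PR may be implemented differently.

```python
def check_first_sequence(sam_lines):
    """Return the SEQ of the first alignment record, or raise if it is absent.

    Header lines (starting with "@") and blank lines are skipped.
    Returns None if the input contains no alignment records.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue

        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            raise ValueError("Malformed alignment record: expected at least 11 fields")

        seq = fields[9]  # SEQ is the 10th mandatory SAM field
        if seq == "*":
            raise ValueError("The first alignment does not contain the read sequence")
        return seq

    return None
```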
Closes #296