MontgomeryLab / tinyRNA

tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0

tiny-count: new selector: Mismatches #298

AlexTate closed 1 year ago

AlexTate commented 1 year ago

This PR introduces a new selector for tiny-count: Mismatches. It constrains the edit distance between an alignment and the reference, and it is evaluated in Stage 2, after the Overlap selector. The column accepts ranges, lists, wildcards, and single values.
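To illustrate how a column that accepts ranges, lists, wildcards, and single values might be turned into a membership test, here is a minimal Python sketch. It is not tiny-count's implementation: the function name `parse_mismatch_selector` is hypothetical, and the definition syntax it accepts (`0-2` for a range, `0,1,3` for a list, and `*`, `all`, or a blank value for the wildcard) is assumed for illustration only.

```python
from typing import Callable


def parse_mismatch_selector(spec: str) -> Callable[[int], bool]:
    """Compile a selector definition into a predicate over edit distance.

    Hypothetical helper; the accepted syntax is an assumption, not
    tiny-count's documented grammar.
    """
    spec = spec.strip().lower()

    # Assumed wildcard spellings: any edit distance passes.
    if spec in ("", "*", "all"):
        return lambda edit_distance: True

    allowed = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # Range such as "0-2", inclusive on both ends.
            low, high = (int(x) for x in part.split("-", 1))
            if low > high:
                raise ValueError(f"Invalid range in selector: {part!r}")
            allowed.update(range(low, high + 1))
        else:
            # Single value such as "1".
            allowed.add(int(part))

    return lambda edit_distance: edit_distance in allowed


# Example usage: a definition mixing a range and a single value.
selector = parse_mismatch_selector("0-2, 5")
passing = [d for d in range(7) if selector(d)]
```

Compiling the definition once into a set-backed predicate keeps the per-alignment check to a single membership lookup, which matters when the test runs for every alignment that reaches this stage.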

Edit distance is determined from:

The former function for producing alignment dictionaries, SAM_reader._parse_alignments(), has been converted to a standalone Cython class that uses pysam's Cython API. As a result, the effect on runtime appears negligible (~4-5% slower), compared with the 20-30% slowdown measured while using pysam's Python API. This dedicated class is also responsible for accumulating alignments for decollapsed outputs, but it delegates all other decollapsing responsibility to the Python-space SAM_reader class. I've made an effort to minimize the Cython surface area because Cython code is harder to debug.

Additionally:

Closes #296

taimontgomery commented 1 year ago

Tested with ram1 data, with 0, 1, or 3 mutations introduced into the genome FASTA, using the Mismatches selector.