MontgomeryLab / tinyRNA

tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0

Bugfix: strict interval matching fixed #149

AlexTate closed 2 years ago

AlexTate commented 2 years ago

Closes #148

Additionally, subintervals of discontinuous features can sometimes end up adjacent to each other after processing. This happens when the adjacent intervals are defined under unique feature IDs but share the same root parent, which produces two side-by-side intervals that both carry the root parent's feature ID.

StepVector.get_steps(merge_values=True) would remedy this by merging the adjacent intervals, but there are two issues with that approach:

  1. Merging is performed by value. Start/stop coordinates are now included in the feature record tuple, which is what the StepVector stores, so adjacent tuples carry their own subinterval coordinates, compare as "unique", and are left unmerged.
  2. On-the-fly merging happens on every interval -> overlapping features lookup. It isn't particularly expensive, but it is unnecessary work.
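
To illustrate the first issue, here is a minimal pure-Python sketch of value-based merging. It does not use tinyRNA's StepVector; `merge_by_value` and the `(start, end, value)` step layout are hypothetical stand-ins, and the record tuples are assumed to look like `(feature_id, start, end)`.

```python
def merge_by_value(steps):
    """Collapse consecutive, touching steps whose stored values compare equal.

    `steps` is a sorted list of (start, end, value) with half-open coordinates.
    This is a hypothetical stand-in for value-based merging, not tinyRNA code.
    """
    merged = []
    for start, end, value in steps:
        prev = merged[-1] if merged else None
        if prev is not None and prev[1] == start and prev[2] == value:
            # Same value on both sides of the boundary: extend the previous step
            merged[-1] = (prev[0], end, value)
        else:
            merged.append((start, end, value))
    return merged


# Values hold only the root parent's feature ID: the two steps compare equal
ids_only = [(0, 10, {"geneA"}), (10, 20, {"geneA"})]

# Values hold record tuples that include subinterval coordinates (assumed
# layout): the two steps now differ, so value-based merging leaves them apart
with_coords = [(0, 10, {("geneA", 0, 10)}), (10, 20, {("geneA", 10, 20)})]

merged_ids = merge_by_value(ids_only)
merged_coords = merge_by_value(with_coords)
```

With bare IDs the two steps collapse into one; with coordinates in the tuple they stay split even though they belong to the same root parent.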

Now, these adjacent subintervals are merged just once, before they are stored in the StepVector, so on-the-fly merging is no longer necessary.
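
A rough sketch of the merge-once idea, not the code from this PR: `merge_adjacent` and `build_records` are hypothetical names, intervals are assumed to be half-open `(start, end)` pairs, and overlapping intervals are merged along with adjacent ones as a simplifying assumption.

```python
def merge_adjacent(intervals):
    """Merge half-open (start, end) intervals that touch or overlap.

    Hypothetical helper: it runs once per feature, before anything is stored.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous interval: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def build_records(subintervals_by_root):
    """Build (feature_id, start, end) records from already-merged intervals.

    The record layout is an assumption made for illustration.
    """
    records = []
    for root_id, intervals in subintervals_by_root.items():
        for start, end in merge_adjacent(intervals):
            records.append((root_id, start, end))
    return records


# Two child features under the same root parent sit side by side at 10
records = build_records({"geneA": [(10, 20), (0, 10), (30, 40)]})
```

Because the records are built from merged intervals, each stored tuple already spans the full contiguous region, and lookups need no further merging.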

AlexTate commented 2 years ago

Note: the commit history and changed files lists are excessive in this PR because branch issue-148 was derived from issue-134, which is waiting to be merged with master. These lists are determined by comparing to the master branch.

After the PR for issue-134 is merged, I'll toggle the branch comparison to clean up the lists for this PR.

AlexTate commented 2 years ago

Toggled, PR is good to go