MontgomeryLab / tinyRNA

tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0

Plotter: improvements and options for handling min and/or max lengths in len_dist plots #194

Closed AlexTate closed 2 years ago

AlexTate commented 2 years ago

Improvements and options have been added for managing len_dist plot bounds. On the command line, the min and/or max (first/last) lengths can optionally be specified with:

If either is unspecified, the unspecified bound is determined from the data's bounds on a per-subtype basis (i.e. the "Mapped" subtype bounds are determined separately from the "Assigned" subtype). Bounds, whether calculated or specified, are fixed across all plots for each subtype.
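The bound resolution described above can be sketched roughly as follows. This is only an illustration, not Plotter's actual code: the function name `resolve_bounds` and the input shape (subtype mapped to a list of per-sample `{length: count}` dicts) are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input shape: subtype -> list of per-sample {length: count} mappings
LenDists = Dict[str, List[Dict[int, int]]]


def resolve_bounds(len_dists: LenDists,
                   min_len: Optional[int] = None,
                   max_len: Optional[int] = None) -> Dict[str, Tuple[int, int]]:
    """Return one (min, max) pair per subtype, shared by all plots of that subtype."""
    bounds = {}
    for subtype, samples in len_dists.items():
        # Pool the observed lengths from every sample of this subtype only
        observed = [length for sample in samples for length in sample]
        # A user-specified bound wins; otherwise fall back to the data's own bound
        lo = min_len if min_len is not None else min(observed)
        hi = max_len if max_len is not None else max(observed)
        bounds[subtype] = (lo, hi)
    return bounds


dists = {
    "Mapped":   [{15: 4, 22: 90, 35: 2}, {18: 7, 22: 80}],
    "Assigned": [{20: 3, 22: 70}, {21: 5, 26: 9}],
}
print(resolve_bounds(dists))              # bounds computed separately per subtype
print(resolve_bounds(dists, min_len=16))  # specified min applies to every subtype
```

Because the pair is computed once per subtype and then reused, every sample's plot within a subtype shares the same x-axis range, while "Mapped" and "Assigned" may still differ from each other.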

Run Config entries have been added for user specification:

If either of these values is unassigned, the workflow first falls back to the corresponding fastp entry (length_required for the min, length_limit for the max). The fastp values are also optional, so if they too are unspecified, the workflow passes no corresponding value on the command line and Plotter determines the bounds from the data as described above.
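The fallback order can be illustrated with a short sketch. Only `length_required` and `length_limit` come from the description above; the Plotter key names (`plotter_len_min`, `plotter_len_max`) and the flag strings are hypothetical placeholders, and treating an empty string as "unassigned" is an assumption about how the Run Config is parsed.

```python
from typing import Any, Dict, List, Optional


def first_assigned(*values: Any) -> Optional[int]:
    """Return the first value that is assigned (not None and not an empty string)."""
    for value in values:
        if value is not None and value != "":
            return int(value)
    return None


def len_dist_bound_args(run_config: Dict[str, Any]) -> List[str]:
    """Build the optional bound arguments: Plotter entry, then fastp entry, then nothing."""
    min_len = first_assigned(run_config.get("plotter_len_min"),   # hypothetical key
                             run_config.get("length_required"))   # fastp fallback
    max_len = first_assigned(run_config.get("plotter_len_max"),   # hypothetical key
                             run_config.get("length_limit"))      # fastp fallback
    args: List[str] = []
    if min_len is not None:
        args += ["--hypothetical-min", str(min_len)]  # placeholder flag name
    if max_len is not None:
        args += ["--hypothetical-max", str(max_len)]  # placeholder flag name
    return args


config = {"length_required": 15, "plotter_len_max": 30, "length_limit": 60}
print(len_dist_bound_args(config))
print(len_dist_bound_args({}))
```

When neither source provides a value, the argument is simply omitted, which leaves Plotter free to calculate that bound from the data.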

Additionally:

The following demonstrates the new xtick label crowding mitigation (NOTE: see lower comments on this PR; the first plot is inaccurate):

[Plots: mapped_len_dist_20-30, mapped_len_dist_15-35, mapped_len_dist_15-60]

Closes #191

AlexTate commented 2 years ago

Marking this PR as a draft.

As can be seen in the current plot for lengths 20-30, more work is needed to properly subset the range while maintaining the original proportions. There is also room for improvement in the implementation of the new xtick label crowding mitigation.
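One reading of "maintaining the original proportions" is that each length's share should be computed against the full distribution and only then restricted to the requested range, rather than renormalizing over the subset. The sketch below illustrates that interpretation only; it is not taken from the PR's changes, and `subset_proportions` is a hypothetical name.

```python
from typing import Dict


def subset_proportions(counts: Dict[int, int], lo: int, hi: int) -> Dict[int, float]:
    """Proportions relative to ALL reads, reported only for lengths lo..hi inclusive."""
    total = sum(counts.values())  # denominator uses the full, unsubsetted distribution
    if total == 0:
        return {length: 0.0 for length in range(lo, hi + 1)}
    # Lengths absent from the data are shown as zero so the x-axis stays contiguous
    return {length: counts.get(length, 0) / total for length in range(lo, hi + 1)}


counts = {15: 10, 22: 80, 35: 10}
props = subset_proportions(counts, 20, 30)
print(props[22])  # share of length 22 among all reads, not just those in 20-30
```

Under this approach the plotted values for a narrow range need not sum to 1, since reads outside the range still count toward the denominator.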

AlexTate commented 2 years ago

The above concerns have been addressed. The PR is ready to be merged. Below is the corrected plot for lengths 20-30, which now shows correct proportions:

[Plot: mapped_len_dist_20-30]