MontgomeryLab / tinyRNA

tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0

tiny-count: option for normalization by genomic hits, improvements in stats collection precision #301

Closed AlexTate closed 1 year ago

AlexTate commented 1 year ago

The two normalization-by-hits options, normalization by genomic hits and normalization by feature hits, can now be disabled independently or together. When either normalization step is disabled, two new stats are reported in summary stats in place of Mapped Reads:

tiny-plot automatically uses the appropriate stat for calculating proportions in rule_charts and class_charts.
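
To make the effect of the two switches concrete, here is a minimal sketch of how a count might be split when each normalization step is on or off. It is not tiny-count's code: the function `assign_counts`, its parameters, and the input layout are hypothetical, and it assumes that normalizing by genomic hits divides a sequence's read count by its number of alignment loci, while normalizing by feature hits divides each locus's share by the number of features assigned there.

```python
from collections import defaultdict


def assign_counts(alignments, by_genomic_hits=True, by_feature_hits=True):
    """Hypothetical illustration, not the tiny-count implementation.

    alignments: iterable of (read_count, genomic_hits, features), one entry
    per alignment locus, where `features` lists the features assigned there.
    """
    counts = defaultdict(float)
    for read_count, genomic_hits, features in alignments:
        if not features:
            continue  # unassigned loci contribute nothing in this sketch
        value = float(read_count)
        if by_genomic_hits:
            value /= genomic_hits   # spread reads across alignment loci
        if by_feature_hits:
            value /= len(features)  # spread the locus share across features
        for feature in features:
            counts[feature] += value
    return dict(counts)


# One sequence with 10 reads aligning to 2 loci:
# locus A overlaps f1 and f2, locus B overlaps f3.
example = [(10, 2, ["f1", "f2"]), (10, 2, ["f3"])]

both_on = assign_counts(example)
both_off = assign_counts(example, by_genomic_hits=False, by_feature_hits=False)
print(both_on, sum(both_on.values()))
print(both_off, sum(both_off.values()))
```

Under these assumptions, the assigned counts sum to the original 10 reads only when both steps are enabled; with either step disabled the total is inflated, so proportions computed against a plain mapped-read total would no longer be meaningful.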

Internal Consistency Checks for Reported Stats

The counts reported in all output stats files are now thoroughly checked for internal consistency. Discrepancies are reported to the console with a clear description rather than being treated as errors. If alignment stats and summary stats disagree, a checksum table is written to a CSV file for further diagnosis. Care has been taken to ensure that the consistency checker fails gracefully if an unforeseen exception is raised, so that counting outputs are never lost because of a fault in the checker itself. Consistency is checked after every run and usually takes a fraction of a second. Nonetheless, a command line option has been added to tiny-count to allow this step to be turned off if needed. The following checks are performed for each library:
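
As a rough illustration of the report-don't-raise behavior described in this section, the sketch below runs a set of checks, prints discrepancies, writes a checksum table as CSV only when something disagrees, and swallows any unexpected exception so that the caller can continue writing its outputs. The function `run_consistency_checks`, the check names, and the CSV columns are all hypothetical and are not taken from tiny-count.

```python
import csv
import math
import sys


def run_consistency_checks(library, checks, csv_out=None, log=sys.stderr):
    """Hypothetical sketch of a fail-gracefully consistency checker.

    checks: mapping of check name -> zero-argument callable that returns
    an (expected, observed) pair of numbers.
    Returns True if all checks agree, False if any disagree, and None if
    the checker itself failed.
    """
    try:
        rows = []
        for name, check in checks.items():
            expected, observed = check()
            ok = math.isclose(expected, observed, rel_tol=1e-9, abs_tol=1e-6)
            rows.append((library, name, expected, observed, ok))
            if not ok:
                # Report the discrepancy instead of raising an error
                print(f"[{library}] {name}: expected {expected}, "
                      f"observed {observed}", file=log)
        all_ok = all(row[4] for row in rows)
        if not all_ok and csv_out is not None:
            # Write the checksum table only when there is a disagreement
            writer = csv.writer(csv_out)
            writer.writerow(["library", "check", "expected", "observed", "ok"])
            writer.writerows(rows)
        return all_ok
    except Exception as err:
        # A fault in the checker must never cost the caller its outputs
        print(f"[{library}] consistency check skipped: {err!r}", file=log)
        return None
```

Returning `None` on an internal failure lets the caller distinguish "stats disagree" from "the checker could not run", while still continuing in both cases.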

Codebase Improvements

MergedStats classes have been lightly refactored to ensure that stats are complete and final before they are validated and written to output files.

Closes #295

taimontgomery commented 1 year ago

Tested successfully with ram1 and Lib303 data.