tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0
tiny-count: option for normalization by genomic hits, improvements in stats collection precision #301
The two normalization-by-hits options (by genomic hits and by feature hits) can now be disabled independently or together. When either normalization step is disabled, two new stats are reported in summary stats in place of Mapped Reads:
Non-normalized Mapped Reads: the sum of assigned and unassigned reads according to the normalization config
Normalized Mapped Reads: the true read count
tiny-plot automatically uses the appropriate stat for calculating proportions in rule_charts and class_charts.
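The difference between the two stats can be illustrated with a minimal sketch. It assumes a simple model in which a read's count is split evenly across its genomic alignments when normalization by genomic hits is enabled, and counted in full at every alignment when it is disabled. The function name, parameters, and input layout are hypothetical and are not tiny-count's API.

```python
from fractions import Fraction


def mapped_read_totals(reads, normalize_by_genomic_hits=True):
    """Return (summed_counts, true_read_count) for a list of reads.

    Each entry in `reads` is (copies, genomic_hits): the number of
    identical reads and the number of loci the sequence aligned to.
    This is an illustrative model, not tiny-count's implementation.
    """
    summed = Fraction(0)
    true_total = 0
    for copies, genomic_hits in reads:
        true_total += copies
        if normalize_by_genomic_hits:
            # Each alignment contributes an equal share of the read count.
            per_alignment = Fraction(copies, genomic_hits)
        else:
            # Each alignment contributes the full read count.
            per_alignment = Fraction(copies)
        summed += per_alignment * genomic_hits
    return summed, true_total


# Ten uniquely mapped reads and six reads that each align to three loci.
reads = [(10, 1), (6, 3)]
normalized = mapped_read_totals(reads, normalize_by_genomic_hits=True)
unnormalized = mapped_read_totals(reads, normalize_by_genomic_hits=False)
```

Under this model the summed counts equal the true read count only when normalization is enabled, which is why a separate stat is needed once a normalization step is turned off.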
Internal Consistency Checks for Reported Stats
The counts reported in all output stats files are now checked for internal consistency. Discrepancies are reported to the console with a clear description rather than being treated as errors. If alignment stats and summary stats disagree, a checksum table is written to a CSV file for further diagnosis. The consistency checker is designed to fail gracefully if an unforeseen exception is raised, so counting outputs are never lost because of a fault in the checker itself. Consistency is checked after every run and typically takes a fraction of a second. Nonetheless, a command line option has been added to tiny-count so that this step can be turned off if needed. The following checks are performed for each library:
Internal consistency for all assigned/unassigned read/sequence counts in alignment stats and summary stats
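The behavior described above (warn instead of raising, write a checksum CSV on disagreement, and never let the checker itself abort the run) can be sketched roughly as follows. The stat names, function name, CSV layout, and tolerance are assumptions made for illustration; they are not taken from tiny-count's source.

```python
import csv
import sys


def check_consistency(alignment_stats, summary_stats, csv_path=None, tol=0.01):
    """Compare totals from two stats tables and return warning messages.

    Both arguments are dicts mapping hypothetical stat names to counts.
    The function never raises: any unexpected error becomes a warning.
    """
    messages = []
    try:
        assigned = alignment_stats["Assigned Reads"]
        unassigned = alignment_stats["Unassigned Reads"]
        rows = [
            ("Assigned + Unassigned vs Mapped Reads",
             assigned + unassigned,
             summary_stats["Mapped Reads"]),
        ]
        for name, from_alignment, from_summary in rows:
            if abs(from_alignment - from_summary) > tol:
                messages.append(f"{name}: {from_alignment} != {from_summary}")
        # Only produce the checksum table when something disagrees.
        if messages and csv_path is not None:
            with open(csv_path, "w", newline="") as handle:
                writer = csv.writer(handle)
                writer.writerow(["check", "alignment stats", "summary stats"])
                writer.writerows(rows)
    except Exception as exc:
        # Fail gracefully so counting outputs are still written by the caller.
        messages.append(f"consistency check could not complete: {exc!r}")
    for message in messages:
        print(f"Warning: {message}", file=sys.stderr)
    return messages


ok = check_consistency(
    {"Assigned Reads": 12.0, "Unassigned Reads": 4.0}, {"Mapped Reads": 16})
bad = check_consistency(
    {"Assigned Reads": 12.0, "Unassigned Reads": 3.0}, {"Mapped Reads": 16})
broken = check_consistency({}, {"Mapped Reads": 16})
```

Catching a broad `Exception` is usually discouraged, but here it is the point of the design: the checker is a diagnostic step, so its own failure is downgraded to a warning.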
Codebase Improvements
MergedStats classes have been lightly refactored to ensure that stats are complete and final before they are validated and written to output files.
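One way to enforce a "finalize, then validate, then write" ordering is a simple guard flag, sketched below. The class and method names are hypothetical stand-ins and do not reflect the actual MergedStats interface.

```python
class StatsTable:
    """Hypothetical stand-in for a merged stats class."""

    def __init__(self):
        self.counts = {}
        self.finalized = False

    def add_library(self, name, count):
        # Refuse late additions so validated stats cannot change afterwards.
        if self.finalized:
            raise RuntimeError("stats are final; no further libraries can be added")
        self.counts[name] = count

    def finalize(self):
        self.finalized = True

    def validate(self):
        if not self.finalized:
            raise RuntimeError("finalize() must be called before validate()")
        return all(count >= 0 for count in self.counts.values())

    def to_rows(self):
        if not self.finalized:
            raise RuntimeError("finalize() must be called before writing")
        return sorted(self.counts.items())


table = StatsTable()
table.add_library("lib_b", 7)
table.add_library("lib_a", 3)
table.finalize()
is_valid = table.validate()
rows = table.to_rows()
```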
Closes #295