bartongroup / RATS

Relative Abundance of Transcripts: An R package for the detection of Differential Transcript isoform Usage.
MIT License

Improve memory footprint #58

Closed: fruce-ki closed this issue 5 years ago

fruce-ki commented 6 years ago

Each bootstrap iteration currently returns five columns, each as long as the transcriptome annotation, and summary statistics are calculated from them at the end. These columns are needed to report the mean, median, variance, maximum and minimum of the p-value and of Dprop.
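A minimal sketch of the current (full) approach, written in Python rather than R and not taken from the RATS code: every iteration's p-value and Dprop vectors are kept until the end, then summarised per feature. The function name `summarise_bootstraps` and the array layout are hypothetical.

```python
import numpy as np

def summarise_bootstraps(pvals, dprops):
    """Full-mode sketch (hypothetical, not the RATS implementation).

    pvals, dprops: arrays of shape (n_iterations, n_features), i.e. every
    iteration's vectors stay in memory until all iterations are done.
    Returns a dict of per-feature summary statistics.
    """
    stats = {}
    for name, data in (("pval", pvals), ("dprop", dprops)):
        stats[name + "_mean"] = data.mean(axis=0)
        stats[name + "_median"] = np.median(data, axis=0)
        stats[name + "_var"] = data.var(axis=0, ddof=1)
        stats[name + "_max"] = data.max(axis=0)
        stats[name + "_min"] = data.min(axis=0)
    return stats
```

Memory grows with the number of iterations: with hypothetical sizes of 100 iterations, 200,000 annotation entries and five double-precision columns, the stored results alone take 100 × 200,000 × 5 × 8 bytes = 800 MB.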

However, the frequency of DTU calls across iterations does not require any of these metrics and could easily be tracked with a single vector of incrementing counters. This would improve memory efficiency (and possibly speed), and simplify both the code and the output. Alternatively, it could be added as a runtime option, retaining the descriptive statistics from the bootstraps in case someone does want to look at them.
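The counter idea can be sketched as follows, again as a hypothetical Python illustration rather than RATS's R code. Each iteration's vectors are consumed and discarded, so memory no longer depends on the number of iterations. The DTU rule used here (p-value below `alpha` and absolute Dprop at least `min_dprop`) and both threshold defaults are assumptions for the example.

```python
import numpy as np

def count_dtu(iterations, n_features, alpha=0.05, min_dprop=0.2):
    """Lean-mode sketch (hypothetical): one integer counter per feature.

    iterations: iterable yielding (pvals, dprops) arrays of length
    n_features, one pair per bootstrap iteration.
    Returns the fraction of iterations in which each feature was called DTU.
    """
    counts = np.zeros(n_features, dtype=np.int64)
    n_iter = 0
    for pvals, dprops in iterations:
        # Assumed DTU rule: significant AND a large enough proportion change.
        is_dtu = (pvals < alpha) & (np.abs(dprops) >= min_dprop)
        counts += is_dtu  # booleans add as 0/1
        n_iter += 1
    return counts / n_iter if n_iter else counts.astype(float)
```

With the same hypothetical 200,000 entries, the counters occupy 200,000 × 8 bytes = 1.6 MB regardless of how many iterations are run.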

These metrics give a measure of how noisy each result is. I am generally satisfied with just the frequency, but the additional metrics could distinguish between marginal and highly noisy failures that have the same frequency, although highly noisy data will probably produce low frequencies to begin with.
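To illustrate the point about marginal versus noisy results, here is a small Python example with invented p-values (not real RATS output) and an assumed threshold of 0.05. Both features are called DTU in half of the iterations, so the frequency alone cannot tell them apart, but the spread that lean mode discards can.

```python
import numpy as np

ALPHA = 0.05  # assumed significance threshold

# Invented bootstrap p-values for two features over 6 iterations.
marginal = np.array([0.04, 0.06, 0.045, 0.055, 0.048, 0.052])
noisy = np.array([0.001, 0.9, 0.002, 0.7, 0.003, 0.6])

def describe(pvals, alpha=ALPHA):
    """Return the DTU frequency plus the spread statistics lean mode drops."""
    return {
        "freq": float(np.mean(pvals < alpha)),
        "median": float(np.median(pvals)),
        "sd": float(np.std(pvals, ddof=1)),
        "min": float(pvals.min()),
        "max": float(pvals.max()),
    }

for label, p in (("marginal", marginal), ("noisy", noisy)):
    print(label, describe(p))
```

Both report a frequency of 0.5, while the standard deviation of the noisy feature is far larger than that of the marginal one.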

fruce-ki commented 5 years ago

This is now addressed with commit ddf3b77. Lean mode is active by default and omits all the bootstrap information except the frequency of DTU across the iterations. Deactivating lean mode produces all the familiar bootstrap columns, at the cost of a significant memory footprint that limits scalability.