GoekeLab / bambu

Reference-guided transcript discovery and quantification for long read RNA-Seq data
GNU General Public License v3.0

NDR Feature Update (Only to Review) #306

Closed andredsim closed 2 years ago

andredsim commented 2 years ago

DO NOT MERGE - This PR is just for review purposes to make it easier to isolate the NDR changes in the BambuManuScriptRevision branch.

This pull request involves several features related to the NDR and how it is calculated.

Added the returnModel parameter to isoreParameters. When TRUE, the trained model is appended to the metadata of the rcFile (metadata(se)$model). Users can then pass this model to defaultModels (an isoreParameter) to use it in place of the standard human default model. Generally, fit should be set to FALSE as well.

By default, the NDR is no longer fixed at 0.1. Instead, Bambu calculates a recommended NDR for the user, based on the default NDR of 0.1 on human datasets. The recommendation is calculated using the default model and is intended for cases where the user has a poorly annotated genome and would therefore generally need a more sensitive threshold.

The txScore is now always calculated with the default model as well and is stored as txScore.noFit. It is used to calculate the suggested NDR threshold and to tell users when the default model is likely to be performing better than the fitted model (which can occur when the reference annotations are very poor).
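One way to read the recommended-NDR idea is sketched below in Python. This is an illustration under stated assumptions, not Bambu's actual R implementation: it assumes the NDR is the fraction of novel (non-annotated) candidates among those passing a score cutoff, and that the default model comes with a score cutoff corresponding to an NDR of 0.1 on human data. The function name `recommend_ndr`, the `baseline_score` value, and the toy data are all hypothetical.

```python
def recommend_ndr(scores_no_fit, is_annotated, baseline_score):
    """Fraction of novel candidates among those whose default-model score
    (txScore.noFit) is at least baseline_score.

    ASSUMPTION: baseline_score stands in for the default-model score cutoff
    that gives an NDR of 0.1 on well-annotated human data.
    """
    kept = [annotated
            for score, annotated in zip(scores_no_fit, is_annotated)
            if score >= baseline_score]
    if not kept:
        return 0.0
    novel = sum(1 for annotated in kept if not annotated)
    return novel / len(kept)


# Hypothetical toy data: default-model scores and whether each candidate
# matches a reference annotation.
scores_no_fit = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40]
is_annotated = [True, True, False, True, False, False]

# Four candidates pass the cutoff and one of them is novel.
recommended = recommend_ndr(scores_no_fit, is_annotated, baseline_score=0.65)
print(recommended)  # 0.25
```

In this toy case the same score cutoff that would give 0.1 on a well-annotated genome gives 0.25, which matches the intuition that incomplete annotations call for a more sensitive (higher) NDR threshold.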

All transcripts that have a corresponding read class are now assigned an NDR, not only novel transcripts. This means that users can run Bambu with an NDR of 1 to get all possible novel transcripts and then filter on the NDR manually before the quantification step.
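The manual filtering workflow described above can be sketched as follows. This is a simplified Python illustration, not Bambu's R code: it assumes the NDR of a candidate is the fraction of novel candidates among all candidates ranked at or above it by txScore. The helper names `calculate_ndr` and `filter_novel`, the record layout, and the data are hypothetical.

```python
def calculate_ndr(scores, is_annotated):
    """Return one NDR per candidate: the novel fraction among all candidates
    ranked at or above it by score (ties keep their input order).

    ASSUMPTION: this ranking-based definition is a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ndr = [0.0] * len(scores)
    novel_seen = 0
    for rank, idx in enumerate(order, start=1):
        if not is_annotated[idx]:
            novel_seen += 1
        ndr[idx] = novel_seen / rank
    return ndr


def filter_novel(candidates, ndr, threshold):
    """IDs of novel candidates whose NDR is at or below the threshold."""
    return [c["id"] for c, value in zip(candidates, ndr)
            if not c["annotated"] and value <= threshold]


# Hypothetical candidates, each with a corresponding read class.
candidates = [
    {"id": "tx1", "txScore": 0.95, "annotated": True},
    {"id": "tx2", "txScore": 0.90, "annotated": True},
    {"id": "tx3", "txScore": 0.80, "annotated": True},
    {"id": "tx4", "txScore": 0.70, "annotated": False},
    {"id": "tx5", "txScore": 0.60, "annotated": True},
    {"id": "tx6", "txScore": 0.40, "annotated": False},
]

# Every candidate gets an NDR, annotated or not.
ndr = calculate_ndr([c["txScore"] for c in candidates],
                    [c["annotated"] for c in candidates])

all_novel = filter_novel(candidates, ndr, threshold=1.0)  # keep everything novel
strict = filter_novel(candidates, ndr, threshold=0.3)     # manual, stricter cutoff
print(all_novel, strict)
```

A threshold of 1 keeps every novel candidate, so the stricter cutoff can be chosen afterwards without rerunning discovery.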

The rcFile can now be output when quant and discovery are both set to FALSE, which makes accessing a trained model easier.