Should I train a new model?

KarchinLab / 2020plus

Classifies genes as an oncogene, tumor suppressor gene, or as a non-driver gene by using Random Forests

Apache License 2.0


Ideally, one would train an entirely new model on data that excludes silent mutations and then apply it to additional data that also excludes them. In general, when data without silent mutations are scored using a model trained on data that contained silent mutations, the scores will skew higher. However, as you noticed from the option, a reasonable workaround is to adjust what is considered a significant score by accounting for the fact that silent mutations are not included in the Monte Carlo simulations. This should help reduce potential biases, but you should still check the p-values to see whether there is an artificially large number of significant results for your data. If that is the case, then you may need to train a new model.
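One rough way to run the p-value check described above is to compare the fraction of genes below a nominal threshold with the fraction expected if every gene were a non-driver, and to count how many genes survive a Benjamini-Hochberg correction. The sketch below is not part of 20/20+: the function names, the thresholds, and the assumption that you have already loaded the per-gene p-values into a plain Python list are all hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        qvalues[idx] = running_min
    return qvalues


def inflation_summary(pvalues, alpha=0.05, fdr=0.1):
    """Summarize how many p-values look significant.

    alpha and fdr are illustrative defaults (assumptions), not values
    recommended by the 20/20+ authors.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    n = len(pvalues)
    # Under a pure null, roughly a fraction alpha of p-values fall below alpha.
    observed = sum(p <= alpha for p in pvalues) / n
    qvalues = benjamini_hochberg(pvalues)
    return {
        "observed_fraction": observed,
        "expected_fraction": alpha,
        "ratio": observed / alpha,
        "n_significant": sum(q <= fdr for q in qvalues),
    }
```

Calling `inflation_summary(pvalues)` on your gene-level p-values returns the observed-to-expected ratio and the number of genes passing the chosen false discovery rate. Some excess over the null is expected because real drivers exist, so what counts as "artificially large" remains a judgment call; a ratio far above what comparable data sets give, or an implausibly long list of significant genes, would be the kind of signal that suggests training a new model.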


Should I train a new model ? #19