Closed gianfilippo closed 4 years ago
Overfitting is certainly possible with any machine learning model. For AdaBoost, however, although training builds up to 500 trees, the predictor script (https://github.com/bioinform/somaticseq/blob/master/r_scripts/ada_model_predictor.R) uses only the first 300 trees (n.iter=300) to reduce the likelihood of overfitting.
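To illustrate what limiting the number of iterations does at prediction time, here is a minimal toy sketch in Python. It is not the code from the R script: the stump representation, the `ensemble_predict` function, and the example weights are all hypothetical, and it only shows the general idea of voting with the first `n_iter` weak learners of a boosted ensemble.

```python
# Toy sketch (hypothetical names and data, not the SomaticSeq implementation).
# A "stump" is (feature_index, threshold, polarity): it votes `polarity`
# when the feature exceeds the threshold and `-polarity` otherwise.

def stump_predict(stump, x):
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity

def ensemble_predict(stumps, alphas, x, n_iter=None):
    """Weighted vote using only the first n_iter weak learners."""
    if n_iter is None:
        n_iter = len(stumps)
    score = sum(
        alpha * stump_predict(stump, x)
        for stump, alpha in zip(stumps[:n_iter], alphas[:n_iter])
    )
    return 1 if score >= 0 else -1

# Three weak learners; the later two are fitted to quirks of the training set.
stumps = [(0, 0.5, 1), (1, 0.5, -1), (1, 0.5, -1)]
alphas = [1.0, 0.6, 0.6]
x = (0.9, 0.9)

full = ensemble_predict(stumps, alphas, x)               # uses all 3 learners
truncated = ensemble_predict(stumps, alphas, x, n_iter=2)  # uses the first 2 only
```

In this toy case the full ensemble and the truncated one disagree on `x`, which is the kind of effect that stopping at an earlier iteration is meant to exploit: later learners tend to fit the training set more closely, so dropping them trades some training accuracy for less risk of overfitting.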
The Ti/Tv ratio is mostly a quality metric for germline SNPs. For somatic mutations, the mutation profile depends on the tumor type, and the ratio can be ~1 for many tumors: https://www.biostars.org/p/104473/ and https://www.nature.com/articles/nature12477.
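For reference, a minimal sketch of how a Ti/Tv ratio can be computed from a list of (REF, ALT) pairs. The function names and the input format are assumptions made for illustration; a real pipeline would read the pairs from a VCF and may handle multi-allelic sites differently.

```python
# Sketch only: hypothetical helper names, input is a list of (REF, ALT) strings.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES

def is_transition(ref, alt):
    """A transition swaps purine<->purine or pyrimidine<->pyrimidine."""
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES

def ti_tv_ratio(snvs):
    """Return transitions / transversions, or None if there are no transversions."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        # Skip indels, identical alleles, and anything that is not A/C/G/T.
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        if ref not in BASES or alt not in BASES:
            continue
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else None

calls = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("AT", "A")]
ratio = ti_tv_ratio(calls)  # 2 transitions, 2 transversions, indel skipped
```

Of the 12 possible single-base substitutions, 4 are transitions and 8 are transversions, so a set of calls spread evenly over all 12 would give a ratio of 0.5.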
Thanks for the links, very useful.
Hi,
I looked at some basic result stats, and the estimated Ti/Tv is a little above 1, well below the ~3.0 that I understand is expected for WES. That would suggest that I have plenty of FPs.
Going back to the model, I see that the Train Error is always 0. I am attaching one example output file for one of the samples.
Is the model overfitting? What errors should I expect?
Thanks model.out.txt