Optimised tree growing method

aehrc / VariantSpark

machine learning for genomic variants

I recommend the following improvements to the VariantSpark Random Forest importance analysis.

Compute the importance scores and write them to a file after every 1000 trees built.
Automatically detect when enough trees have been built. If the first suggestion is implemented, the importance scores at each step (every 1000 trees) can be compared with the scores computed at the previous step. If they have changed little, we can stop building more trees.
Periodically (every -rbs trees) dump the models (built trees) to disk, and allow previously built models to be integrated into a new run. If the process crashes halfway, the models produced so far can be reused in the next run.
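
To make the second suggestion concrete, here is a minimal sketch of the stopping rule, written in Python for illustration only and not against the VariantSpark code base. The names `importance_change`, `grow_until_stable` and the `grow_batch` callback are hypothetical, and so are the choice of metric (largest absolute change in normalised importance) and the tolerance `tol`. A rank-based comparison of the top variables might well be a better criterion in practice.

```python
from typing import Callable, Dict, Tuple

Scores = Dict[str, float]


def importance_change(prev: Scores, curr: Scores) -> float:
    """Largest absolute change in normalised importance over all variables."""

    def normalise(scores: Scores) -> Scores:
        total = sum(scores.values())
        if total <= 0:
            return dict(scores)
        return {name: value / total for name, value in scores.items()}

    p, c = normalise(prev), normalise(curr)
    names = set(p) | set(c)
    return max((abs(p.get(n, 0.0) - c.get(n, 0.0)) for n in names), default=0.0)


def grow_until_stable(
    grow_batch: Callable[[int], Scores],
    batch_size: int = 1000,
    max_trees: int = 20000,
    tol: float = 1e-3,
) -> Tuple[int, Scores]:
    """Grow trees in batches until the importance scores stop changing.

    `grow_batch(n)` is an assumed callback: it adds n trees to the forest and
    returns the cumulative importance scores of the forest built so far.
    """
    prev: Scores = {}
    curr: Scores = {}
    n_trees = 0
    while n_trees < max_trees:
        curr = grow_batch(batch_size)
        n_trees += batch_size
        # Compare with the previous step; skip the check on the first batch.
        if prev and importance_change(prev, curr) < tol:
            break
        prev = curr
    return n_trees, curr
```

The same loop is a natural place to write the scores to a file (first suggestion) and to dump the trees built so far (third suggestion), since both happen once per batch.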
