CitrineInformatics / lolo

A random forest
Apache License 2.0

Pne 6240 optimize shapley #321

Closed mVenetos97 closed 1 month ago

mVenetos97 commented 1 month ago

Optimizing Shapley calculations to run faster. Two main changes:

BaggedModel.scala: The .shapley method in BaggedModel calls .shapley on each model in the ensemble and then sums the results. This approach is optimized by:

ModelNode.scala: Small optimizations:

Together, these two changes reduced the runtime from ~30s to ~7s for a dataset of 100 points (tested locally).
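
For reviewers unfamiliar with the starting point, the baseline pattern described for BaggedModel (call .shapley on every ensemble member, then sum the results) can be sketched roughly as below. This is only an illustration under stated assumptions: ToyModel, LinearToyModel, and summedShapley are hypothetical names, not lolo's actual API, and the toy member returns weight * value as a stand-in rather than a real Shapley computation.

```scala
// Hypothetical stand-in for an ensemble member; NOT lolo's actual Model trait.
trait ToyModel {
  // Returns one contribution per input feature.
  def shapley(input: Vector[Double]): Vector[Double]
}

// Toy member: contribution is weight * feature value.
// This is a placeholder, not a real Shapley value calculation.
final case class LinearToyModel(weights: Vector[Double]) extends ToyModel {
  def shapley(input: Vector[Double]): Vector[Double] =
    weights.zip(input).map { case (w, x) => w * x }
}

object BaggedShapleySketch {
  // Baseline pattern: call shapley on every member, then sum element-wise.
  // Cost grows linearly with the number of ensemble members.
  def summedShapley(models: Seq[ToyModel], input: Vector[Double]): Vector[Double] =
    models
      .map(_.shapley(input))
      .foldLeft(Vector.fill(input.length)(0.0)) { (acc, contrib) =>
        acc.zip(contrib).map { case (a, c) => a + c }
      }
}
```

With two toy members holding weights (1.0, 2.0) and (3.0, 4.0) and an input of (1.0, 1.0), summedShapley returns one summed contribution per feature. The per-member call inside the map is the part that dominates the cost as the ensemble grows.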