CitrineInformatics / lolo

A random forest
Apache License 2.0

Uncertainty calibration default true #263

Closed bfolie closed 2 years ago

bfolie commented 2 years ago

Sets uncertaintyCalibration to have a default value of true for RandomForest, Bagger, and MultiTaskBagger*. This is because the default uncertainty for the resulting predictions is observational uncertainty, which makes the most sense as a recalibrated bootstrap standard deviation, but that recalibration is not done if uncertaintyCalibration = false. Therefore, our defaults led to nonintuitive behavior.

The rescale ratio can sometimes be Infinity if the number of training data points and the number of bags are small. We have two tests that run thousands of trials and check that sigma is positive; now that uncertaintyCalibration is true by default, they occasionally hit this case and throw an error. I have made it so that the ratio is set to 1.0 in this scenario.
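To illustrate the fallback, here is a minimal Python sketch (not lolo's Scala code). It assumes, purely for illustration, that the ratio is the root-mean-square of out-of-bag error divided by bootstrap standard deviation; the function name `rescale_ratio` and its arguments are hypothetical. The only behavior taken from this PR is that a non-finite ratio is replaced by 1.0.

```python
import math


def rescale_ratio(oob_errors, oob_sigmas):
    """Hypothetical recalibration ratio with a safe fallback.

    Assumption: ratio = RMS of (out-of-bag error / bootstrap sigma).
    If the result is not finite, return 1.0, i.e. leave sigma unscaled.
    """
    standardized_sq = []
    for err, sigma in zip(oob_errors, oob_sigmas):
        if sigma == 0.0:
            # With few bags, every model can agree exactly, giving zero spread.
            # A nonzero error over zero spread is treated as infinite.
            standardized_sq.append(math.inf if err != 0.0 else 0.0)
        else:
            standardized_sq.append((err / sigma) ** 2)

    if not standardized_sq:
        # No out-of-bag points at all: nothing to calibrate against.
        return 1.0

    ratio = math.sqrt(sum(standardized_sq) / len(standardized_sq))
    # Fallback described in this PR: a non-finite ratio becomes 1.0.
    return ratio if math.isfinite(ratio) else 1.0
```

With this shape, a degenerate small-data case such as `rescale_ratio([1.0], [0.0])` yields 1.0 instead of propagating Infinity into the predicted sigma.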

Since I was changing the rescale ratio logic, I took the opportunity to consolidate the BaggerHelper code, which was duplicated in `Bagger`. This resolves #212.

*The default is still false for ExtraRandomTrees, since this learner disables the bootstrap by default.