MichelleLochner / supernova-machine


Thresholds and multiclass predict_proba #8

Open mkerrwinter opened 9 years ago

mkerrwinter commented 9 years ago

I've found another place where thresholding is important. When applying the AdaBoost predict_proba function (max_ml_algorithms line 197 onwards) with multiple classes (in our case currently classes 1, 2 and 3), its class probabilities come out very close to each other. Typical values for [class1, class2, class3] are around [0.35, 0.32, 0.33]. For the Random Forest the class1 probabilities are typically in the range 0.7-0.99, so it's less of an issue there. But with 3 classes a P = 0.5 threshold seems even more dodgy. I also have no idea how these DT-based (decision-tree-based) algorithms calculate probabilities. I'll look into that.
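A rough sketch of how the two kinds of probability could arise. For the Random Forest, scikit-learn documents `predict_proba` as the mean of the per-tree class probabilities, where one tree's probability is the fraction of training samples of each class in the leaf the sample falls into. For AdaBoost, the function below uses a *simplified, hypothetical* SAMME-style combination: weighted vote fractions per class, divided by (K - 1) and passed through a softmax. scikit-learn's actual formula has changed between versions and differs for SAMME.R, so check the installed source before trusting this. Under this simplified scheme, even a unanimous vote with K = 3 gives a top probability of e^0.5 / (e^0.5 + 2), about 0.45, which would be consistent with the near-uniform values above and with a P = 0.5 threshold never being reached. The function names are illustrative only.

```python
import math


def forest_proba(tree_leaf_fractions):
    """Random Forest style: mean of the per-tree class fractions.

    Each entry is one tree's class distribution in the leaf the sample
    falls into, e.g. [0.9, 0.1, 0.0] for a leaf that is 90% class index 0.
    """
    n_trees = len(tree_leaf_fractions)
    n_classes = len(tree_leaf_fractions[0])
    return [sum(tree[k] for tree in tree_leaf_fractions) / n_trees
            for k in range(n_classes)]


def samme_proba(votes, weights, n_classes):
    """Hypothetical, simplified SAMME-style AdaBoost probabilities.

    votes[m] is the class index predicted by weak learner m and
    weights[m] is its boosting weight. This is an assumption, not
    scikit-learn's exact implementation.
    """
    total = sum(weights)
    # Weighted fraction of the votes each class received (sums to 1).
    score = [0.0] * n_classes
    for k, w in zip(votes, weights):
        score[k] += w / total
    # Scale by 1 / (K - 1) and apply a softmax. Each score lies in [0, 1],
    # so the exponents lie in [0, 1 / (K - 1)] and the outputs stay close.
    exps = [math.exp(s / (n_classes - 1)) for s in score]
    norm = sum(exps)
    return [e / norm for e in exps]


# Unanimous vote for class index 0 (class 1 in our labelling), K = 3.
print(samme_proba([0, 0, 0], [1.0, 1.0, 1.0], n_classes=3))
# Two made-up trees whose leaves are mostly class index 0.
print(forest_proba([[0.9, 0.1, 0.0], [0.7, 0.2, 0.1]]))
```

The contrast is the point: averaging leaf fractions can push one class close to 1, while the softmax over small, bounded scores cannot.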

MichelleLochner commented 9 years ago

Thanks Max. Finding out how they predict probabilities would be useful. Most algorithms can really only compare two classes with each other, and handle more classes by iterating over each class against the remaining ones (one-vs-rest) to compute probabilities. Definitely worth checking!
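The one-vs-rest procedure can be sketched as: fit one binary classifier per class (that class against all the others), take each classifier's probability for its own class, and renormalize so the K numbers sum to 1. scikit-learn's `OneVsRestClassifier.predict_proba` normalizes in this way for single-label problems; the sketch below is plain Python and takes the binary scores as given. The function names and the example scores are hypothetical. It also contrasts two decision rules: taking the most probable class (argmax), and additionally requiring that class to clear a fixed threshold such as P = 0.5.

```python
def ovr_proba(binary_scores):
    """Normalize one-vs-rest scores into a multiclass distribution.

    binary_scores[k] is P(class k | x) from the binary classifier trained
    to separate class k from all the other classes.
    """
    total = sum(binary_scores)
    if total == 0:
        # No classifier claims the sample: fall back to a uniform guess.
        return [1.0 / len(binary_scores)] * len(binary_scores)
    return [s / total for s in binary_scores]


def classify(proba, threshold=None):
    """Return the index of the most probable class.

    If a threshold is given, return None when the top probability does
    not reach it (the sample is left unclassified).
    """
    best = max(range(len(proba)), key=lambda k: proba[k])
    if threshold is not None and proba[best] < threshold:
        return None
    return best


proba = ovr_proba([0.7, 0.6, 0.65])  # made-up binary scores
print(proba)
print(classify(proba))       # argmax always picks a class
print(classify(proba, 0.5))  # a P = 0.5 cut rejects this sample
```

With three classes the normalized top probability can easily sit below 0.5, so a fixed 0.5 cut discards samples that argmax would still assign.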

In reply to mkerrwinter's comment of Wed, Feb 25, 2015 at 3:23 PM.