chriswbartley / monoensemble

High Performance Monotone Boosting and Random Forest Classification
http://monoensemble.readthedocs.io/en/latest/index.html

for multi-class case predict_proba method does not return same number of probabilities as number of classes #1

Closed ravimaranganti closed 4 years ago

ravimaranganti commented 4 years ago

For the multi-class case with n classes, the predict_proba method seems to return (n-1) probabilities. The probabilities also seem to be unnormalized, so it is not possible to recover the probability of the missing class from the probabilities of the other classes.

chriswbartley commented 4 years ago

Thanks Ravi - Yes, predict_proba returns different probabilities from sklearn: they are cumulative probabilities (I should make this clearer somewhere). For classes 1, 2, 3, 4 the three values would be: [P(y<2), P(y<3), P(y<4)].

You can calculate the class probabilities by:

P(y=1) = P(y<2)
P(y=2) = P(y<3) - P(y<2)
P(y=3) = P(y<4) - P(y<3)
P(y=4) = 1 - P(y<4)
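As a rough illustration of the differences above, here is a minimal NumPy sketch. The helper name `cumulative_to_class_probs` is hypothetical and not part of monoensemble; it assumes the input has one row per sample and columns ordered as [P(y<2), ..., P(y<K)].

```python
import numpy as np

def cumulative_to_class_probs(cum_probs):
    """Hypothetical helper (not part of monoensemble).

    Converts rows of cumulative probabilities [P(y<2), ..., P(y<K)]
    into per-class probabilities [P(y=1), ..., P(y=K)].
    """
    cum = np.atleast_2d(np.asarray(cum_probs, dtype=float))
    n_samples = cum.shape[0]
    # Pad with P(y<1) = 0 on the left and P(y<K+1) = 1 on the right.
    padded = np.hstack([np.zeros((n_samples, 1)), cum, np.ones((n_samples, 1))])
    # Differences of adjacent columns give P(y=k) = P(y<k+1) - P(y<k).
    return np.diff(padded, axis=1)

# Made-up cumulative values for classes 1, 2, 3, 4.
example = [[0.1, 0.4, 0.9]]
class_probs = cumulative_to_class_probs(example)
print(class_probs)  # approximately [[0.1, 0.3, 0.5, 0.1]]
```

Each output row has one more column than the input and sums to 1, provided the cumulative values are non-decreasing and lie in [0, 1].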

From memory, if you are planning on assigning a class, then to retain global monotonicity you need to use a consistent threshold on the cumulative probability, e.g. predict() uses the 'lowest median' (i.e. the first C such that P(y<C) > 0.5).
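The thresholding rule could be sketched roughly as follows. This is one reading of the comment above, not the library's confirmed behaviour: the function name `lowest_median_class` is hypothetical, and it assumes that the assigned class is the one just below the first C with P(y<C) > 0.5, and that the highest class is assigned when no column exceeds the threshold.

```python
import numpy as np

def lowest_median_class(cum_probs, classes, threshold=0.5):
    """Hypothetical sketch (not monoensemble's actual predict()).

    cum_probs: rows of [P(y<2), ..., P(y<K)] for ordered classes.
    classes:   the K ordered class labels, e.g. [1, 2, 3, 4].
    """
    cum = np.atleast_2d(np.asarray(cum_probs, dtype=float))
    classes = np.asarray(classes)
    exceeds = cum > threshold
    # argmax returns the index of the first True in each row (0 if none).
    first = exceeds.argmax(axis=1)
    # Assumption: if no column exceeds the threshold, use the highest class.
    first[~exceeds.any(axis=1)] = cum.shape[1]
    # Column j holds P(y < classes[j+1]), so index j maps to classes[j].
    return classes[first]

# Made-up cumulative values for classes 1, 2, 3, 4.
preds = lowest_median_class(
    [[0.1, 0.4, 0.9], [0.6, 0.7, 0.8], [0.1, 0.2, 0.3]],
    classes=[1, 2, 3, 4],
)
print(preds)  # [3 1 4]
```

Using the same threshold for every sample is the point of the rule: the cut is always made on the cumulative probability, never on the per-class probabilities.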