ravimaranganti closed this issue 4 years ago.
Thanks Ravi. Yes, predict_proba returns different probabilities from sklearn's: they are cumulative probabilities (I should make this clearer somewhere). For classes 1, 2, 3, 4 the three values are [P(y<2), P(y<3), P(y<4)].
You can calculate the class probabilities as follows:

- P(y=1) = P(y<2)
- P(y=2) = P(y<3) - P(y<2)
- P(y=3) = P(y<4) - P(y<3)
- P(y=4) = 1 - P(y<4)
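A minimal NumPy sketch of that conversion, assuming `predict_proba` returns an array of shape `(n_samples, n_classes - 1)` whose columns are ordered [P(y<2), ..., P(y<K)] as described. The helper name `cumulative_to_class_proba` is hypothetical and not part of the library. Padding each row with 0 on the left and 1 on the right turns the four formulas into a single consecutive difference.

```python
import numpy as np


def cumulative_to_class_proba(cum_proba):
    """Convert cumulative probabilities [P(y<2), ..., P(y<K)] per row
    into per-class probabilities [P(y=1), ..., P(y=K)].

    Hypothetical helper: assumes cum_proba has shape (n_samples, K-1)
    and is non-decreasing along each row.
    """
    cum_proba = np.atleast_2d(np.asarray(cum_proba, dtype=float))
    n_samples = cum_proba.shape[0]
    # Pad with P(y<1) = 0 on the left and P(y<=K) = 1 on the right
    padded = np.hstack([np.zeros((n_samples, 1)), cum_proba, np.ones((n_samples, 1))])
    # Consecutive differences give P(y=k) = P(y<k+1) - P(y<k)
    return np.diff(padded, axis=1)


# Four classes: three cumulative values per sample become four class probabilities
cum = np.array([[0.1, 0.4, 0.8]])
class_proba = cumulative_to_class_proba(cum)  # each row sums to 1
print(class_proba)
```

If the cumulative values in a row are not non-decreasing, some differences come out negative, which is a quick way to check whether the output really is cumulative.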
From memory, if you plan to assign a class, you need to use a consistent threshold on the cumulative probability to retain global monotonicity. For example, predict() uses the 'lowest median' (i.e. the first C such that P(y<C) > 0.5).
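A sketch of a lowest-median assignment, assuming the same cumulative column layout as above and labels sorted in ascending order. It returns the lowest class k whose cumulative probability P(y ≤ k) exceeds 0.5. The function name `predict_lowest_median` and its `classes`/`threshold` parameters are hypothetical, and the strict `>` comparison is an assumption, so tie-breaking may differ from the library's predict(). Because the same cut-off is applied to every sample, a sample whose cumulative probabilities are all lower than another's can never be assigned a lower class, which is the monotonicity point above.

```python
import numpy as np


def predict_lowest_median(cum_proba, classes=None, threshold=0.5):
    """Assign each sample the lowest class k with P(y <= k) > threshold.

    Hypothetical helper: cum_proba has shape (n_samples, K-1) with columns
    [P(y<2), ..., P(y<K)], i.e. column j holds P(y <= classes[j]).
    classes: ascending class labels of length K (defaults to 1..K).
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    cum_proba = np.atleast_2d(np.asarray(cum_proba, dtype=float))
    n_samples, n_cols = cum_proba.shape
    if classes is None:
        classes = np.arange(1, n_cols + 2)
    classes = np.asarray(classes)
    # Append P(y <= K) = 1 so every row has at least one column above threshold
    full = np.hstack([cum_proba, np.ones((n_samples, 1))])
    # argmax on a boolean array returns the index of the first True
    idx = np.argmax(full > threshold, axis=1)
    return classes[idx]


# Same cut-off for every sample keeps the assignment monotone
print(predict_lowest_median([[0.6, 0.7, 0.9], [0.1, 0.2, 0.3]]))
```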
For the multi-class case with n classes, the predict_proba method seems to return (n-1) probabilities. The probabilities also seem to be unnormalized, so it is not possible to recover the probability of the missing class from the probabilities of the other classes.