Raschka-research-group / coral-cnn

Rank Consistent Ordinal Regression for Neural Networks with Application to Age Estimation
https://www.sciencedirect.com/science/article/pii/S016786552030413X
MIT License

Interpretation of the probability output #28

Closed: wcm95 closed this issue 4 years ago

wcm95 commented 4 years ago

Suppose there are 6 categories: 0, 1, 2, 3, 4, 5. The probability output for one sample is [0.8, 0.6, 0.55, 0.45, 0.1], so the predicted category for this sample is 3. My question is: does this mean that P(X=3) = P(X>2) - P(X>3) = 0.55 - 0.45 = 0.1, i.e., the probability of the predicted category is only 0.1?
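The subtraction in the question generalizes to every category. Below is a minimal pure-Python sketch, assuming the output vector holds P(X>0), ..., P(X>4) and using the boundary values P(X>-1) = 1 and P(X>5) = 0. The function name `class_probabilities` is hypothetical, not part of the coral-cnn code.

```python
def class_probabilities(rank_probas):
    """Turn cumulative probabilities P(X > k), k = 0..K-2, into
    per-class probabilities P(X = k), k = 0..K-1.

    Assumes rank_probas is non-increasing (rank consistent), so every
    difference below is non-negative.
    """
    # Pad with P(X > -1) = 1 on the left and P(X > K-1) = 0 on the right.
    padded = [1.0] + list(rank_probas) + [0.0]
    # P(X = k) = P(X > k-1) - P(X > k)
    return [padded[k] - padded[k + 1] for k in range(len(padded) - 1)]


probas = [0.8, 0.6, 0.55, 0.45, 0.1]
print([round(p, 2) for p in class_probabilities(probas)])
# [0.2, 0.2, 0.05, 0.1, 0.35, 0.1]
```

For these numbers, category 3 indeed gets only 0.1, while category 4 gets 0.35. The thresholded prediction picks the category where the cumulative probability crosses 0.5 (a median of this distribution), which need not be the category with the largest individual probability.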

wcm95 commented 4 years ago

@rasbt, could you please help me with this question?

rasbt commented 4 years ago

Yes, your reasoning seems to be correct here. Let me know if you have a follow-up question.

bneigher commented 3 years ago

@wcm95, why would the predicted category be 3? I'm a bit confused because you have 6 categories, but the prediction vector has only 5 entries. If that's a typo, it would seem that the first category (0.8) has the highest probability, so wouldn't the correct prediction for that output be 0?

rasbt commented 3 years ago

I think there are different things going on here :). @bneigher, you would be right for a regular softmax output, where you take the argmax. In the CORAL model, you treat each rank threshold as a separate binary task: P(X>0), P(X>1), ..., P(X>4) here, so 6 categories give 5 outputs (that's not a typo). Since these probabilities are rank consistent (monotonically non-increasing), an earlier task never has a lower probability than a later one. In CORAL, the class label is predicted by counting the number of tasks whose probability is greater than 0.5. So in this case:

[0.8 -> 1, 0.6 -> 1, 0.55 -> 1, 0.45 -> 0, 0.1 -> 0] = 1 + 1 + 1 + 0 + 0 = 3
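The counting rule can be sketched in a few lines of pure Python. This is an illustration under the assumptions above (one sample, probabilities ordered as P(X>0), P(X>1), ...); the name `coral_predict` and its `threshold` parameter are hypothetical, not the repository's API.

```python
def coral_predict(rank_probas, threshold=0.5):
    """Predict an ordinal label from CORAL-style task probabilities.

    rank_probas holds P(X > 0), P(X > 1), ..., P(X > K-2) for one sample.
    The label is the number of binary tasks whose probability is
    strictly greater than the threshold.
    """
    return sum(1 for p in rank_probas if p > threshold)


probas = [0.8, 0.6, 0.55, 0.45, 0.1]
print(coral_predict(probas))  # 3

# The softmax-style argmax over the same values returns index 0,
# which is not a meaningful label for CORAL outputs.
print(max(range(len(probas)), key=probas.__getitem__))  # 0
```

Because the probabilities are non-increasing, counting values above 0.5 gives the same result as finding the first task that falls to 0.5 or below.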

bneigher commented 3 years ago

@rasbt Got it! That makes a lot of sense, thank you for clarifying that 🚀