AI-sandbox / XGMix


My question about the normalization of the confusion matrix #13

Closed: dwuab closed this issue 3 years ago

dwuab commented 3 years ago

In line 11 of Utils/visualization.py, the confusion matrix is calculated by cm = confusion_matrix(y, y_pred). According to sklearn's documentation, the j-th element of the i-th row is the number of observations whose true label is i but which are predicted to have label j. The natural way to normalize the confusion matrix would therefore be to divide by the row sums. However, in line 77 of Utils/visualization.py, the confusion matrix is normalized by the column sums: cm = cm/np.sum(cm, axis=0). This does not seem right. Am I wrong?
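To make the difference concrete, here is a small sketch with made-up labels (not XGMix data) comparing the two normalizations. Dividing by the row sums makes each row sum to 1, and the diagonal becomes per-class recall; dividing by the column sums, as in line 77, makes each column sum to 1, and the diagonal becomes per-class precision.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels for illustration: 4 samples of class 0, 2 of class 1.
y      = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1])

# cm[i, j] = number of samples with true label i predicted as label j
cm = confusion_matrix(y, y_pred)               # [[3, 1], [0, 2]]

# Row normalization: divide each row by its true-label count,
# so every row sums to 1 and the diagonal is per-class recall.
cm_rows = cm / cm.sum(axis=1, keepdims=True)   # [[0.75, 0.25], [0.0, 1.0]]

# Column normalization (the line-77 variant): np.sum(cm, axis=0) has shape
# (n_classes,) and broadcasts across rows, so each column is divided by its
# predicted-label count; the diagonal is then per-class precision.
cm_cols = cm / np.sum(cm, axis=0)              # [[1.0, 0.333...], [0.0, 0.666...]]

recall = recall_score(y, y_pred, average=None)        # [0.75, 1.0]
precision = precision_score(y, y_pred, average=None)  # [1.0, 0.666...]
```

With column normalization, class 0 looks perfect (1.0 on the diagonal) even though one of its four samples was misclassified; the row-normalized view exposes that miss.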

weekend37 commented 3 years ago

No, I actually agree with that. Normalizing by the column sums isn't wrong as such, but normalizing by the number of samples with each true label (the row sums) makes much more sense to me too. This has already been fixed and will be included in the new version, which we will be releasing very soon.
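The thread does not show the actual fix, so the following is only a minimal sketch of row normalization under stated assumptions. The helper name `row_normalized_confusion_matrix` is hypothetical and not part of XGMix. It also guards against a label that appears only in `y_pred`, whose row would otherwise produce 0/0.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def row_normalized_confusion_matrix(y, y_pred):
    """Hypothetical helper: confusion matrix with each row divided by its sum."""
    cm = confusion_matrix(y, y_pred).astype(float)
    # Row sums = number of samples per true label, shape (n_classes, 1)
    row_sums = cm.sum(axis=1, keepdims=True)
    # A label that appears only in y_pred has an all-zero row; leave it as
    # zeros instead of producing NaN from 0/0.
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)

# Label 2 is predicted once but never occurs in y, so its row stays all zero.
y      = [0, 0, 1, 1]
y_pred = [0, 1, 1, 2]
cm_norm = row_normalized_confusion_matrix(y, y_pred)
# [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 0.0]]
```

scikit-learn 0.22 and later also accept `confusion_matrix(y, y_pred, normalize="true")`, which normalizes over the true labels (rows) directly and may be simpler than a custom helper.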