Waikato / meka

Multi-label classifiers and evaluation procedures using the Weka machine learning framework.
http://waikato.github.io/meka/
GNU General Public License v3.0

Possible bug in Classifier Trellis’ weight calculation #43

Closed rgson closed 7 years ago

rgson commented 7 years ago

When a Classifier Trellis is set to reorder the chain (i.e. when the number of chain iterations is set to >0 through setChainIterations), it calculates label weights based on a matrix of mutual information values.

The mutual information matrix is created in StatUtils.I(double P[][]), which only fills half of the matrix (a valid optimization, since mutual information is symmetric). However, when the matrix is used to calculate the weight, it is accessed as though it were completely filled. The calculation is weight += I[pj][j_];, even though I[pj][j_] will incorrectly yield 0 for all pj > j_.

I ran a test to confirm that the results differ between a half-filled matrix and a completely filled one. Evidently, the outcome is affected:

# Half matrix                                                      # Full matrix

== Evaluation Info                                                 == Evaluation Info

Classifier                     meka.classifiers.multilabel.CT      Classifier                     meka.classifiers.multilabel.CT
Options                        [-H, -1, -L, 1, -X, Ibf, -Is, 1,    Options                        [-H, -1, -L, 1, -X, Ibf, -Is, 1,
Additional Info                0.009                            |  Additional Info                0.006
Dataset                        Music                               Dataset                        Music
Number of labels (L)           6                                   Number of labels (L)           6
Type                           ML-CV                               Type                           ML-CV
Threshold                      0.5                                 Threshold                      0.5
Verbosity                      3                                   Verbosity                      3

== Predictive Performance                                          == Predictive Performance

Number of test instances (N)                                       Number of test instances (N)
Accuracy                       0.578                            |  Accuracy                       0.582
Jaccard index                  0.578                            |  Jaccard index                  0.582
Hamming score                  0.801                            |  Hamming score                  0.813
Exact match                    0.333                            |  Exact match                    0.328
Jaccard distance               0.422                            |  Jaccard distance               0.418
Hamming loss                   0.199                            |  Hamming loss                   0.187
ZeroOne loss                   0.667                            |  ZeroOne loss                   0.672
Harmonic score                 0.708                            |  Harmonic score                 0.706
One error                      0.361                            |  One error                      0.297
Rank loss                      0.221                            |  Rank loss                      0.224
Avg precision                  0.657                            |  Avg precision                  0.669
Log Loss (lim. L)              0.357                            |  Log Loss (lim. L)              0.335
Log Loss (lim. D)              1.272                            |  Log Loss (lim. D)              1.193
F1 (micro averaged)            0.687                            |  F1 (micro averaged)            0.697
F1 (macro averaged by example) 0.66                             |  F1 (macro averaged by example) 0.665
F1 (macro averaged by label)   0.677                            |  F1 (macro averaged by label)   0.674
AUPRC (macro averaged)         0.567                            |  AUPRC (macro averaged)         0.573
AUROC (macro averaged)         0.763                            |  AUROC (macro averaged)         0.767
Curve Data                                                         Curve Data
Macro Curve Data                                                   Macro Curve Data
Micro Curve Data                                                   Micro Curve Data
Label indices                  [     0     1     2     3     4     Label indices                  [     0     1     2     3     4 
Accuracy (per label)           [ 0.784 0.682 0.787 0.904 0.824  |  Accuracy (per label)           [ 0.784 0.764 0.787 0.904 0.818 
Empty labelvectors (predicted) 0.005                               Empty labelvectors (predicted) 0.005
Label cardinality (predicted)  1.954                            |  Label cardinality (predicted)  1.829
Levenshtein distance           0.192                            |  Levenshtein distance           0.181

My proposed solution is to simply fill the entire matrix. The performance impact is negligible, and it avoids any accidental use of the empty half, even by classifiers other than the Trellis.
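To illustrate, here is a minimal sketch of the proposed fix (this is not MEKA's actual StatUtils code; the pairwise score function is a hypothetical stand-in for mutual information). Filling only the upper triangle leaves lookups with swapped indices at 0; mirroring each entry makes the matrix safe to read in either index order:

```java
public class SymmetricFill {

    // Hypothetical pairwise score, standing in for I(j, k); symmetric in j and k.
    static double score(int j, int k) {
        return 1.0 / (1 + j + k);
    }

    // Builds the matrix the way the proposed fix suggests: compute each
    // upper-triangle entry once, then mirror it into the lower triangle.
    public static double[][] fullMatrix(int L) {
        double[][] I = new double[L][L];
        for (int j = 0; j < L; j++) {
            for (int k = j + 1; k < L; k++) {
                I[j][k] = score(j, k);
                I[k][j] = I[j][k]; // the mirror step; without it, I[k][j] stays 0
            }
        }
        return I;
    }

    public static void main(String[] args) {
        double[][] I = fullMatrix(4);
        // A lookup like I[pj][j_] with pj > j_ now returns the true value
        // instead of 0, so the weight accumulation is unaffected by index order.
        System.out.println(I[3][1] == I[1][3] && I[3][1] != 0.0);
    }
}
```

The extra assignment per pair doubles the writes but not the mutual information computations, which is why the performance impact is negligible.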

Similar issues may exist for other matrices created in the StatUtils class. Since I don't have any test cases to confirm it, I've not touched them.

jmread commented 7 years ago

Thanks for finding this. Indeed, it seems to be a minor glitch, and filling the whole matrix should fix it, as you suggest.