Waikato / meka

Multi-label classifiers and evaluation procedures using the Weka machine learning framework.
http://waikato.github.io/meka/
GNU General Public License v3.0

Multitarget - same metrics for several different algorithms #71

Open thiagonazareth opened 3 years ago

thiagonazareth commented 3 years ago

Dear all, good evening. I apologize for the English; I am using a translator. I am using MEKA in my master's work, which applies machine learning to predict student retention in higher education, and I came across the following situation. Using the GUI (Meka Explorer) to test the multitarget algorithms, the Hamming score and Accuracy (per label) results come out identical for several different algorithms. I used two multitarget datasets from the MEKA data folder, thyroid-L7.arff and solar_flare.arff, and the same behavior of identical metrics across different algorithms occurs with both.

meka.classifiers.multitarget.CC, meka.classifiers.multitarget.BCC, meka.classifiers.multitarget.CCp and meka.classifiers.multitarget.CR, each run with both J48 and NaiveBayes as the base classifier and otherwise default parameters, all produce the same Hamming score, Exact match, Hamming loss, ZeroOne loss, Levenshtein distance and Accuracy (per label); a command-line sketch of such runs follows.
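For reference, a rough command-line equivalent of two of these runs (a sketch only: the classpath, the data path and the -split-percentage flag follow the MEKA tutorial conventions and may need adjusting for the installed version):

```
# Multi-target classifier chains (CC) with J48 as base classifier.
# "meka.jar" stands in for the actual MEKA classpath, e.g. ./lib/*
java -cp meka.jar meka.classifiers.multitarget.CC \
    -t data/thyroid-L7.arff -split-percentage 60 \
    -W weka.classifiers.trees.J48

# Same data and split, Bayesian classifier chains (BCC) with NaiveBayes.
java -cp meka.jar meka.classifiers.multitarget.BCC \
    -t data/thyroid-L7.arff -split-percentage 60 \
    -W weka.classifiers.bayes.NaiveBayes
```

If the issue reproduces outside the GUI as well, both invocations report the same metrics block before the fix.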

I ran the experiments on both Mac OSX and Ubuntu.

This is the result for all of the algorithms and variations mentioned above on the thyroid-L7.arff dataset:

N (test)               3119
L                      7
Hamming score          0.281
Exact match            0
Hamming loss           0.719
ZeroOne loss           1
Levenshtein distance   0.719
Label indices          [0 1 2 3 4 5 6]
Accuracy (per label)   [0.002 0.023 0.006 0.939 0.013 0.001 0.980]

thiagonazareth commented 3 years ago

I found the problem and created a pull request to fix it.

jmread commented 3 years ago

It is not necessarily a problem to get the same results for different algorithms. But if I understand your proposed change correctly, it looks like this may be a result of the posterior-distribution information not being copied into the right place, from which it is later read when the evaluation metrics are computed. Is that correct?

thiagonazareth commented 3 years ago

I agree that it is not necessarily a problem to get the same results for different algorithms, but it caught my attention that very different algorithms, with several different input parameters, all produce exactly the same result. The problem I found is the following: the array returned by distributionForInstance has twice the number of labels, storing the predicted value for label i at position i and probability information for label i at position i + L. Probability information has not yet been implemented for the MT classifiers, so Arrays.copyOfRange(y, L, L * 2) picks up values that are always 1. The correct call is Arrays.copyOfRange(y, 0, L).
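To make the layout concrete, here is a minimal self-contained sketch of the convention described above. The names y and L are taken from the comment; the values and the class are purely illustrative, not the actual MEKA evaluation code:

```java
import java.util.Arrays;

public class DistributionLayoutSketch {

    /**
     * Sketch of the convention described above: distributionForInstance for a
     * multi-target classifier returns an array of length 2*L, where positions
     * [0, L) hold the predicted value per label and positions [L, 2*L) hold
     * per-label confidence information (currently just 1.0 for MT classifiers,
     * since that part is not implemented yet).
     */
    public static void main(String[] args) {
        int L = 3;
        // Hypothetical output of distributionForInstance: predictions for
        // labels 0..2, followed by placeholder confidences.
        double[] y = {2.0, 0.0, 1.0,   1.0, 1.0, 1.0};

        // Buggy extraction: reads the confidence half, which is constant,
        // so every classifier appears to make the same "predictions".
        double[] wrong = Arrays.copyOfRange(y, L, L * 2);   // [1.0, 1.0, 1.0]

        // Proposed fix: read the actual predicted label values.
        double[] right = Arrays.copyOfRange(y, 0, L);       // [2.0, 0.0, 1.0]

        System.out.println("wrong: " + Arrays.toString(wrong));
        System.out.println("right: " + Arrays.toString(right));
    }
}
```

Because the buggy copy always returns the constant placeholder confidences, every classifier ends up being scored on the same predictions, which matches the identical metrics reported above.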

jmread commented 3 years ago

The reason for the doubling of the array is to make space to store the probability information from the posterior, P(y[j] = y_max[j] | x), where y_max[j] is the most likely value. The first part of the array (up to L) is used to store y_max[j] directly for each label. This is not needed in the standard multi-label case, where we just store P(y[j] = 1) instead, because y_max[j] can be inferred directly (there are only two possible values, 0 or 1). In the multi-target case this was included mainly for display/debug purposes, and it does not represent the full distribution anyway. I guess this is what you mean by "probability information not yet implemented for MT classifiers".

I agree that the fix you propose makes sense. It seems that, in this part of the code, the information from 0...L is missing altogether, which shouldn't be the case. The fix should probably also be accompanied by a unit test, for example on thyroid-L7.arff (the dataset you used above to demonstrate the issue). Are you able to put your experiment into a small unit test?
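A test along those lines might look roughly like the sketch below. It is only a sketch under assumptions: the class name, the dataset path, the Evaluation.evaluateModel(...) arguments ("PCut1", "3") and Result.getMeasurement("Hamming score") are taken from memory of the MEKA API and would need to be checked against the code base; the assertion simply mirrors the experiment above by checking that CC with J48 and CC with NaiveBayes no longer report an identical Hamming score on thyroid-L7.arff.

```java
import static org.junit.Assert.assertNotEquals;

import org.junit.Test;

import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

import meka.classifiers.multilabel.Evaluation;
import meka.classifiers.multilabel.MultiLabelClassifier;
import meka.classifiers.multitarget.CC;
import meka.core.MLUtils;
import meka.core.Result;

/**
 * Sketch of a regression test for this issue: two different base classifiers
 * under meka.classifiers.multitarget.CC should not report the identical
 * (broken) Hamming score on thyroid-L7.arff. API details are assumptions.
 */
public class MultiTargetEvaluationTest {

    /** Train/test evaluation mirroring the experiment described in this issue. */
    private Result evaluate(MultiLabelClassifier h) throws Exception {
        Instances data = DataSource.read("data/thyroid-L7.arff"); // path assumed relative to the MEKA checkout
        MLUtils.prepareData(data); // sets the class index from the -C option in the relation name

        int trainSize = (int) (data.numInstances() * 0.6);
        Instances train = new Instances(data, 0, trainSize);
        Instances test  = new Instances(data, trainSize, data.numInstances() - trainSize);

        // Threshold option "PCut1" and verbosity "3" are assumed defaults here.
        return Evaluation.evaluateModel(h, train, test, "PCut1", "3");
    }

    /** Assumes Result exposes the metric under the same key shown in the output above. */
    private double hammingScore(Result r) {
        return (Double) r.getMeasurement("Hamming score");
    }

    @Test
    public void differentBaseClassifiersGiveDifferentHammingScore() throws Exception {
        CC ccJ48 = new CC();
        ccJ48.setClassifier(new J48());

        CC ccNB = new CC();
        ccNB.setClassifier(new NaiveBayes());

        double scoreJ48 = hammingScore(evaluate(ccJ48));
        double scoreNB  = hammingScore(evaluate(ccNB));

        // Before the fix, every combination reported the same 0.281 here.
        assertNotEquals(scoreJ48, scoreNB, 0.0);
    }
}
```

As noted above, identical scores for different learners are not impossible in principle, so a stricter variant could instead assert directly on the values copied out of distributionForInstance; this version just encodes the symptom reported in this issue.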