(w02_t03) Nemenyi Test Unclear

automl-edu / AutoMLLecture

Lecture on Automated Machine Learning

(w02_t03) Nemenyi Test Unclear #15

Open jakob-r opened 3 years ago

jakob-r commented 3 years ago

The following things should be made clear:

- Which distribution does the test statistic q follow? (Suggestion: the Studentized range distribution with parameter k = number of algorithms, but I haven't found anything on the degrees of freedom.)
- Does the post-hoc Nemenyi test control the FWER? (I suppose yes.)
- How is the critical difference derived? (Suggestion: q.alpha = qtukey(1 - 0.05, k, Inf) / sqrt(2L); cd.nemenyi = q.alpha * sqrt(k * (k + 1L) / (6L * n)))
- I guess the test statistic q consists only of the absolute value of Rj1 - Rj2.
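
As a rough cross-check of the suggested R expression, here is a minimal Python sketch that uses only the standard library. It assumes infinite degrees of freedom (as in qtukey(..., Inf)), in which case the Studentized range statistic is the range of k independent standard normals. The function names range_cdf, q_alpha and nemenyi_cd, the integration window and the step count are illustrative choices, not something taken from the slides.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def range_cdf(q: float, k: int, steps: int = 2000) -> float:
    """P(range of k i.i.d. standard normals <= q), via Simpson's rule.

    Uses P(W <= q) = k * integral of pdf(z) * (cdf(z) - cdf(z - q))**(k - 1) dz,
    where z plays the role of the largest of the k values.
    """
    lo, hi = -8.0, 8.0  # assumption: the normal pdf is negligible outside this window
    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        f = _STD_NORMAL.pdf(z) * (_STD_NORMAL.cdf(z) - _STD_NORMAL.cdf(z - q)) ** (k - 1)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * f
    return k * total * h / 3


def q_alpha(alpha: float, k: int) -> float:
    """Critical value: upper-alpha quantile of the range, divided by sqrt(2)."""
    lo, hi = 0.0, 20.0
    for _ in range(40):  # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if range_cdf(mid, k) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2 / math.sqrt(2)


def nemenyi_cd(alpha: float, k: int, n: int) -> float:
    """Critical difference for k algorithms compared on n data sets."""
    return q_alpha(alpha, k) * math.sqrt(k * (k + 1) / (6 * n))
```

Calling nemenyi_cd(0.05, k, n) should correspond to cd.nemenyi in the R expression above, up to numerical error in the integration.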

UgurKap commented 3 years ago

From Demšar, Janez. "Statistical comparisons of classifiers over multiple data sets." The Journal of Machine Learning Research 7 (2006): 1-30:

Critical values q_\alpha are based on the Studentized range statistic divided by √2.

[image]

I think this should be made clearer on the Post-Hoc Test II page, as the critical difference is what we are actually comparing against. We compute a mean rank for each algorithm and then connect two algorithms in the graph if the difference between their mean ranks is less than the critical difference. If two algorithms are not connected, their performance is significantly different.
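
To make the comparison against the critical difference concrete, here is a minimal sketch of the pairwise check. The mean ranks, the value cd=1.25 and the function name not_significantly_different are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def not_significantly_different(mean_ranks: dict[str, float], cd: float) -> list[tuple[str, str]]:
    """Return the pairs whose mean-rank difference is below the critical difference."""
    ordered = sorted(mean_ranks, key=mean_ranks.get)  # best (smallest) mean rank first
    return [
        (a, b)
        for a, b in combinations(ordered, 2)
        if abs(mean_ranks[a] - mean_ranks[b]) < cd
    ]


# Hypothetical mean ranks of four algorithms (rank 1 = best).
ranks = {"A": 1.5, "B": 2.0, "C": 3.0, "D": 3.5}
pairs = not_significantly_different(ranks, cd=1.25)
```

Each returned pair would be joined by a bar in the critical difference diagram; every pair that is missing from the list differs by at least the critical difference.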

The slides state that a lower rank can be considered better, but I think "lower" rank is an ambiguous term: intuitively, rank 1 is better than rank 2, yet "lower" can also be read as "worse". Maybe the slides should say that the algorithm whose rank is closer to 1 is the better one.

UgurKap commented 3 years ago

I think the difference (or similarity?) between the Nemenyi and Bonferroni-Dunn tests should be explained in more detail.

Again from Demšar, Janez. "Statistical comparisons of classifiers over multiple data sets." The Journal of Machine Learning Research 7 (2006): 1-30:

The tests differ in the way they adjust the value of α to compensate for multiple comparisons. The Bonferroni-Dunn test (Dunn, 1961) controls the family-wise error rate by dividing α by the number of comparisons made (k−1, in our case). The alternative way to compute the same test is to calculate the CD using the same equation as for the Nemenyi test, but using the critical values for α/(k−1) (for convenience, they are given in Table 5(b)).
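
A minimal sketch of the Bonferroni-Dunn variant follows. It assumes the critical values are two-sided standard normal quantiles at level α/(k−1), which is how I read the quoted passage. The function names bonferroni_dunn_q and bonferroni_dunn_cd are illustrative.

```python
import math
from statistics import NormalDist


def bonferroni_dunn_q(alpha: float, k: int) -> float:
    """Two-sided standard normal critical value at the adjusted level alpha / (k - 1)."""
    adjusted = alpha / (k - 1)  # k - 1 comparisons against a single control
    return NormalDist().inv_cdf(1 - adjusted / 2)


def bonferroni_dunn_cd(alpha: float, k: int, n: int) -> float:
    """Same critical difference equation as for Nemenyi, with the adjusted critical value."""
    return bonferroni_dunn_q(alpha, k) * math.sqrt(k * (k + 1) / (6 * n))
```

Because only k−1 comparisons against a control are corrected for, rather than all k(k−1)/2 pairs, the resulting critical difference is smaller than the Nemenyi one whenever k > 2.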

[image]

mlindauer commented 3 years ago

Thanks for providing this valuable feedback.

@larskotthoff @berndbischl is this already addressed in the new slides of w02_t03? Or can/should we point to further material here?

larskotthoff commented 3 years ago

This is not addressed -- can we do this for the next iteration? It doesn't sound like it's super urgent.

mlindauer commented 3 years ago

Sure.