Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms

TNanukem / paper_implementations

This repository holds my paper implementations made for my studies and my content production


Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms #3

Open GeorgeBatch opened 2 years ago

GeorgeBatch commented 2 years ago

Hi,

I just started looking into statistical tests for comparing algorithm performance. Thank you very much for posting your implementation for "Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms" (Dietterich, 1998).

I was wondering whether you have had a look at the follow-up paper that proposed some improvements, "Inference for the Generalization Error" (Nadeau and Bengio, 1999). If so, have you seen any code for it?

Many thanks, George

TNanukem commented 2 years ago

Hi George,

No, I haven't read this paper yet. However, I would be happy to implement the two methods proposed in the paper in the next three or four weeks; I have already added it to my backlog.

GeorgeBatch commented 2 years ago

Hi Tiago,

Thanks! I am currently trying to understand whether there is a method that the community agreed upon for comparing algorithms when only one dataset is available. I will let you know here what other works I find and if a consensus was reached.

George

TNanukem commented 1 year ago

Hey George,

I've given the paper a first read and will probably implement it in the coming weeks.
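For anyone who wants a starting point in the meantime, here is a minimal sketch of the corrected resampled t-test usually associated with that paper. It is not the repository's implementation. It assumes you already have one score difference (model A minus model B) per random train/test split; the function name and the numbers are hypothetical. Only the statistic is computed, and the p-value can then be looked up in a Student t distribution with the returned degrees of freedom.

```python
import math
import statistics


def corrected_resampled_t(diffs, n_train, n_test):
    """Return (t_statistic, degrees_of_freedom) for per-split score differences.

    diffs   -- one score difference (model A - model B) per resampling run
    n_train -- number of training examples in each run
    n_test  -- number of test examples in each run
    """
    j = len(diffs)
    if j < 2:
        raise ValueError("need at least two resampling runs")
    mean_diff = statistics.fmean(diffs)
    var_diff = statistics.variance(diffs)  # sample variance (divides by J - 1)
    # The naive t-test scales the variance by 1/J. The correction adds
    # n_test / n_train because the training sets of different runs overlap.
    corrected_var = (1.0 / j + n_test / n_train) * var_diff
    if corrected_var == 0:
        raise ValueError("differences have zero variance")
    return mean_diff / math.sqrt(corrected_var), j - 1


# Hypothetical accuracy differences over 10 random 90/10 splits of 1000 examples.
diffs = [0.02, 0.01, 0.03, 0.00, 0.02, 0.01, 0.02, 0.03, 0.01, 0.02]
t_stat, dof = corrected_resampled_t(diffs, n_train=900, n_test=100)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

The corrected statistic is always smaller in magnitude than the naive paired t statistic on the same differences, which is the point of the correction.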

About your question: I looked at the connected papers for the paper you suggested, and I don't think there is a consensus, even though I think the most common approach is to compare the cross-validation means.

Here are some papers I found that may be interesting. I aim to take a look at them as well to see if I can craft a post about this subject:

https://proceedings.neurips.cc/paper/2003/file/e82c4b19b8151ddc25d4d93baf7b908f-Paper.pdf
https://link.springer.com/chapter/10.1007/978-3-540-24775-3_3
https://epubs.siam.org/doi/abs/10.1137/1.9781611972788.54

If you want to see the connected papers graph I used: https://www.connectedpapers.com/main/8010e480ad33f6f95e80f7361ba8928f07f80a13/Inference-for-the-Generalization-Error/graph

GeorgeBatch commented 1 year ago

Hi Tiago,

Thank you! Yes, it seems that no good method was found.

On the other hand, if you have multiple independent datasets, you can compare the methods as described in "Statistical Comparisons of Classifiers over Multiple Data Sets" (Demšar, 2006): http://jmlr.org/papers/v7/demsar06a.html

The authors recommend:

Wilcoxon signed-rank test for comparing two classifiers
Friedman test with post-hoc tests for comparing more than two classifiers
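
In case it helps anyone landing here, below is a rough pure-Python sketch of the two test statistics. The scores are made-up accuracies in percent, and the function names are my own. Dropping zero differences in the Wilcoxon statistic is one common convention and is an assumption here; the paper's own handling of ties may differ. No p-values are computed. If SciPy is available, `scipy.stats.wilcoxon` and `scipy.stats.friedmanchisquare` return the statistics together with p-values.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_statistic(scores_a, scores_b):
    """Return (T, n) with T = min(R+, R-); zero differences are dropped."""
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(r_plus, r_minus), len(diffs)


def friedman_statistic(score_table):
    """Return (chi2, mean_ranks) for rows = datasets, columns = classifiers."""
    n, k = len(score_table), len(score_table[0])
    rank_sums = [0.0] * k
    for row in score_table:
        # Rank 1 goes to the best (highest) score, so rank the negated scores.
        for col, r in enumerate(average_ranks([-s for s in row])):
            rank_sums[col] += r
    mean_ranks = [s / n for s in rank_sums]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return chi2, mean_ranks


# Hypothetical accuracies (%) of three classifiers on six datasets.
table = [
    [80, 75, 70],
    [90, 85, 88],
    [65, 60, 55],
    [78, 80, 70],
    [85, 80, 79],
    [92, 90, 85],
]
col_a = [row[0] for row in table]
col_b = [row[1] for row in table]
print(wilcoxon_statistic(col_a, col_b))
print(friedman_statistic(table))
```

The Wilcoxon statistic is compared against a critical value for n non-zero differences, and the Friedman statistic is compared against a chi-square distribution with k - 1 degrees of freedom (an approximation that gets better with more datasets).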

Hopefully, you find it useful. I'll be waiting for your blog post.

Best wishes, George