julian-berger opened 1 month ago
Hey, just saw your reddit post: https://reddit.com/r/MachineLearning/comments/1cqv5y4/r_our_new_classification_algorithm_outperforms/
Congrats on the algorithm, great to have things that run fast!
Wrt performing better than common boosted tree ensembles:
- I recommend reading the following two papers, which benchmark multiple classifiers against each other on many datasets:
- https://arxiv.org/pdf/2207.08815
- https://arxiv.org/pdf/2305.02997
- You could even reuse TabZilla to see how your classifier compares as a function of dataset characteristics such as size. It is very handy!
- The mean 10-fold CV results are a function of the CV procedure and subject to some uncertainty. I would recommend deciding how you want to test whether the difference between classifier performances actually exists. This is a nice introduction to how to do it (or not).
- Some boosted tree implementations are optimized for speed (often using GPUs). Maybe include some of them for a more comprehensive speed comparison; SketchBoost is one example.
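To make the point about CV uncertainty concrete, here is a minimal sketch of one common way to do such a test: a paired t-test on per-fold scores from the same 10-fold split. The accuracy numbers below are made up for illustration, and a plain paired t-test underestimates variance when folds overlap (see the Nadeau–Bengio correction), so treat this as a starting point, not the final word:

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-fold accuracies from the SAME 10-fold CV split,
# so fold i is a paired observation for both classifiers.
acc_new = np.array([0.91, 0.89, 0.93, 0.90, 0.92, 0.88, 0.91, 0.90, 0.92, 0.89])
acc_gbt = np.array([0.90, 0.90, 0.91, 0.89, 0.93, 0.88, 0.90, 0.91, 0.91, 0.90])

# Paired t-test on the fold-wise differences; a large p-value means the
# observed mean gap could easily be explained by CV noise alone.
stat, p = ttest_rel(acc_new, acc_gbt)
print(f"mean gap = {np.mean(acc_new - acc_gbt):+.4f}, p = {p:.3f}")
```

A tiny mean gap with a large p-value, as in this toy data, is exactly the situation where a "beats boosted trees" claim would not hold up.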
Excited to see where this goes!
Cheers
Hey!
Excellent suggestions. Especially the TabZilla one! I will keep this issue open until we go through them. Thanks for your comment!
Cheers