julian-berger opened 1 month ago
Hey, just saw your reddit post: https://reddit.com/r/MachineLearning/comments/1cqv5y4/r_our_new_classification_algorithm_outperforms/
Congrats on the algorithm, great to have things that run fast!
Wrt performing better than common boosted tree ensembles:
- I recommend reading the following two papers, which benchmark multiple classifiers against each other on many datasets:
- https://arxiv.org/pdf/2207.08815
- https://arxiv.org/pdf/2305.02997
- You could even reuse TabZilla to see how your classifier compares as a function of dataset characteristics such as size. It is very handy!
- The mean 10-fold CV results are a function of the CV procedure and subject to some uncertainty. I would recommend deciding how you want to test whether the difference between classifier performances actually exists. This is a nice introduction to how to do it (or not).
- Some boosted tree implementations are optimized for speed (often using GPUs). Maybe include some of them for a more comprehensive speed comparison; SketchBoost is one example.
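To make the point about CV uncertainty concrete, here is a minimal sketch of one common way to do such a test: a paired t-test on per-fold scores from the same 10-fold split. The accuracy numbers below are made up for illustration, and a plain paired t-test underestimates variance when folds overlap (see the Nadeau–Bengio correction), so treat this as a starting point, not the final word:

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-fold accuracies from the SAME 10-fold CV split,
# so fold i is a paired observation for both classifiers.
acc_new = np.array([0.91, 0.89, 0.93, 0.90, 0.92, 0.88, 0.91, 0.90, 0.92, 0.89])
acc_gbt = np.array([0.90, 0.90, 0.91, 0.89, 0.93, 0.88, 0.90, 0.91, 0.91, 0.90])

# Paired t-test on the fold-wise differences; a large p-value means the
# observed mean gap could easily be explained by CV noise alone.
stat, p = ttest_rel(acc_new, acc_gbt)
print(f"mean gap = {np.mean(acc_new - acc_gbt):+.4f}, p = {p:.3f}")
```

A tiny mean gap with a large p-value, as in this toy data, is exactly the situation where a "beats boosted trees" claim would not hold up.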
Excited to see where this goes!
Cheers
Hey!
Excellent suggestions. Especially the TabZilla one! I will keep this issue open until we go through them. Thanks for your comment!
Cheers