cavalab / srbench

A living benchmark framework for symbolic regression
https://cavalab.org/srbench/
GNU General Public License v3.0

Is there any future plan for supporting classification benchmarks? #48

Closed hengzhe-zhang closed 2 years ago

hengzhe-zhang commented 2 years ago

In 2014, a paper published in JMLR reported the results of more than 100 classification algorithms on a large collection of classification benchmark datasets [1]. However, that paper does not consider genetic programming-based methods such as M4GP [2]. Would it be possible to develop a classification benchmark to further advance genetic programming, and the machine learning field more broadly?

[1] Fernández-Delgado M, Cernadas E, Barro S, et al. Do we need hundreds of classifiers to solve real world classification problems? The Journal of Machine Learning Research, 2014, 15(1): 3133-3181.
[2] La Cava W, Silva S, Danai K, et al. Multidimensional genetic programming for multiclass classification. Swarm and Evolutionary Computation, 2019, 44: 260-272.

lacava commented 2 years ago

We have certainly thought about incorporating symbolic classification algorithms into our benchmarking. They are a bit less common in the SR literature, but I agree such a benchmark would be very useful. I could see it being an addition to this repo.

hengzhe-zhang commented 2 years ago

Are there any specific plans with respect to this matter? In my opinion, all of the analysis scripts could be reused; we would only need to switch the experimental datasets to the classification datasets in the PMLB database and swap the regression estimators for their classification counterparts, roughly as sketched below.
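A rough sketch of what that swap could look like, assuming the pmlb Python package and any scikit-learn-compatible classifier; the estimator and the dataset subset below are placeholders for illustration, not srbench's actual experiment scripts (a symbolic classifier such as M4GP would plug into the same loop):

```python
# Hypothetical adaptation sketch: iterate over PMLB classification datasets
# and evaluate a scikit-learn-style classifier. RandomForestClassifier is a
# stand-in for a symbolic classifier such as M4GP.
from pmlb import classification_dataset_names, fetch_data
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

est = RandomForestClassifier(random_state=0)

for name in classification_dataset_names[:3]:  # small illustrative subset
    X, y = fetch_data(name, return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    est.fit(X_train, y_train)
    score = balanced_accuracy_score(y_test, est.predict(X_test))
    print(f"{name}: balanced accuracy = {score:.3f}")
```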

hengzhe-zhang commented 2 years ago

By the way, I'm not sure whether we should reuse the results reported in the previous large-scale benchmark paper. The classifiers used in that paper are rather old, and it doesn't include state-of-the-art classifiers such as XGBoost and LightGBM, so it is questionable whether the existing results of that article should be carried over. Even worse, some papers have pointed out that the results were obtained under a flawed experimental protocol [1]; e.g., that paper used test data to tune hyperparameters, which makes the final evaluation optimistically biased (see the sketch below the reference). Consequently, the results obtained by that article are not reliable.

[1] Wainberg M, Alipanahi B, Frey B J. Are random forests truly the best classifiers? The Journal of Machine Learning Research, 2016, 17(1): 3837-3841.
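To make the protocol point concrete, here is a minimal sketch (assuming scikit-learn and a PMLB dataset) of the sound version: hyperparameters are tuned by cross-validation on the training split only, and the held-out test split is touched exactly once for the final score. The dataset and parameter grid are illustrative.

```python
# Minimal illustration of a sound tuning protocol: GridSearchCV only ever
# sees the training split; the test split is reserved for the final report.
from pmlb import fetch_data
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = fetch_data('iris', return_X_y=True)  # illustrative dataset choice
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={'max_depth': [2, 4, 8, None]},
    cv=5,
)
search.fit(X_train, y_train)         # tuning is cross-validated on training data only
print(search.best_params_)
print(search.score(X_test, y_test))  # test data used once, for reporting
```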

athril commented 2 years ago

I don't think we currently plan to work on large-scale benchmarking of classifiers with a specific focus on GP-based ones. Please note that while the datasets included in srbench are regression problems, PMLB also covers over 165 classification problems. Nonetheless, you might be interested in taking a look at the following papers, which cover more recent methods:

https://biodatamining.biomedcentral.com/articles/10.1186/s13040-017-0154-4
https://www.worldscientific.com/doi/pdf/10.1142/9789813235533_0018
https://arxiv.org/abs/2107.06475
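For reference, a quick way to pull those classification problems, assuming the pmlb Python package is installed (pip install pmlb):

```python
# List PMLB's classification benchmark datasets and load one as a DataFrame.
from pmlb import classification_dataset_names, fetch_data

print(len(classification_dataset_names))          # number of classification datasets
df = fetch_data(classification_dataset_names[0])  # pandas DataFrame with a 'target' column
print(df.shape)
```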

hengzhe-zhang commented 2 years ago

@athril Thank you for pointing me to the DIGEN package. This is exactly what I was looking for. Excellent work!