LeoGrin / tabular-benchmark


Numerical Regression Datasets #23

Closed junweima closed 2 months ago

junweima commented 2 months ago

On the OpenML website, there are currently two versions of the numerical regression benchmark suite. Version 1 is from July 2022 (https://www.openml.org/search?type=study&study_type=task&id=297) and Version 2 is from January 2023 (https://www.openml.org/search?type=study&study_type=task&id=336).

The paper refers to version 1 of the numerical regression datasets, but the GitHub README refers to version 2. Which one should I use, and what is the difference between them?

LeoGrin commented 2 months ago

Thanks for asking. The second one (id 336) is the right one. The selection was redone because of concerns that some criteria in the first selection could be slightly biased in favor of tree-based models. The latest version of the paper (which links to the new suite ids) is available here: https://hal.science/hal-03723551
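For anyone landing here, this is a minimal sketch of how the version 2 suite could be loaded, assuming the `openml` Python package is installed and the OpenML server is reachable. The suite ids come from the URLs above; the names `SUITE_IDS`, `suite_url`, and `iter_suite_datasets` are made up for this example and are not part of the repository.

```python
# Suite ids taken from the two OpenML URLs quoted in this thread.
# The dictionary keys are illustrative labels, not official names.
SUITE_IDS = {"v1_july_2022": 297, "v2_jan_2023": 336}


def suite_url(suite_id: int) -> str:
    """Rebuild the OpenML web URL of a benchmark suite from its id."""
    return (
        "https://www.openml.org/search"
        f"?type=study&study_type=task&id={suite_id}"
    )


def iter_suite_datasets(suite_id: int = SUITE_IDS["v2_jan_2023"]):
    """Yield (dataset name, features, target) for each task in a suite.

    Hypothetical helper: needs the `openml` package and network access,
    so the import is done lazily inside the function.
    """
    import openml

    suite = openml.study.get_suite(suite_id)
    for task_id in suite.tasks:
        task = openml.tasks.get_task(task_id)
        dataset = task.get_dataset()
        # get_data returns (X, y, categorical_indicator, attribute_names)
        X, y, _categorical, _names = dataset.get_data(
            target=dataset.default_target_attribute
        )
        yield dataset.name, X, y
```

Calling `iter_suite_datasets()` with no argument iterates over the version 2 suite; passing `SUITE_IDS["v1_july_2022"]` would iterate over the older selection instead, which can be useful for comparing the two lists of datasets.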