Closed by junweima 2 months ago.
Thanks for asking. The second one (Version 2) is the correct one. The selection was redone because of concerns that some of the criteria used in the first selection could be slightly biased in favor of tree-based models. The latest version of the paper, which links to the new suite IDs, is available here: https://hal.science/hal-03723551
On the OpenML website, there are currently two versions of the same numerical regression benchmark suite. Version 1 is from July 2022 (https://www.openml.org/search?type=study&study_type=task&id=297) and Version 2 is from January 2023 (https://www.openml.org/search?type=study&study_type=task&id=336).
The paper describes the numerical regression datasets as Version 1, but the GitHub README points to Version 2. Which one should I use, and what is the difference between them?