ThrunGroup / FastForest


Find the datasets that will be used in paper #89

Open motiwari opened 2 years ago

motiwari commented 2 years ago

We need a few classification and regression datasets.

motiwari commented 2 years ago

The thesis https://orbi.uliege.be/bitstream/2268/170309/1/thesis.pdf does not include an explicit list of datasets, but we could scan through it for useful ones.

A cursory look at recent review articles suggests that there is no canonical set of benchmark tasks for tree-based algorithms.

These collections of datasets could be useful:

- https://scikit-learn.org/stable/datasets/toy_dataset.html
- https://scikit-learn.org/stable/datasets/real_world.html
- https://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets
- https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html

Ideally, the algorithm is agnostic to the dataset, and adding more datasets is trivial.
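
One way to keep the experiments dataset-agnostic might be a small registry of loaders, roughly like the sketch below. This is an assumption-laden illustration, not code from the FastForest repo: `REGISTRY`, `register`, and `load` are hypothetical names, `load_diabetes` is my own pick for a regression example, and the `make_classification` sizes are arbitrary. The scikit-learn calls themselves (`load_breast_cancer`, `load_diabetes`, `make_classification`) are real.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry (not part of FastForest): name -> (task, loader).
# Each loader takes no arguments and returns a (features, targets) pair.
Loader = Callable[[], Tuple[object, object]]
REGISTRY: Dict[str, Tuple[str, Loader]] = {}
TASKS = ("classification", "regression")


def register(name: str, task: str, loader: Loader) -> None:
    """Add a dataset under `name`; adding a new one is a single call."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    REGISTRY[name] = (task, loader)


def load(name: str) -> Tuple[object, object]:
    """Return (X, y) for a registered dataset."""
    _task, loader = REGISTRY[name]
    return loader()


# scikit-learn is imported inside each loader, so it is only required
# once one of these datasets is actually requested.
def _breast_cancer():
    from sklearn.datasets import load_breast_cancer
    return load_breast_cancer(return_X_y=True)


def _diabetes():
    from sklearn.datasets import load_diabetes
    return load_diabetes(return_X_y=True)


def _synthetic_classification():
    from sklearn.datasets import make_classification
    # Sample and feature counts are arbitrary illustration values.
    return make_classification(n_samples=500, n_features=20, random_state=0)


register("breast_cancer", "classification", _breast_cancer)
register("diabetes", "regression", _diabetes)
register("synthetic_clf", "classification", _synthetic_classification)
```

With something like this, the benchmark loop would only iterate over `REGISTRY` and call `load(name)`, and including another dataset would be one more `register(...)` line.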

motiwari commented 2 years ago

This one seems pretty popular too: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html