Open artidoro opened 5 years ago
LIBSVM datasets are also commonly used in research.
We have the following datasets that can be used for regression:
housing
taxi-fare
The following can be reformulated as regression problems:
adult (predicting age from all the other variables)
breast-cancer (predict any feature)
iris (predict any feature)
Rogan seems to have answered this question.
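Reformulating one of these classification datasets as a regression task mostly means picking a numeric column as the label and treating the remaining columns as features. A minimal sketch in Python, assuming rows are already loaded as dictionaries; the helper name `to_regression`, the column names, and the sample values are hypothetical and only illustrate the shape of the data:

```python
def to_regression(rows, target):
    """Split dict rows into (features, labels), using `target` as the numeric label."""
    features, labels = [], []
    for row in rows:
        labels.append(float(row[target]))
        # Every other column, including the original class, becomes a feature.
        features.append({k: v for k, v in row.items() if k != target})
    return features, labels


# Hypothetical rows shaped like the iris dataset (illustrative values only).
rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4,
     "petal_width": 0.2, "species": "setosa"},
    {"sepal_length": 6.4, "sepal_width": 3.2, "petal_length": 4.5,
     "petal_width": 1.5, "species": "versicolor"},
]

# Predict petal width from all the other columns.
features, labels = to_regression(rows, target="petal_width")
```

The original class column (here `species`) stays in the feature set as a categorical input, so the test pipeline would still need to encode it.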
The work item is to replace the synthetic datasets w/ ones more representative of user datasets. Rogan has pointed out great ones we can use as replacements in our tests.
@justinormont The ones that Rogan pointed out are real datasets; the breast-cancer dataset is from 1992.
Some regression tests rely on a machine-generated regression dataset (Gaussian noise on top of a linear function of a vector input). The file was introduced by #937.
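For context, a dataset of this kind (a linear function of a vector input plus Gaussian noise) can be generated roughly as follows. This is only a sketch of the general technique, not the generator used in #937; the function name, the weights, the input range, the noise level, and the seed are all assumptions:

```python
import random


def make_synthetic_regression(n_rows, weights, bias=0.0, noise_std=0.1, seed=0):
    """Return (features, labels) with label = w . x + bias + Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so the data is reproducible
    features, labels = [], []
    for _ in range(n_rows):
        # Assumed input distribution: each component uniform in [-1, 1].
        x = [rng.uniform(-1.0, 1.0) for _ in weights]
        signal = sum(w * xi for w, xi in zip(weights, x)) + bias
        labels.append(signal + rng.gauss(0.0, noise_std))
        features.append(x)
    return features, labels


features, labels = make_synthetic_regression(5, weights=[2.0, -1.0, 0.5])
```

Because the labels are linear in the inputs by construction, a linear learner fits such data almost perfectly, which is part of why it is not representative of user datasets.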
We should replace this dataset with a real dataset. Justin (@justinormont) suggested finding something on data.gov, for example predicting SF employee pay: https://catalog.data.gov/dataset/employee-compensation-53987