Requesting a "difficulty" column in the dataset characteristic table

EpistasisLab / pmlb

PMLB: A large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms.

MIT License


First of all, thanks for the datasets. Secondly, this is a feature request/suggestion.

I think it would help a lot if this dataset collection came with a difficulty measure for each dataset. One candidate that comes to mind is the best performance achieved by any solution so far. Another is the mean and standard deviation of the performance across all solutions.
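
To make the suggestion concrete, here is a minimal sketch of how such a column could be computed, assuming each dataset has a list of scores (higher is better) collected from different solutions. The function name `difficulty_summary`, the dataset names, and the scores are all made up for illustration; none of them come from PMLB or from real benchmark results.

```python
from statistics import mean, pstdev


def difficulty_summary(scores):
    """Summarize the scores that several solutions achieved on one dataset.

    Assumes higher scores are better (e.g. accuracy). Hypothetical helper,
    not part of the PMLB package.
    """
    if not scores:
        raise ValueError("need at least one score per dataset")
    return {
        "best": max(scores),    # best performance by any solution so far
        "mean": mean(scores),   # average over all solutions
        "std": pstdev(scores),  # population standard deviation over all solutions
    }


# Placeholder scores only, NOT real benchmark results.
example_scores = {
    "dataset_a": [0.5, 0.75, 1.0],
    "dataset_b": [0.25, 0.25, 0.25],
}

# One row of "difficulty" values per dataset.
table = {name: difficulty_summary(s) for name, s in example_scores.items()}

for name, row in table.items():
    print(name, row)
```

A low best score would suggest a hard dataset, while a large standard deviation would suggest that the choice of method matters a lot. Whether these numbers are a fair measure of difficulty is of course a separate question.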

I understand the problems with such measures (for one, how to make a fair judgment), but I just wanted to say that, as a user, I'm having a hard time choosing which dataset to use and knowing whether the performance of my solution is any good.

Thanks


Requesting a "difficulty" column in the dataset characteristic table #169