EpistasisLab / pmlb

PMLB: A large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms.
https://epistasislab.github.io/pmlb/
MIT License

Add metadata for wine_quality_white #38

Closed trangdata closed 4 years ago

trangdata commented 4 years ago

Work in progress.

Add metadata for wine_quality_white.

trangdata commented 4 years ago

There are a few rows in the original source that are not in the dataset (see here).

In general, I suspect many of the other checks will turn up similar discrepancies (e.g. minor floating-point differences between the dataframes). What should we do in these cases? Should we add these rows back? @weixuanfu @lacava
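One way to list such discrepancies is to compare the original source and the PMLB dataset as multisets of rows, rounding floats first so that tiny floating-point differences are not reported as missing rows. This is a minimal standard-library sketch, not PMLB code: the helper names `normalize` and `missing_rows`, the `ndigits` tolerance, and the toy rows are all illustrative assumptions.

```python
from collections import Counter
import math


def normalize(row, ndigits=6):
    """Round floats so minor floating-point noise compares equal.

    NaN is mapped to None, because NaN != NaN would otherwise make
    two identical rows look different.
    """
    out = []
    for value in row:
        if isinstance(value, float):
            out.append(None if math.isnan(value) else round(value, ndigits))
        else:
            out.append(value)
    return tuple(out)


def missing_rows(source, dataset, ndigits=6):
    """Return rows present in `source` but absent from `dataset`.

    Rows are compared as a multiset, so a source row duplicated twice
    but present only once in the dataset is reported once.
    """
    src = Counter(normalize(r, ndigits) for r in source)
    dst = Counter(normalize(r, ndigits) for r in dataset)
    # Counter subtraction keeps only strictly positive counts.
    return list((src - dst).elements())


if __name__ == "__main__":
    # Toy rows (hypothetical values, not real wine_quality_white data).
    source = [(7.0, 0.27, 6), (6.3, 0.30, 6), (8.1, float("nan"), 5)]
    dataset = [(7.0, 0.2700000001, 6), (6.3, 0.30, 6)]
    print(missing_rows(source, dataset))  # [(8.1, None, 5)]
```

Rounding is a coarse tolerance: two values that straddle a rounding boundary can still compare unequal, so a per-cell `math.isclose` check is stricter, at the cost of needing the rows aligned first.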

trangdata commented 4 years ago

@weixuanfu Since we agree on adding the rows back in, could you help me do that so we can merge, please? 🙏

weixuanfu commented 4 years ago

I will do that next week, and add a parameter (maybe called 'allow_na') to fetch_data that controls whether the exported dataframe includes or excludes rows with NAs.
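At this point 'allow_na' is only a proposal, so the sketch below illustrates what such a switch could do rather than how fetch_data actually implements it. The helper `filter_na`, its default of `allow_na=False`, and the list-of-dicts row format are hypothetical assumptions chosen to keep the example self-contained.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def filter_na(rows, allow_na=False):
    """Hypothetical sketch of the proposed 'allow_na' switch.

    allow_na=False: drop every row that contains a missing value.
    allow_na=True:  return all rows, so restored source rows with NAs are kept.
    """
    if allow_na:
        return list(rows)
    return [row for row in rows if not any(is_missing(v) for v in row.values())]


if __name__ == "__main__":
    rows = [
        {"alcohol": 9.4, "quality": 6},
        {"alcohol": float("nan"), "quality": 5},
    ]
    print(len(filter_na(rows)))                 # 1
    print(len(filter_na(rows, allow_na=True)))  # 2
```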

lacava commented 4 years ago

I like this idea!