EpistasisLab / pmlb

PMLB: A large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms.
https://epistasislab.github.io/pmlb/
MIT License

Add metadata to bupa, read description! #53

Closed daniel0710goldberg closed 4 years ago

daniel0710goldberg commented 4 years ago

Google Colab notebook: https://colab.research.google.com/drive/1368j3Ug67AvZPVKMp7owVjA-JwNqL5V6#scrollTo=fXufmqGCh372

This dataset is confusing: the current target is actually a train/test selector. Most analyses of this dataset (according to OpenML) use the 6th feature, "Drinks", as the target, and the PMLB dataset should be changed to reflect this. I have preemptively updated the metadata.yaml file to match: it now treats the 6th feature, "Drinks", as the new target and omits the 7th feature. The dataset itself still needs to be changed accordingly. I removed "Drinks" from the features and made the "target" metadata describe "Drinks", so we won't need to edit the metadata again beyond removing the "TODO" and the description of the previous target that follows it.
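
A minimal sketch of what the corresponding data change could look like with pandas. The helper name `retarget_bupa`, the column names, and the toy rows are assumptions made for illustration only; the column names in the actual PMLB file may differ, and the values below are not real BUPA records.

```python
import pandas as pd

def retarget_bupa(df, new_target="Drinks", old_target="target"):
    """Drop the old target (the train/test selector) and promote
    `new_target` to be the column named "target".

    Hypothetical helper: column names are assumptions, not the
    confirmed layout of the PMLB file.
    """
    for col in (new_target, old_target):
        if col not in df.columns:
            raise KeyError(f"expected column {col!r} in the dataset")
    # Remove the selector first so the name "target" is free to reuse.
    out = df.drop(columns=[old_target])
    # "Drinks" stops being a feature and becomes the target.
    return out.rename(columns={new_target: "target"})

# Toy rows for illustration only (not real data).
toy = pd.DataFrame({
    "Mcv": [85, 90, 88],
    "Alkphos": [92, 64, 70],
    "Sgpt": [45, 20, 31],
    "Sgot": [27, 18, 22],
    "Gammagt": [31, 15, 40],
    "Drinks": [0.0, 4.0, 0.5],
    "target": [1, 2, 1],  # the train/test selector described above
})

fixed = retarget_bupa(toy)
print(list(fixed.columns))
```

The selector is dropped rather than kept as a feature, which matches the metadata change described above. Note that "Drinks" is a continuous quantity, so the re-targeted dataset would presumably be a regression task rather than a classification one.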

This publication regarding this dataset may be useful: https://www.richardsandesforsyth.net/pubs/JMRF_DiagnosingDisorder_PRL2016.pdf

daniel0710goldberg commented 4 years ago

I meant "BUPA" - sorry for the typo.

trangdata commented 4 years ago

Also, it seems like liver_disorder is a duplicate of this dataset.
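
One way to check a suspected duplicate is to compare the two tables row by row while ignoring row order and column names. The sketch below is an assumption about how such a check might be done, not how it was done here; `same_rows` is a hypothetical helper, the toy frames are made up, and the comparison assumes both tables list their columns in the same order and contain no missing values.

```python
import pandas as pd

def same_rows(a, b):
    """Return True if two frames hold the same rows, ignoring row order
    and column names (columns are matched by position).

    Hypothetical helper; assumes no NaN values, since NaN != NaN.
    """
    if a.shape != b.shape:
        return False
    # Sort each frame by all of its columns so row order does not matter.
    a_sorted = a.sort_values(list(a.columns)).to_numpy()
    b_sorted = b.sort_values(list(b.columns)).to_numpy()
    return bool((a_sorted == b_sorted).all())

# Toy frames for illustration only (not real data).
first = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
shuffled = pd.DataFrame({"f1": [3, 1, 2], "f2": [6, 4, 5]})
different = pd.DataFrame({"f1": [3, 1, 2], "f2": [6, 4, 9]})

print(same_rows(first, shuffled))   # same rows, different order and names
print(same_rows(first, different))  # one value differs
```

In practice the two frames would be the loaded datasets themselves, for example via `pmlb.fetch_data`, rather than toy tables.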

trangdata commented 4 years ago

ref #54