Add metadata to bupa, read description!

daniel0710goldberg commented 4 years ago

Google Colab notebook: https://colab.research.google.com/drive/1368j3Ug67AvZPVKMp7owVjA-JwNqL5V6#scrollTo=fXufmqGCh372

This dataset is confusing: the current target is actually a train/test selector. According to OpenML, most analyses of this dataset use the 6th feature, "Drinks", as the target, and the PMLB dataset should be changed to reflect this. I have preemptively updated the metadata.yaml file accordingly: it now treats the 6th feature, "Drinks", as the new target and omits the 7th feature (the selector). The dataset itself still needs to be changed to match. I removed "Drinks" from the features and made the "target" metadata match that of "Drinks", so we won't need to edit the metadata again beyond removing the "TODO" and the description of the previous target that follows it.
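For reference, here is a rough sketch of the dataset change described above, using only the Python standard library. It is not the actual PMLB tooling. The column names `f1` to `f5`, the `retarget` helper, and the toy rows are all hypothetical stand-ins; only "Drinks" and "target" come from the description above.

```python
import csv
import io

# Toy rows with made-up values, NOT real BUPA records.
# f1..f5 are placeholder feature names (assumption); the last column is the
# current "target", which is really the train/test selector.
TOY_CSV = """\
f1,f2,f3,f4,f5,Drinks,target
85,92,45,27,31,0.0,1
90,64,61,32,13,4.0,2
"""


def retarget(rows, new_target="Drinks", old_target="target"):
    """Drop the old target column and promote `new_target` to 'target'."""
    out = []
    for row in rows:
        # Keep every feature except the old target and the promoted feature.
        new_row = {k: v for k, v in row.items()
                   if k not in (new_target, old_target)}
        # The promoted feature becomes the last column, named 'target'.
        new_row["target"] = row[new_target]
        out.append(new_row)
    return out


rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
retargeted = retarget(rows)
print(list(retargeted[0].keys()))
```

After this step "Drinks" no longer appears among the features and the selector column is gone, which is the layout the updated metadata.yaml assumes.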

This publication regarding this dataset may be useful: https://www.richardsandesforsyth.net/pubs/JMRF_DiagnosingDisorder_PRL2016.pdf

daniel0710goldberg commented 4 years ago

I meant "BUPA" - sorry for the typo.

trangdata commented 4 years ago

Also, it seems like liver_disorder is a duplicate of this dataset.

trangdata commented 4 years ago

ref #54

EpistasisLab / pmlb

Add metadata to bupa, read description! #53