This dataset is confusing: the current target is a train/test selector. According to OpenML, most analyses of this dataset use the 6th feature, "Drinks", as the target, so the PMLB dataset should be changed to reflect this. I have already updated the metadata.yaml file to match this change.
In metadata.yaml, I now treat the 6th feature (currently "Drinks") as the new target and drop the 7th column (the current train/test selector target). The dataset file itself still needs to be changed to match.
I removed "Drinks" from the feature list and made the "target" metadata describe "Drinks". Once the data is updated, the metadata should need no further edits beyond removing the "TODO" and the description of the previous target that follows it.
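A minimal sketch of the pending data-file change, assuming the PMLB convention of a tab-separated file whose label column is named `target`, and assuming the 6th column's header is spelled exactly `Drinks`. The function name `promote_column_to_target` is hypothetical, and the sketch uses only the Python standard library: it moves "Drinks" into the `target` column and drops the old selector target.

```python
import csv
import io


def promote_column_to_target(tsv_text, new_target="Drinks", old_target="target"):
    """Hypothetical helper: make `new_target` the label column and drop the old one.

    Assumes a PMLB-style tab-separated file with a header row, where the
    current label column (the train/test selector) is named `old_target`.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Keep every feature except the promoted column and the old target.
    features = [c for c in reader.fieldnames if c not in (new_target, old_target)]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # The promoted column goes last under the name "target".
    writer.writerow(features + ["target"])
    for row in reader:
        writer.writerow([row[c] for c in features] + [row[new_target]])
    return out.getvalue()
```

With made-up values, `promote_column_to_target("a\tb\tDrinks\ttarget\n1\t2\t0.5\t1\n")` yields a header of `a`, `b`, `target` and the row `1`, `2`, `0.5`. The old selector value is discarded.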
Google Colab notebook: https://colab.research.google.com/drive/1368j3Ug67AvZPVKMp7owVjA-JwNqL5V6#scrollTo=fXufmqGCh372
This publication regarding this dataset may be useful: https://www.richardsandesforsyth.net/pubs/JMRF_DiagnosingDisorder_PRL2016.pdf