alan-turing-institute / rds-course

Materials for Turing's Research Data Science course
https://alan-turing-institute.github.io/rds-course/

Iris Data Set has Eugenics Connotations....! #95

Closed LydiaFrance closed 2 years ago

LydiaFrance commented 2 years ago

The iris data set was originally published in the Annals of Eugenics, so it is probably best replaced with something else, particularly as it would have to be cited. Not only that, Fisher published it as a methodological framework for delineating desirable traits from data, for the purpose of "improv[ing] human genetic stock" through sterilisation/eugenics programs.

It's discussed a little here, and an alternative data set is also suggested: https://armchairecology.blog/iris-dataset/

jack89roberts commented 2 years ago

Thanks for flagging it! 🙂

pafoster commented 2 years ago

In case of interest, here is an in-depth article about RA Fisher in the journal Heredity (published by Nature):

https://www.nature.com/articles/s41437-020-00394-6

jack89roberts commented 2 years ago

Thanks, looks like an interesting read! To be honest, even without being aware of potential issues around Fisher I didn't really want to use Iris in the first place as it's so ubiquitous/overused (but we were short of time and finding suitable learning datasets is tough). So I didn't need much convincing to swap it for something else.

pafoster commented 2 years ago

No problem -- I thought it would be good to add the article here, simply for anyone who is interested in learning more about the facts around the person, and also about Statistics and Data Science.