amueller / introduction_to_ml_with_python

Notebooks and code for the book "Introduction to Machine Learning with Python"

Problem with Boston Housing Data #163

Open sdempwolf opened 2 years ago

sdempwolf commented 2 years ago

Hello. On Sep 28, 2022, I was working with the Boston Housing data in the exercises for module 02 (supervised-learning). We received a message that there was an ethical problem with the Boston Housing data and that scikit-learn recommends switching to the California Housing data, with links provided. I ended up modifying the mglearn/datasets.py file, adding the import line and a function load_extended_california(). This allows the rest of the code in the notebook to function as written, using the California housing data.

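from sklearn.datasets import fetch_california_housing
# Needed by the function below (mglearn/datasets.py may already import these).
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

def load_extended_california():
    housing = fetch_california_housing()
    # Scale each feature to [0, 1], then add degree-2 polynomial and interaction terms.
    X = MinMaxScaler().fit_transform(housing.data)
    X = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
    return X, housing.target
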
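
For readers who want to see what this transformation produces without downloading any data, here is a minimal sketch. It assumes a synthetic 8-column array as a stand-in for the California Housing features (the real dataset has 8 features); the helper name `extend_features` is hypothetical and not part of mglearn.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures


def extend_features(X):
    """Scale each column to [0, 1], then add degree-2 polynomial terms.

    Hypothetical helper mirroring the body of load_extended_california().
    """
    X_scaled = MinMaxScaler().fit_transform(X)
    return PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_scaled)


# Synthetic stand-in: 20 rows, 8 columns (the California Housing feature count).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 8))

X_ext = extend_features(X_demo)
# 8 original columns + 8 squares + 28 pairwise products = 44 columns.
print(X_ext.shape)  # (20, 44)
```

Note that the extended California data has 44 columns, whereas the extended Boston data has 104, so any notebook text or output that mentions the feature count will differ.
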
amueller commented 2 years ago

Hi! Yes, I was part of the discussion of making that change in sklearn. Since the book is using this dataset, the repo will continue to use that dataset. If I end up revising the book (somewhat unlikely at this point), I will replace the dataset.

rsrenner commented 1 year ago

Hi Andreas, I love using your book and notebooks in my classes. However, I don't want to have to revert to sklearn <1.2. I tried simply replacing the references to the Boston housing dataset with the California housing data, but was unsuccessful. Can you please point me to the files where this change needs to occur? I must be missing one somehow. Or will this approach just not work?

amueller commented 1 year ago

Please update the mglearn library; that should solve the issue.
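
When upgrading (typically with `pip install --upgrade mglearn`), it can help to confirm which versions are actually installed in the active environment. The following is a small sketch using only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the versions relevant to this issue.
print("mglearn:", installed_version("mglearn"))
print("scikit-learn:", installed_version("scikit-learn"))
```

Running this before and after the upgrade shows whether the notebook environment picked up the new release.
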