ageron / handson-ml

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
Apache License 2.0

Chapter 2. Transformation Pipelines. Value Error #619

Closed by mattbche 3 years ago

mattbche commented 3 years ago

I tried to run the transformation pipeline from Chapter 2, without success. I do not know why I am getting a ValueError. Any help would be appreciated.

    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    num_pipeline = Pipeline([
        ('imputer', SimpleImputer(strategy="median")),
        ('attribs_adder', CombinedAttributesAdder()),
        ('std_scaler', StandardScaler()),
    ])

    housing_num_tr = num_pipeline.fit_transform(w)

And here is the result:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-47-3a3099ac60f9> in <module>
      4 num_pipeline = Pipeline([(('imputer'), SimpleImputer(strategy="median")),('attribs_adder',CombinedAttributesAdder()),('std_scaler', StandardScaler()),])
      5 
----> 6 housing_num_tr = num_pipeline.fit_transform(w)

~/env/lib/python3.6/site-packages/sklearn/pipeline.py in fit_transform(self, X, y, **fit_params)
    376         """
    377         fit_params_steps = self._check_fit_params(**fit_params)
--> 378         Xt = self._fit(X, y, **fit_params_steps)
    379 
    380         last_step = self._final_estimator

~/env/lib/python3.6/site-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params_steps)
    305                 message_clsname='Pipeline',
    306                 message=self._log_message(step_idx),
--> 307                 **fit_params_steps[name])
    308             # Replace the transformer of the step with the fitted
    309             # transformer. This is necessary when loading the transformer

~/env/lib/python3.6/site-packages/joblib/memory.py in __call__(self, *args, **kwargs)
    350 
    351     def __call__(self, *args, **kwargs):
--> 352         return self.func(*args, **kwargs)
    353 
    354     def call_and_shelve(self, *args, **kwargs):

~/env/lib/python3.6/site-packages/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
    752     with _print_elapsed_time(message_clsname, message):
    753         if hasattr(transformer, 'fit_transform'):
--> 754             res = transformer.fit_transform(X, y, **fit_params)
    755         else:
    756             res = transformer.fit(X, y, **fit_params).transform(X)

~/env/lib/python3.6/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
    697         if y is None:
    698             # fit method of arity 1 (unsupervised transformation)
--> 699             return self.fit(X, **fit_params).transform(X)
    700         else:
    701             # fit method of arity 2 (supervised transformation)

<ipython-input-43-6189961e6df8> in transform(self, X, y)
     13         population_per_household = X[:, population_ix] / X[:, household_ix]
     14         if self.add_bedrooms_per_room:
---> 15             bedrooms_per_room = X[:,bedrooms_ix] / X[:rooms_ix]
     16             return np.c_[X,rooms_per_household, population_per_household, bedrooms_per_room]
     17         else:

ValueError: operands could not be broadcast together with shapes (20640,) (3,13)

ageron commented 3 years ago

Hi @mattbche,

Thanks for your question. It looks like there's a typo in your definition of the CombinedAttributesAdder class: there's a comma missing on the line of the error. Instead of:

    bedrooms_per_room = X[:,bedrooms_ix] / X[:rooms_ix]

It should be:

    bedrooms_per_room = X[:, bedrooms_ix] / X[:, rooms_ix]
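
To make the difference concrete, here is a minimal NumPy sketch. It assumes a placeholder array of ones with the same shape as the one in the traceback, (20640, 13); the array `X` and its contents are illustrative, not the actual housing data. Without the comma, `X[:rooms_ix]` takes the first three rows, while `X[:, rooms_ix]` selects column 3 from every row:

```python
import numpy as np

rooms_ix, bedrooms_ix = 3, 4   # column indices, as in the class definition
X = np.ones((20640, 13))       # placeholder data (assumption), same shape as in the traceback

print(X[:rooms_ix].shape)      # row slice: first 3 rows, all columns -> (3, 13)
print(X[:, rooms_ix].shape)    # column selection: every row of column 3 -> (20640,)

# The typo divides a (20640,) vector by a (3, 13) block, which cannot be broadcast.
try:
    X[:, bedrooms_ix] / X[:rooms_ix]
except ValueError as err:
    print("ValueError:", err)

# With the comma, both operands are (20640,) vectors and the division is element-wise.
ratio = X[:, bedrooms_ix] / X[:, rooms_ix]
print(ratio.shape)             # (20640,)
```

The two shapes in the error message, (20640,) and (3,13), are exactly the shapes of the two operands printed above.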

Here's the full class:

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin

    # column indices
    rooms_ix, bedrooms_ix, population_ix, households_ix = 3, 4, 5, 6

    class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
        def __init__(self, add_bedrooms_per_room=True): # no *args or **kwargs
            self.add_bedrooms_per_room = add_bedrooms_per_room
        def fit(self, X, y=None):
            return self  # nothing else to do
        def transform(self, X):
            rooms_per_household = X[:, rooms_ix] / X[:, households_ix]
            population_per_household = X[:, population_ix] / X[:, households_ix]
            if self.add_bedrooms_per_room:
                bedrooms_per_room = X[:, bedrooms_ix] / X[:, rooms_ix]
                return np.c_[X, rooms_per_household, population_per_household,
                             bedrooms_per_room]
            else:
                return np.c_[X, rooms_per_household, population_per_household]
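
One way to catch this kind of indexing typo early is to run the transformer on a small random array and check the output shape before wiring it into the pipeline. The sketch below is only an illustration: `smoke_test_transformer` and `RatioAdder` are hypothetical names, not part of scikit-learn or the book's code, and `RatioAdder` is a stripped-down stand-in that appends a single ratio column.

```python
import numpy as np

def smoke_test_transformer(transformer, n_rows=10, n_cols=8, n_added=1, seed=0):
    """Fit and transform a small random array, then check the output shape.

    Hypothetical helper: it only assumes the object has fit() and transform().
    """
    rng = np.random.default_rng(seed)
    # Strictly positive values, so the ratio features never divide by zero.
    X = rng.uniform(1.0, 10.0, size=(n_rows, n_cols))
    Xt = transformer.fit(X).transform(X)
    expected = (n_rows, n_cols + n_added)
    if Xt.shape != expected:
        raise AssertionError(f"expected shape {expected}, got {Xt.shape}")
    return Xt

class RatioAdder:
    """Hypothetical stand-in transformer: appends column 4 divided by column 3."""
    def fit(self, X, y=None):
        return self  # nothing to learn
    def transform(self, X):
        return np.c_[X, X[:, 4] / X[:, 3]]

Xt = smoke_test_transformer(RatioAdder())
print(Xt.shape)  # (10, 9)
```

With the class above, the equivalent call would be `smoke_test_transformer(CombinedAttributesAdder(), n_added=3)`, since it appends three columns by default.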

FYI, here's the process I went through to find this bug:

I hope this will help you debug future errors!

Closing this issue, but feel free to reopen it if the problem persists. Please make sure you're using the exact same code as in the book. You can check by looking at the notebooks in this project.

Cheers!

mattbche commented 3 years ago

Thank you for the rapid response!