Open jsukup opened 5 years ago
Hi @jsukup, thanks for your question.
Are you referring to this code example?
>>> some_data = housing.iloc[:5]
>>> some_labels = housing_labels.iloc[:5]
>>> some_data_prepared = full_pipeline.transform(some_data)
>>> print("Predictions:", lin_reg.predict(some_data_prepared))
Predictions: [ 210644.6045 317768.8069 210956.4333 59218.9888 189747.5584]
>>> print("Labels:", list(some_labels))
Labels: [286600.0, 340600.0, 196900.0, 46300.0, 254500.0]
If so, then notice that it does prepare the data (full_pipeline.transform(some_data)) before it uses the trained model to make predictions (lin_reg.predict(some_data_prepared)).
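To illustrate why the two approaches are equivalent, here is a minimal sketch on made-up data. The toy_housing frame, its values, and the toy_pipeline name are assumptions for illustration and are not the book's dataset or pipeline. It suggests that transforming the first five raw rows with an already-fitted pipeline gives the same array as slicing the prepared training set:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up stand-in for the housing DataFrame (illustrative values only).
toy_housing = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8, 4.0, 2.1, 6.5, 3.1],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "<1H OCEAN",
                        "INLAND", "ISLAND", "NEAR OCEAN", "<1H OCEAN"],
})

# sparse_threshold=0 forces a dense NumPy array, which keeps the comparison simple.
toy_pipeline = ColumnTransformer(
    [("num", StandardScaler(), ["median_income"]),
     ("cat", OneHotEncoder(), ["ocean_proximity"])],
    sparse_threshold=0,
)

# Fit once on the full (toy) training set.
toy_prepared = toy_pipeline.fit_transform(toy_housing)

# Transform the first five raw rows with the already-fitted pipeline.
some_data_prepared = toy_pipeline.transform(toy_housing.iloc[:5])

# Compare against simply slicing the prepared training set.
same = np.allclose(some_data_prepared, toy_prepared[:5])
print("Same result:", same)
```

So starting from the raw rows is not a mistake, as long as transform() is applied before predict().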
Hope this helps, Aurélien
@ageron Hi!
When I test this on my own laptop, some_data_prepared (after full_pipeline.transform(some_data)) only contains three different categories, which doesn't match what the linear model expects.
Hi @huang-jl,
I can see only two explanations:
1) Perhaps your full_pipeline was trained on a part of the dataset that only contained three different categories. Instead, the pipeline should be fitted on the full training set (as in the book and the notebook), like in this cell:
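from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# housing_num and num_pipeline are defined in earlier notebook cells
num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]
full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", OneHotEncoder(), cat_attribs),
])
# Fit on the full training set so that every category is learned
housing_prepared = full_pipeline.fit_transform(housing)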
2) Perhaps you are calling full_pipeline.fit_transform(some_data) instead of full_pipeline.transform(some_data)? If so, then just replace fit_transform() with transform(): we're only supposed to fit on the training set.
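Both explanations come down to the encoder being fitted on too few rows. As a rough sketch on made-up data (the toy frame and the build_pipeline helper are hypothetical, not taken from the notebook), re-fitting on five rows that happen to contain only three categories yields fewer one-hot columns than the model was trained on:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up stand-in for the housing DataFrame (illustrative values only).
toy = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8, 4.0, 2.1, 6.5, 3.1],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "<1H OCEAN",
                        "INLAND", "ISLAND", "NEAR OCEAN", "<1H OCEAN"],
})

def build_pipeline():
    """Hypothetical helper: one numeric column plus one one-hot encoded column."""
    return ColumnTransformer([
        ("num", StandardScaler(), ["median_income"]),
        ("cat", OneHotEncoder(), ["ocean_proximity"]),
    ])

some_data = toy.iloc[:5]  # these five rows contain only 3 distinct categories

# Intended usage: fit on the full set, then only transform the sample.
fitted = build_pipeline().fit(toy)
good = fitted.transform(some_data)

# The mistake: fit_transform() re-learns the categories from the 5 rows alone.
bad = build_pipeline().fit_transform(some_data)

print(good.shape)  # 1 numeric column + 5 one-hot columns
print(bad.shape)   # 1 numeric column + 3 one-hot columns
```

A model trained on the six-column layout cannot make predictions from the four-column array, which would match the mismatch described above.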
Hope this helps.
I also ran into the same problem: some_data_prepared has only 3 categories instead of 5 the first time I execute predict(some_data_prepared), and full_pipeline.named_transformers_['cat'].categories_ lists only 3 categories.
However, after I re-ran the cell mentioned above, the issue went away without any code change: OneHotEncoder now learns that there are 5 categories, and predict works.
This is super weird though... maybe an internal bug in sklearn.
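One way to narrow this down is to compare what the encoder learned with what the model expects just before calling predict. The following is only a sketch on made-up data (the toy frame and labels are assumptions). In a notebook, a mismatch here may simply mean that an earlier cell re-fitted the pipeline on a subset:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up stand-ins for housing and housing_labels (illustrative values only).
toy = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8, 4.0, 2.1, 6.5, 3.1],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "<1H OCEAN",
                        "INLAND", "ISLAND", "NEAR OCEAN", "<1H OCEAN"],
})
toy_labels = np.array([452600.0, 358500.0, 152100.0, 241300.0,
                       142200.0, 450000.0, 301000.0, 226700.0])

# sparse_threshold=0 forces dense output to keep the example simple.
full_pipeline = ColumnTransformer(
    [("num", StandardScaler(), ["median_income"]),
     ("cat", OneHotEncoder(), ["ocean_proximity"])],
    sparse_threshold=0,
)
prepared = full_pipeline.fit_transform(toy)
lin_reg = LinearRegression().fit(prepared, toy_labels)

# Which categories did the encoder actually learn?
learned = full_pipeline.named_transformers_["cat"].categories_[0]
print(len(learned), list(learned))

# Does the prepared sample have as many columns as the model has coefficients?
some_data_prepared = full_pipeline.transform(toy.iloc[:5])
n_expected = lin_reg.coef_.shape[-1]
n_actual = some_data_prepared.shape[1]
print("expected:", n_expected, "actual:", n_actual)

if n_actual == n_expected:
    predictions = lin_reg.predict(some_data_prepared)
```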
I'm also having this same problem just before this code.
Hi, on page 75 of the second edition of the book, I am having a problem loading the dataset after writing the code for downloading it.
It appears that the data used to test the trained linear regression model on page 75 of the 2nd edition of "Hands-on..." is the unprocessed housing data frame. If the model was trained with housing_prepared, shouldn't the examples (i.e. some_data = housing.iloc[:5]) use the processed data set as well (i.e. some_data = housing_prepared[:5])?