ageron / handson-ml

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
Apache License 2.0

Chapter 2: fit and fit_transform #595

Closed: digitech-ai closed this issue 4 years ago

digitech-ai commented 4 years ago

First, we create num_pipeline to preprocess the numerical columns. After creating the pipeline, fit_transform() is called to fit and transform the training data.

num_pipeline = Pipeline([
        ('imputer', SimpleImputer(strategy="median")),
        ('attribs_adder', CombinedAttributesAdder()),
        ('std_scaler', StandardScaler()),
    ])

housing_num_tr = num_pipeline.fit_transform(housing_num)

Similarly, a full pipeline is built that also includes the categorical columns.

from sklearn.compose import ColumnTransformer

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

full_pipeline = ColumnTransformer([
        ("num", num_pipeline, num_attribs),
        ("cat", OneHotEncoder(), cat_attribs),
    ])

housing_prepared = full_pipeline.fit_transform(housing)

When we build another pipeline with a predictor, which includes the full pipeline and a linear regression model, only the fit() method is called.

full_pipeline_with_predictor = Pipeline([
        ("preparation", full_pipeline),
        ("linear", LinearRegression())
    ])

full_pipeline_with_predictor.fit(housing, housing_labels)
full_pipeline_with_predictor.predict(some_data)

I understand that estimators, such as the linear regression model in this case, only have fit() and predict() methods. But when we call only fit(), how does the pipeline know that the transform method should also be called for "full_pipeline"?

By contrast, if we just call housing_prepared = full_pipeline.fit(housing), it only fits the transformers and does not actually transform the data. To transform the data, we need either an explicit transform() call or a combined fit_transform() call.
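The difference between fit(), transform(), and fit_transform() can be checked on a toy array. This is only a minimal sketch: the array X and the use of a lone StandardScaler are assumptions for illustration, not the housing data or the book's pipeline.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data standing in for housing_num (an assumption for illustration).
X = np.array([[1.0], [2.0], [3.0]])

scaler = StandardScaler()

# fit() only learns the statistics and returns the transformer itself,
# so assigning its result does not give you transformed data.
fitted = scaler.fit(X)
print(fitted is scaler)

# transform() applies the statistics learned by fit().
X_scaled = scaler.transform(X)

# fit_transform() does both steps in a single call.
X_both = StandardScaler().fit_transform(X)
print(np.allclose(X_scaled, X_both))
```

So the value assigned from full_pipeline.fit(housing) would be the fitted ColumnTransformer itself, not a prepared array.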

Please clarify on this.

rajitkhanna commented 4 years ago

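When you call the pipeline's fit() method, it calls fit_transform() sequentially on all transformers, passing the output of each call to the next, until it reaches the final estimator, for which it just calls the fit() method. When you then call predict(), the pipeline calls transform() (not fit_transform()) on each transformer and then predict() on the final estimator.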
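One way to see this call sequence for yourself is to put a transformer in the pipeline that records which of its methods get invoked. The sketch below is illustrative only: LoggingTransformer, the calls list, and the toy data are hypothetical names introduced here, not part of the book or of scikit-learn.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

calls = []  # records the methods the pipeline invokes, in order


class LoggingTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical pass-through transformer that logs its method calls."""

    def __init__(self, name="step"):
        self.name = name

    def fit(self, X, y=None):
        calls.append(f"{self.name}.fit")
        return self

    def transform(self, X):
        calls.append(f"{self.name}.transform")
        return X  # pass the data through unchanged

    def fit_transform(self, X, y=None, **fit_params):
        calls.append(f"{self.name}.fit_transform")
        return X  # nothing to learn, so just pass the data through


# Toy data (an assumption for illustration): y = 2 * x
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

pipe = Pipeline([
    ("first", LoggingTransformer(name="first")),
    ("second", LoggingTransformer(name="second")),
    ("linear", LinearRegression()),
])

pipe.fit(X, y)
fit_calls = list(calls)  # what happened during fit()

calls.clear()
pred = pipe.predict(np.array([[4.0]]))
predict_calls = list(calls)  # what happened during predict()

print(fit_calls)
print(predict_calls)
print(pred)
```

In the scikit-learn versions I am aware of, the first list contains only fit_transform entries and the second only transform entries, which matches the explanation above: the transformers are fitted once during fit() and merely applied during predict().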

digitech-ai commented 4 years ago

Thanks for the clarification.