We want to normalize our data, as some of our covariates are on very different scales. When building a pipeline for the machine learning part of our assignment, we are unsure whether to use pipeline.fit or pipeline.fit_transform.
Module 12 is not very clear or consistent about this. In the first 'Model pipelines' video, .fit_transform is called after specifying a StandardScaler() in the pipeline. However, all remaining examples in the module simply call pipeline.fit. Is the data still being transformed/scaled in that case, since the StandardScaler() is still specified in the pipeline? Or is the scaling step just sitting there without being used?
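It depends on whether your supervised learning model is part of the pipeline or not.
If the model is not in the pipeline, you need to call fit_transform on the training data (and only transform on the test data), since you still have to train the supervised model on the scaled output afterwards.
If the model is in the pipeline, you only need to call fit. The pipeline then fits the scaler, transforms the training data and passes the scaled data on to the model, so the scaling step is being used.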
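To make the two cases concrete, here is a minimal sketch. The toy data, the step names "scale" and "clf", and the choice of LogisticRegression are assumptions for illustration only and are not taken from Module 12. It suggests that calling fit on a pipeline that contains the model gives the model the same scaled data you would get from fit_transform on a scaler-only pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (assumed for illustration): two covariates on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) * np.array([1.0, 1000.0])
y = (X[:, 0] + X[:, 1] / 1000.0 > 0).astype(int)

# Case 1: the model is the last step of the pipeline.
# fit() fits the scaler, transforms X, then fits the model on the scaled X.
model_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
model_pipe.fit(X, y)

# The fitted scaler inside the pipeline shows what the model was trained on.
X_seen_by_model = model_pipe.named_steps["scale"].transform(X)

# Case 2: the pipeline holds only transformers.
# fit_transform() returns the scaled data; the model is trained separately.
prep_pipe = Pipeline([("scale", StandardScaler())])
X_scaled = prep_pipe.fit_transform(X)
clf = LogisticRegression().fit(X_scaled, y)
```

In the first case, model_pipe.predict(X_new) also applies the fitted scaler to X_new before predicting, so no manual scaling is needed. In the second case you would call prep_pipe.transform(X_new) yourself before clf.predict.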