abjer / isds2020

Introduction to Social Data Science 2020 - a summer school course abjer.github.io/isds2020

pipeline .fit or .fit_transform #53

Open jacobwiberg opened 4 years ago

jacobwiberg commented 4 years ago

Hi!

We want to normalize our data, as some of our covariates are on very different scales. When making a pipeline for the machine learning part of our assignment, we're discussing whether to use pipeline.fit or pipeline.fit_transform.

Module 12 is not very clear or consistent about this. In the first 'Model pipelines' video, .fit_transform is called after specifying a StandardScaler() in the pipeline. However, for all remaining examples in the module we simply call pipeline.fit. Is the data still being transformed/scaled, since the StandardScaler() is still specified in the pipeline? Or is the scaling step just there without being used?

abjer commented 4 years ago

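It depends on whether your supervised learning model is in the pipeline or not. If the last step is a model, call pipeline.fit: the StandardScaler() is fitted and applied to the data before the model is fitted, so the scaling is still used. If the pipeline contains only transformers such as StandardScaler(), pipeline.fit_transform fits them and returns the transformed data.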
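A minimal sketch of the two cases, assuming scikit-learn and NumPy are installed. The synthetic data and the names `scaling_pipe`, `model_pipe` and `manual_model` are made up for illustration and are not taken from Module 12; `LinearRegression` stands in for whatever model the assignment uses.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two covariates on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) * np.array([1.0, 1000.0])
y = X[:, 0] + 0.001 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Case 1: the pipeline contains only transformers.
# fit_transform fits the scaler and returns the scaled data.
scaling_pipe = Pipeline([("scaler", StandardScaler())])
X_scaled = scaling_pipe.fit_transform(X)

# Case 2: the pipeline ends with a supervised model.
# fit fits the scaler, transforms X internally, then fits the model
# on the scaled data, so the scaling step is used.
model_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LinearRegression()),
])
model_pipe.fit(X, y)
y_pred = model_pipe.predict(X)  # X is scaled again before predicting

# Check: scaling by hand and then fitting gives the same coefficients.
manual_model = LinearRegression().fit(StandardScaler().fit_transform(X), y)
same_coefs = np.allclose(
    manual_model.coef_, model_pipe.named_steps["model"].coef_
)
print(same_coefs)
```

The fitted scaler can be inspected afterwards via `model_pipe.named_steps["scaler"]`, for example its `mean_` attribute, which is one way to confirm the scaling step was fitted during `pipeline.fit`.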