GokuMohandas / Made-With-ML

Learn how to design, develop, deploy and iterate on production-grade ML applications.
https://madewithml.com
MIT License

Removing outliers #183

Closed: grofte closed this issue 3 years ago

grofte commented 3 years ago

Hello! Great content =]

But are you sure you want to remove outliers before feature engineering? For example, if a feature follows a power-law distribution (as many do), points that look like outliers on the raw scale may no longer be outliers once you take the log of the feature.
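To make the log-transform point concrete, here is a minimal Python sketch. It is not from the course: the `iqr_outlier_mask` helper and the data are hypothetical, and it assumes NumPy is available. It applies the common 1.5 × IQR rule (Tukey's fences) to a log-spaced, heavy-tailed feature (density proportional to 1/x), first on the raw scale and then after a log10 transform.

```python
import numpy as np


def iqr_outlier_mask(values: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Hypothetical helper: flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Hypothetical heavy-tailed feature: 41 log-spaced values from 1 to 10,000.
feature = 10 ** (np.arange(41) / 10)

raw_outliers = iqr_outlier_mask(feature)            # rule applied on the raw scale
log_outliers = iqr_outlier_mask(np.log10(feature))  # same rule after a log transform

print(raw_outliers.sum())  # 7: the largest values are flagged on the raw scale
print(log_outliers.sum())  # 0: nothing is flagged after the log transform
```

The rule is identical in both calls; only the scale of the feature changes. Removing the seven "outliers" before the transform would throw away perfectly ordinary points from the tail, which is why the order of EDA, transformation, and outlier handling matters.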
Maybe you could add a warning or something. It makes sense to deal with outliers before your feature store, but I wouldn't want to remove any outliers before having performed a thorough EDA. Now that I think about it, the same goes for dealing with missing values. Of course, we are talking MLOps, so you might have meant that one should follow this guide once they have a model they are happy with, but what you have created seems more all-encompassing than that.

Just a thought. Feel free to close this issue whenever you want.

GokuMohandas commented 3 years ago

@grofte This is a great point, and I've added a note to make this a bit clearer. You're right, it's definitely not a linear guide; I had quite a bit of trouble writing this section because EDA and transformations are back-and-forth processes. But for the sake of the lesson, I had to place them in separate lessons with caveat notes everywhere.