esciencecenter-digital-skills / scikit-learn-mooc

Lesson to teach machine learning in Python with scikit-learn in a 2-day workshop
https://esciencecenter-digital-skills.github.io/scikit-learn-mooc/
Creative Commons Attribution 4.0 International
Episode 2: Data exploration #4

Closed svenvanderburg closed 7 months ago

svenvanderburg commented 9 months ago

0:30 Predictive modeling pipeline: data exploration

Focus on the bare essentials @Flavio Hafner

First, emphasize that a good understanding of the data is important for modeling. Give a demo (no live coding; show the plots in a notebook or on slides) of the most important plotting approaches (df.hist and sns.pairplot), and discuss imbalance in features and in targets (with df[column_name].value_counts()).

Second, let the students do the exercise on data exploration with the penguins dataset (follow the material)
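
For the demo part, a minimal sketch of the three calls mentioned above (df.hist, sns.pairplot, and df[column_name].value_counts()) could look like the following. The toy DataFrame, its column names, and the helper names `target_balance` and `show_plots` are assumptions made up for illustration; they are not taken from the lesson material, and the real demo would use the dataset from the course.

```python
import pandas as pd

# Hypothetical toy data standing in for the workshop dataset (assumption).
df = pd.DataFrame(
    {
        "age": [25, 38, 28, 44, 18, 34, 29, 63],
        "hours_per_week": [40, 50, 40, 40, 30, 30, 40, 32],
        "target": ["low", "low", "high", "high", "low", "low", "low", "low"],
    }
)


def target_balance(frame, column_name):
    """Return absolute counts and relative shares of each value in a column."""
    counts = frame[column_name].value_counts()
    shares = frame[column_name].value_counts(normalize=True)
    return counts, shares


def show_plots(frame, hue):
    """Draw per-feature histograms and a pairplot colored by the target."""
    # Imported here so the tabular part runs without the plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    frame.hist(figsize=(8, 4))     # one histogram per numeric column
    sns.pairplot(frame, hue=hue)   # pairwise scatter plots, colored by class
    plt.show()


counts, shares = target_balance(df, "target")
print(counts)
print(shares)

# In a notebook, uncomment to display the plots:
# show_plots(df, hue="target")
```

Printing the counts next to the shares makes class imbalance easy to point out during the demo, and the same helper can be applied to any categorical feature column.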