0:30 Predictive modeling pipeline: data exploration
Focus on the bare essentials @Flavio Hafner
First, emphasize that a good understanding of the data is important for modeling: make a demo (no livecoding, but show plots either in a notebook or on slides) with the most important plotting approaches (df.hist and sns.pairplot); discuss imbalance in features and in targets (with df[column_name].value_counts()).
Second, let the students do the exercise on data exploration with the penguins dataset (follow the material).