Consider moving the introductory matplotlib plotting to the beginning of day 2 and talking
more about seaborn and matplotlib styling options. Based on the participants' questions and
some of Maddy’s comments, it makes sense to have a short intro to matplotlib.
It’s also fun to see how easy it is to change plot colors and styles, and to make the plots your own by choosing a style.
Start with single quantitative distribution plots (histograms), then scatter plots, then categorical vs. quantitative plots? This might flow better if the general matplotlib intro is moved to this day.
Hopefully revise for seaborn 0.9, whose new figure-level functions (relplot, catplot) will simplify some explanations and could reduce the need for the FacetGrid section (or at least shorten it).
Having a short review at the beginning of the second day worked well; include this in the material.
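A minimal sketch of the proposed plot ordering (distribution, then scatter, then categorical vs. quantitative) and of how little code a style change takes. It assumes plain matplotlib and NumPy with randomly generated data; the values, group names and labels are hypothetical and not from the lesson dataset, and seaborn's own styling functions are not shown.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical synthetic data, used only for illustration
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=40, scale=10, size=200)
other = 0.5 * values + rng.normal(scale=3, size=200)
group_a = rng.normal(loc=35, scale=5, size=100)
group_b = rng.normal(loc=45, scale=8, size=100)

# List the built-in styles, then pick one for all subsequent plots
print(plt.style.available)
plt.style.use("ggplot")

fig, (ax_hist, ax_scatter, ax_box) = plt.subplots(ncols=3, figsize=(10, 3))

# 1. A single quantitative distribution, with a custom color
ax_hist.hist(values, bins=20, color="tab:purple")
ax_hist.set_xlabel("value")

# 2. Two quantitative variables
ax_scatter.scatter(values, other, alpha=0.5)
ax_scatter.set_xlabel("value")
ax_scatter.set_ylabel("other value")

# 3. A quantitative variable split by a categorical one
ax_box.boxplot([group_a, group_b])
ax_box.set_xticklabels(["group a", "group b"])

fig.tight_layout()
```

In a notebook the figure renders automatically; in a script, call plt.show() to display it. Swapping "ggplot" for another entry of plt.style.available restyles every panel without touching the plotting code.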
Include general pointers on where to look next. Which topics could learners
explore once they are done with this lesson? We could include links to
documentation or tutorials elsewhere, and also short sections on some of
these if we want to.
Time series (pandas)
Interactive plotting (altair, plotly)
Basic statistics (statsmodels + our lesson soon?)
Basic machine learning (scikit-learn + our lesson)
Basic image analysis (scikit-image + our lesson)
Data cleaning
This is important in general, and probably also for getting the lesson uploaded to Data Carpentry. We need something that replaces the OpenRefine part; I will look into their lessons and what they teach there. It was a bit of an oversight on my end not to include more of this already.
Include with the NA session
use .unique() (and maybe histograms) to find problems
Split columns based on a separator
can also add columns first to show that syntax
Replace characters with str.replace
I don’t think we will have time to cover regex or fuzzy matching, but
we can mention that they exist and where to read more.
str.upper/lower, str.contains, str.strip
NA values when reading in (the na_values parameter of read_csv)?
Dropping columns and changing their data types
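The cleaning steps listed above could be demonstrated with a sketch like the following. It assumes pandas and uses a tiny, made-up CSV; the column names, the decimal comma and the -999 missing-value code are hypothetical and not taken from the lesson's dataset, so treat this as an illustration of the string methods rather than final lesson code.

```python
import io

import pandas as pd

# Hypothetical CSV with typical problems: inconsistent case, stray
# whitespace, a decimal comma, a -999 missing-value code and a date
# stored as a single "YYYY-MM-DD" string
raw_csv = io.StringIO(
    "species,date,weight\n"
    ' Rodent ,2017-05-01,"42,5"\n'
    "rodent,2017-05-02,-999\n"
    "Bird,2017-05-03,13\n"
)

# Declare -999 as missing while reading the file
df = pd.read_csv(raw_csv, na_values=["-999"])
print(df["weight"].isna().sum())  # number of missing weights

# .unique() reveals inconsistent spellings of the same category
print(df["species"].unique())

# Strip surrounding whitespace and lower-case so the categories match
df["species"] = df["species"].str.strip().str.lower()

# Replace the decimal comma with a dot, then change the data type
df["weight"] = df["weight"].str.replace(",", ".", regex=False).astype(float)

# Adding a column uses the same assignment syntax as replacing one
df["weight_kg"] = df["weight"] / 1000

# Split one column on a separator into several new columns
df[["year", "month", "day"]] = df["date"].str.split("-", expand=True)
df["year"] = df["year"].astype(int)

# Drop the column that is now redundant
df = df.drop(columns=["date"])

# Keep only rows whose species contains a substring
rodents = df[df["species"].str.contains("rodent")]
print(rodents)
```

Printing .unique() before and after the .str.strip().str.lower() step is one way to show learners why the cleaning is needed.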
Misc
Make sure that we have a narrative throughout, as if we are doing exploratory data
analysis (EDA), rather than just showing the next thing.
More links to documentation and Stack Overflow (SO) questions throughout.
Show them how to find help on SO in addition to looking things up in the documentation.