Open c-martinez opened 8 years ago
I suggest we keep using the dataset we've been using and change the intro to say that we haven't worried about cleaning the data yet, instead of suggesting a new 'real' dataset, since this one is already real. Then perhaps work through fixing the fields that hold multiple values separated by '|', plus any other cleaning we can spot, for instance turning the published date into an actual datetime field.
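A rough sketch of what that cleaning could look like, using only the standard library. The column names (`title`, `authors`, `published`), the `YYYY-MM-DD` date format, and the sample rows are all made up for illustration; the real dataset's headers and date format would need checking first.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows. The column names and the date format are
# assumptions, not the lesson dataset's real headers.
SAMPLE_CSV = """title,authors,published
A study of rivers,Smith J|Lee K,2015-03-01
Notes on maps,Garcia M,2014-11-20
"""


def clean_row(row):
    """Return a copy of row with authors split on '|' and the date parsed."""
    cleaned = dict(row)
    # Split the multi-value field and drop any empty pieces.
    cleaned["authors"] = [
        name.strip() for name in row["authors"].split("|") if name.strip()
    ]
    # Turn the date string into a real datetime object.
    cleaned["published"] = datetime.strptime(row["published"], "%Y-%m-%d")
    return cleaned


rows = [clean_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

If the lesson already loads the data with another library, the same two steps (split on '|', parse the date) would apply there instead.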
The ecology episode focuses a lot on graphs of the data. Maybe we want to look instead at some ways of exploring the text in the titles and/or author names?
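As one possible starting point for the text angle, here is a minimal sketch that counts how often each word appears across titles. The titles are invented examples and `word_counts` is a hypothetical helper, not something from the existing lesson.

```python
import re
from collections import Counter

# Invented example titles, standing in for the dataset's title column.
titles = [
    "A study of rivers",
    "Notes on maps",
    "Rivers and maps of the north",
]


def word_counts(titles):
    """Count lowercase words across all titles."""
    counts = Counter()
    for title in titles:
        # Lowercase the title and pull out runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", title.lower()))
    return counts


counts = word_counts(titles)
```

`counts.most_common(10)` would then list the ten most frequent words; the same approach could be applied to author names after splitting them.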
I haven't had time to start on this yet, happy for someone else to try.
Which dataset do we want to use for this lesson?