rdstern opened 7 years ago
The second package added is the Lahman data on baseball. Of course, it is very American. I am not so interested in it, except that it is (apparently) also available as an Access database. I wonder how we could import database data and set up the keys and links at the same time. This is already our challenge with CLIMSOFT. One interesting overview data frame is called LahmanData, which gives details of all the other 24 data frames in this package.
I have returned to this topic while writing the help for R-Instat, for which I need a set of example data sets. We have our own, and I have now been systematically through the R for Data Science book. The idea of mixing the data sets used in the book with our own has a good ring to it, and they are easily available within R-Instat. So here is my list, with a few questions at the end. It is also partly a reference for myself.
First those from the R for Data Science book:
That seems to be it. Now "our" data sets.
The Malawi workshop will also follow, to some extent, the R for Data Science book by Hadley Wickham. We have our own datasets, but I also want to use this opportunity to look into the data used in this book. It would be good to be able (at least) to repeat the analyses he does in the first 2 parts of the book, which are on descriptive statistics and data wrangling.
And we could do worse than also include these sets among our own regular testing data.
So his initial chapter just uses 2 sets from the ggplot2 package, namely diamonds and mpg.
He also uses data from a package (which is now included) called nycflights13. This is a package of just 5 datasets, maintained by Hadley Wickham. One of the 5 is on weather, which could be interesting to us in its own right: it is hourly data for one year at 3 locations, included because problems with flights are often related to weather. It is organised in exactly the format we need for our climatic analyses. We don't yet have anything special for within-day data, but we will. The other 4 datasets are more obviously linked, so we might want to combine them into a single RDS file (as an Instat Object) and additionally store them in that form!
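To make the "more obviously linked" point concrete, here is a minimal sketch of the key links between the nycflights13 tables, following the relational data discussion in R for Data Science: flights links to airlines by `carrier`, to planes by `tailnum`, to airports by `origin` and `dest` (matched against `faa`), and to weather by date, hour and `origin`. It is written in Python with only the standard library. The rows are hypothetical toy values, not the real data, and the names `LINKS` and `unmatched_keys` are illustrative only. A check like this is also what we would want when importing keyed database data, as with the Lahman Access file or CLIMSOFT.

```python
# Key links between the nycflights13 tables.
# Each entry: (child table, child columns, parent table, parent columns).
LINKS = [
    ("flights", ("carrier",), "airlines", ("carrier",)),
    ("flights", ("tailnum",), "planes", ("tailnum",)),
    ("flights", ("origin",), "airports", ("faa",)),
    ("flights", ("dest",), "airports", ("faa",)),
    ("flights", ("year", "month", "day", "hour", "origin"),
     "weather", ("year", "month", "day", "hour", "origin")),
]


def key_of(row, cols):
    """Return the tuple of values in `row` (a dict) for the given columns."""
    return tuple(row[c] for c in cols)


def unmatched_keys(tables, link):
    """Return child key values that have no matching row in the parent table.

    `tables` maps a table name to a list of dict rows (hypothetical helper).
    """
    child, child_cols, parent, parent_cols = link
    parent_keys = {key_of(r, parent_cols) for r in tables[parent]}
    return {key_of(r, child_cols) for r in tables[child]} - parent_keys


# Hypothetical toy rows, NOT the real nycflights13 data.
toy = {
    "flights": [
        {"year": 2013, "month": 1, "day": 1, "hour": 5, "origin": "EWR",
         "dest": "IAH", "carrier": "UA", "tailnum": "N14228"},
        {"year": 2013, "month": 1, "day": 1, "hour": 6, "origin": "JFK",
         "dest": "MIA", "carrier": "AA", "tailnum": "N99999"},
    ],
    "airlines": [{"carrier": "UA"}, {"carrier": "AA"}],
    "planes": [{"tailnum": "N14228"}],
    "airports": [{"faa": "EWR"}, {"faa": "JFK"}, {"faa": "IAH"}, {"faa": "MIA"}],
    "weather": [
        {"year": 2013, "month": 1, "day": 1, "hour": 5, "origin": "EWR"},
        {"year": 2013, "month": 1, "day": 1, "hour": 6, "origin": "JFK"},
    ],
}

# Report, for each link, how many child keys have no parent row.
for link in LINKS:
    child, child_cols, parent, _ = link
    missing = unmatched_keys(toy, link)
    print(f"{child}{list(child_cols)} -> {parent}: {len(missing)} unmatched")
```

With these toy rows, only the `tailnum` link reports an unmatched key (the made-up `N99999`); every other link is complete.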
My suggestion above, that we save the datasets as an Instat object, would make this demonstration even simpler!