corinne-riddell / SER-tidyverse-2019

gapminder #17

Open malcolmbarrett opened 5 years ago

malcolmbarrett commented 5 years ago

May need a better approach to gapminder, which will likely be one of the following:

  1. Expanding the intro to gapminder to include a short description of data packages and where built-in data sets live in R
  2. Writing gapminder to .csv and reading it in with readr for practice. This could help people get a sense of what the data is and where it comes from.

I'm going to check with the RStudio edu team on this to see if there is a best practice.

malcolmbarrett commented 5 years ago

Jenny Bryan mentioned that gapminder actually has a built-in .tsv for exactly this purpose: https://github.com/jennybc/gapminder#plain-text-delimited-files
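For reference, a minimal sketch of what loading that bundled file might look like. It assumes the gapminder and readr packages are installed and that the file sits at extdata/gapminder.tsv inside the package, as the linked README describes; the variable names are only illustrative.

```r
# Assumption: gapminder and readr are installed.
library(readr)

# system.file() returns the full path to a file shipped inside an installed package
# (it returns "" if the file or package is not found).
gap_path <- system.file("extdata", "gapminder.tsv", package = "gapminder")

# Read the tab-delimited file; the result is an ordinary object in the
# global environment, so it shows up in the RStudio Environment pane.
gapminder <- read_tsv(gap_path)

head(gapminder)
```

Because the data frame is created by an explicit read step, participants can see where it came from and click on it to open the viewer.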

corinne-riddell commented 5 years ago

Maybe the gapminder intro should be part of the intro to R packages? I think another confusing thing is that after library(gapminder), the gapminder data frame becomes available but doesn't appear in the environment. I would vote for loading the .tsv and having them see it in the environment and click on it to open it in the viewer pane, so they become familiar with the contents.

malcolmbarrett commented 5 years ago

A few other notes:

  1. Hadley said he doesn't see it as too big a deal: you can't avoid there being some mystery when working with beginners, and where the data from data packages lives is not that big a mystery.
  2. Garrett suggested that we could take the load-from-file approach OR walk through what the deal is with built-in data sets. He suggested showing iris as an example of one that comes with R, then having people try to use gapminder and see that it's not there without library(gapminder). The idea is to build a mental model of how data can live in R. He also noted that if we really care to explain it in detail (we probably don't), you can actually see it under the environments tab.
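A rough sketch of how that walkthrough could go in a fresh R session, assuming the gapminder package is installed and no object named gapminder has been created yet. This is one possible demo, not something Garrett spelled out.

```r
# iris ships with R's datasets package, which is attached by default,
# so it can be found without any library() call.
exists("iris")

# In a fresh session, gapminder cannot be found until its package is attached.
exists("gapminder")

# Attaching the package puts its data set on the search path.
library(gapminder)
exists("gapminder")

# The data set lives with the package, not in the global environment,
# which is why it does not appear in the Environment pane by default.
"gapminder" %in% ls(globalenv())

# find() reports where on the search path the object is located.
find("gapminder")
```

Running the lines one at a time lets people compare the results before and after library(gapminder).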

I would like the mental-model approach if we had more time, but we're already at capacity. To me that means revisiting the explanation slide and, if it's not feasible to capture the idea succinctly, just use it as an excuse to practice with readr.

A summary of the cost-benefit of using library(gapminder) for 2020 us (hello from the past): the gain is that we can dive right into dplyr, because people don't need to find the file and the data exists as expected (no weird loading issues). The cost is that it's a murkier mental model and people may not have installed or loaded the package.

corinne-riddell commented 5 years ago

I'm still in favor of loading from .tsv (or .csv or .xlsx -- something in line with what is common in epi), given that the workshop focuses on teaching epi folks how to load the sorts of data they'll be working with so they can manipulate and visualize it. Loading from a file removes the mystery and avoids spending time on general R concepts that are less useful in the workshop.

malcolmbarrett commented 5 years ago

That's a fair perspective, and I agree that getting this mental model into people's heads is not a priority. I don't know what my opinion is yet, though. I need to think about it.