Variable names are too long

dfe-analytical-services / r-training-course

Introductory training course for R, using the published School Workforce Census

https://dfe-analytical-services.github.io/r-training-course/


Variable names are too long #12

Status: Closed (matt-dray closed this issue 5 years ago)

matt-dray commented 6 years ago

Problem: str() and dplyr::glimpse() are demonstrated in the book as ways to quickly view the structure of the dataset, but their output is unreadable in the console because the column names in the dataset are extremely long.

Example: Average..Mean..Number.of.Days.Lost.to.Teacher.Sickness.Absence..only.Teachers.in.School.Taking.Sickness.Absence.

Solution: retain only the columns of relevance in the dataset and rename longer names to something shorter so that str() and dplyr::glimpse() can be demonstrated appropriately; this will require a reupload of the data as in issue #11.
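As a rough sketch of what that trimming could look like, the snippet below keeps two columns and shortens their names in one dplyr::select() call. The one-row data frame is a made-up stand-in for the raw file; the URN and Some.Other.Column.Not.Needed column names and both short names are assumptions for illustration, and only the long sickness-absence name comes from this issue.

```r
library(dplyr)

# Hypothetical one-row stand-in for the raw data.
# Only the long sickness-absence column name is taken from the issue;
# the other column names and all values are invented.
raw <- data.frame(
  URN = 100000L,
  Average..Mean..Number.of.Days.Lost.to.Teacher.Sickness.Absence..only.Teachers.in.School.Taking.Sickness.Absence. = 7.5,
  Some.Other.Column.Not.Needed = "x"
)

# select() keeps only the named columns and renames them (new_name = old_name)
swfc_small <- raw %>%
  select(
    urn = URN,
    mean_sick_days = Average..Mean..Number.of.Days.Lost.to.Teacher.Sickness.Absence..only.Teachers.in.School.Taking.Sickness.Absence.
  )

# With short names, the structure fits comfortably in the console
glimpse(swfc_small)
```

Doing the selection and the renaming in a single select() keeps the cleaning step short, which may suit an introductory course; rename() followed by select() would work just as well.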

matt-dray commented 6 years ago

Simplified subsets of the data are now available for workforce data and for FTE data (both are stored in gists).

The workforce file contains only a handful of the columns in the original file and they're stored with short snake_case names. The FTE file is for demonstrating joins and can be matched using the urn column.

You can also get these from http://bit.ly/swfc_workforce and http://bit.ly/swfc_fte.
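A minimal sketch of the kind of join the FTE file is meant to demonstrate, assuming both files share a urn column as described above. The two tibbles are tiny invented stand-ins rather than the real gist data, and the headcount and fte_total column names are hypothetical.

```r
library(dplyr)

# Hypothetical miniature versions of the two files.
# Only the `urn` key column is taken from the thread; everything else is invented.
workforce <- tibble(
  urn = c(1L, 2L, 3L),
  headcount = c(20L, 35L, 12L)
)

fte <- tibble(
  urn = c(1L, 2L, 4L),
  fte_total = c(18.5, 30.2, 9.0)
)

# Keep every workforce row and attach FTE values where the urn matches
joined <- left_join(workforce, fte, by = "urn")

joined
```

The school with urn 3 has no match in the FTE stand-in, so its fte_total is NA after the join, which could be a useful talking point alongside the other join types.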

TomFranklin commented 6 years ago

It would be good to use this dataset throughout the whole training book, and to link to it from chapter 2, where the data are loaded.

matt-dray commented 6 years ago

Though beware: when you read in these data you need to specify the na argument, as in:

library(readr)

# Treat blanks, "NA" and the placeholder codes "SUPP" and "DNS" as missing values
swfc <- read_csv(
  "swfc_206_headcount.csv",
  na = c("", " ", "NA", "SUPP", "DNS")
)

This is a good way to introduce the idea of arguments (if it hasn't already been mentioned); the concept of NA in R; and the detection of classes from column content. (See line 89 here.)
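To illustrate the point about class detection, here is a small sketch that reads the same inline CSV text twice, once with the default na values and once with the codes listed above. The CSV content and the headcount column name are invented for the example, and passing literal text via I() and using show_col_types both assume readr 2.0 or later.

```r
library(readr)

# Hypothetical CSV text; I() marks the string as literal data rather than a file path
csv_text <- I("urn,headcount\n1,25\n2,SUPP\n3,DNS\n4,40\n")

# Default na values: "SUPP" and "DNS" are kept as text,
# so readr cannot guess a numeric type for the column
default_read <- read_csv(csv_text, show_col_types = FALSE)

# Declaring the codes as missing lets readr guess a numeric column
with_na <- read_csv(
  csv_text,
  na = c("", " ", "NA", "SUPP", "DNS"),
  show_col_types = FALSE
)

class(default_read$headcount)
class(with_na$headcount)
sum(is.na(with_na$headcount))
```

In the first read the headcount column should come back as character; in the second it should be numeric, with the two coded entries converted to NA.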

matt-dray commented 5 years ago

I'm closing this: we've got nicer datasets now (linked above).