Closed matt-dray closed 5 years ago
Simplified subsets of the data are now available for workforce data and for FTE data (both are stored in gists).
The workforce file contains only a handful of the columns in the original file, and they're stored with short snake_case names. The FTE file is for demonstrating joins and can be matched using the urn column.
You can also get these from http://bit.ly/swfc_workforce and http://bit.ly/swfc_fte.
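As a rough illustration of the join the FTE file is meant to demonstrate, here is a minimal sketch using toy data frames in place of the two files. Only the urn column comes from the description above; school_name, teachers_fte and all values are made up for the example.

```r
library(dplyr)

# Toy stand-ins for the two files; every column except urn is hypothetical
workforce <- data.frame(
  urn = c(100001, 100002, 100003),
  school_name = c("School A", "School B", "School C")
)
fte <- data.frame(
  urn = c(100001, 100003),
  teachers_fte = c(12.5, 30.1)
)

# Keep every workforce row and attach FTE values where the urn matches;
# schools with no match in fte get NA for teachers_fte
joined <- left_join(workforce, fte, by = "urn")
joined
```

A left join is used here so that no workforce rows are dropped; inner_join() would keep only the schools present in both files.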
It would be good to use this dataset as the one for the whole training book, and to link to it from chapter 2 for loading.
Though beware: when you read in these data you need to specify the na argument, as in:
library(readr)

# Treat blanks and the codes SUPP and DNS as missing values (NA)
swfc <- read_csv(
  "swfc_206_headcount.csv",
  na = c("", " ", "NA", "SUPP", "DNS")
)
This is a good way to introduce the idea of arguments (if it hasn't already been mentioned); the concept of NA in R; and the detection of classes from column content. (See line 89 here.)
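To make the point about class detection concrete, here is a small sketch that writes a made-up three-row CSV to a temporary file and reads it twice. The urn and headcount columns and their values are assumptions for illustration, not taken from the real file.

```r
library(readr)

# Hypothetical miniature file: the columns and values are made up
path <- tempfile(fileext = ".csv")
writeLines(
  c("urn,headcount", "100001,25", "100002,SUPP", "100003,DNS"),
  path
)

# Without na, the placeholder text forces the column to be read as character
raw <- read_csv(path)
class(raw$headcount)

# With na, the placeholders become NA and the column is detected as numeric
clean <- read_csv(path, na = c("", " ", "NA", "SUPP", "DNS"))
class(clean$headcount)
```

Comparing the two class() results shows why the na argument matters: a single non-numeric placeholder is enough to change how the whole column is read.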
I'm closing this: we've got nicer datasets now (linked above).
Problem: str() and dplyr::glimpse() are demonstrated in the book as ways to quickly view the structure of the dataset, but their output is unreadable in the console because of the incredibly long column names in the dataset. Example:
Average..Mean..Number.of.Days.Lost.to.Teacher.Sickness.Absence..only.Teachers.in.School.Taking.Sickness.Absence
Solution: retain only the relevant columns in the dataset and rename the longer names to something shorter, so that str() and dplyr::glimpse() can be demonstrated appropriately; this will require a reupload of the data as in issue #11.
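One possible way to do the trimming and renaming is sketched below with dplyr::select(), which can rename columns as it keeps them. The data frame is a toy; the URN column, the other column, the values, and the short names urn and sick_days_mean are all hypothetical. Only the long column name is the one quoted in the example above.

```r
library(dplyr)

# The long column name quoted in the problem description
long_name <- paste0(
  "Average..Mean..Number.of.Days.Lost.to.Teacher.Sickness.Absence",
  "..only.Teachers.in.School.Taking.Sickness.Absence"
)

# Toy data; URN, other and all values are made up
swfc_raw <- data.frame(
  URN = c(100001, 100002),
  x = c(7.2, 5.9),
  other = c("a", "b")
)
names(swfc_raw)[2] <- long_name

# select() keeps only the listed columns and renames them in one step
swfc_small <- swfc_raw %>%
  select(urn = URN, sick_days_mean = all_of(long_name))

# The structure is now short enough to read in the console
glimpse(swfc_small)
```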