Open maurolepore opened 4 years ago
GitHub doesn't support attaching an R file (?!), so here it is as formatted code...
# original Tidy Data treatise: https://vita.had.co.nz/papers/tidy-data.pdf
# current Tidy Data discussion: https://r4ds.had.co.nz/tidy-data.html
library(dplyr)
library(tidyr)
billboard
# gather(data, key = "key", value = "value", ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE)
billboard %>% gather(key, value, wk1, wk2, wk3)
billboard %>% gather('key', 'value', wk1, wk2, wk3)
billboard %>% gather('key', 'value', wk1, wk2, wk3) %>% select(artist, track, date.entered, key, value)
billboard %>% gather('key', 'value', 4:ncol(billboard))
billboard %>% gather('key', 'value', -artist, -track, -date.entered)
billboard %>% gather('key', 'value', -(1:3))
billboard %>% gather('key', 'value', wk1:wk76)
billboard %>% gather('key', 'value', starts_with('wk'))
billboard %>% gather('key', 'value', matches('wk[0-9]*$'))
billboard %>% gather('week', 'rank', matches('wk[0-9]*$'))
billboard %>% gather(key = 'week', value = 'rank', matches('wk[0-9]*$'))
# pivot_longer(
#   data,
#   cols,
#   names_to = "name",
#   names_prefix = NULL,
#   names_sep = NULL,
#   names_pattern = NULL,
#   names_ptypes = list(),
#   names_repair = "check_unique",
#   values_to = "value",
#   values_drop_na = FALSE,
#   values_ptypes = list()
# )
# cols is a single argument, so several columns must be wrapped in c()
billboard %>% pivot_longer(c(wk1, wk2, wk3))
billboard %>% pivot_longer(wk1:wk76)
billboard %>% pivot_longer(4:ncol(billboard))
billboard %>% pivot_longer(-(1:3))
billboard %>% pivot_longer(matches('wk[0-9]*$'))
billboard %>% pivot_longer(wk1:wk76, names_to = 'week', values_to = 'rank')
billboard %>% pivot_longer(wk1:wk76,
                           names_to = 'week', values_to = 'rank',
                           names_prefix = 'wk')
# In tidyr >= 1.1.0, names_ptypes only checks the type; names_transform does the conversion
billboard %>% pivot_longer(wk1:wk76,
                           names_to = 'week', values_to = 'rank',
                           names_prefix = 'wk',
                           names_transform = list(week = as.integer))
# names_transform (tidyr >= 1.1.0) converts the week names to integer
billboard_long <-
  billboard %>%
  pivot_longer(wk1:wk76, names_to = 'week', values_to = 'rank',
               names_prefix = 'wk', names_transform = list(week = as.integer))
# spread(data, key, value, fill = NA, convert = FALSE, drop = TRUE, sep = NULL)
billboard_long
billboard_long %>% spread(week, rank)
billboard_long %>% spread(week, rank, sep = '_')
billboard_long %>% spread(week, rank, sep = '')
billboard_long %>% rename(wk = week) %>% spread(wk, rank, sep = '')
# pivot_wider(
#   data,
#   id_cols = NULL,
#   names_from = name,
#   names_prefix = "",
#   names_sep = "_",
#   names_repair = "check_unique",
#   values_from = value,
#   values_fill = NULL,
#   values_fn = NULL
# )
billboard_long %>% pivot_wider(names_from = week, values_from = rank)
billboard_long %>% pivot_wider(names_from = week, values_from = rank, names_prefix = 'wk')
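One possible way to see why the pivot functions supersede `gather()` and `spread()` is a case where the column names carry two pieces of information. The sketch below is only an illustration: the `toy` tibble and its column names are made up for this example and are not part of `billboard`. It compares the two-step `gather()` plus `separate()` route with a single `pivot_longer()` call that uses `names_sep`.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not from billboard): two measures recorded in two years.
toy <- tibble(
  id = c("a", "b"),
  rank_2000 = c(1L, 2L),
  rank_2001 = c(3L, 4L),
  sales_2000 = c(10L, 20L),
  sales_2001 = c(30L, 40L)
)

# Superseded approach: gather() into key/value, then split the key with separate().
old_way <- toy %>%
  gather("key", "value", -id) %>%
  separate(key, into = c("measure", "year"), sep = "_") %>%
  arrange(id, measure, year)

# pivot_longer() splits the column names in the same call via names_sep.
new_way <- toy %>%
  pivot_longer(-id, names_to = c("measure", "year"), names_sep = "_") %>%
  arrange(id, measure, year)

# Compare the two results after sorting them the same way.
all.equal(as.data.frame(old_way), as.data.frame(new_way))
```

Both routes are sorted before comparing because `gather()` and `pivot_longer()` return rows in different orders.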
Who is the audience?
@vintented and @Clare2D proposed this topic, which should be useful to anyone who uses, or is interested in using, tidy data in their analyses, both at 2DII and beyond.
Why is this important?
What should be covered?
Suggested speakers or contributors
The speaker will be @cjyetman, who will run a live example and discuss it. Many folks at 2DII have experience reshaping data (e.g. @vintented, @Clare2D, and @jdhoffa) and may be able to contribute questions, comments, and recommendations.
Some example questions to inspire the discussion:
gather() is superseded by pivot_longer() and spread() by pivot_wider()?
Resources