Open daramireh opened 2 years ago
separate() on table3
The rate column contains two variables, cases and population, so it needs to be split into two columns.
table3 %>% separate(rate, into = c("cases", "population"))
By default, separate() leaves the new columns as character.
To have separate() convert them to a better type (here, integer), use convert = TRUE.
table3 %>% separate( rate, into = c("cases", "population"), convert = TRUE )
The default is convert = FALSE.
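To see the effect of convert, the column types can be compared directly. This is a minimal sketch that assumes the tidyr and dplyr packages are installed; the names as_text and as_number are only illustrative.

```r
library(tidyr)
library(dplyr)

# table3 ships with tidyr; rate is a character column such as "745/19987071".
as_text   <- table3 %>% separate(rate, into = c("cases", "population"))
as_number <- table3 %>% separate(rate, into = c("cases", "population"), convert = TRUE)

# Without convert the pieces stay text; with convert = TRUE they become numbers.
class(as_text$cases)
class(as_number$cases)
```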
When sep is an integer, separate() splits by position: sep = 2 splits after the second character.
table3 %>% separate(year, into = c("century", "year"), sep = 2)
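Splitting by position also works on strings that have no separator character at all. The sketch below uses a made-up tibble (quarters and its period column are hypothetical, not from the notes) and assumes tidyr, dplyr and tibble are installed.

```r
library(tidyr)
library(dplyr)
library(tibble)

# Hypothetical data: a year and a quarter label glued together.
quarters <- tibble(period = c("2015Q1", "2016Q3"))

# sep = 4 cuts after the fourth character of each string.
split_period <- quarters %>%
  separate(period, into = c("year", "quarter"), sep = 4)

split_period
```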
unite()
unite() is the opposite of separate()
table5 %>% unite(new, century, year)
By default unite() joins the values with an underscore; use sep = "" to join them with no separator.
table5 %>% unite(new, century, year, sep = "")
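The difference between the two calls is easiest to see by comparing the first value each one produces. A minimal sketch, assuming tidyr and dplyr are installed; the result names are illustrative.

```r
library(tidyr)
library(dplyr)

# table5 ships with tidyr and stores the year in two columns: century and year.
with_underscore <- table5 %>% unite(new, century, year)            # default sep = "_"
without_sep     <- table5 %>% unite(new, century, year, sep = "")  # no separator

with_underscore$new[1]  # "19_99"
without_sep$new[1]      # "1999"
```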
Exercise
1 What do the extra and fill arguments do in separate()?
Experiment with the various options for the following two toy
datasets:
tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>% separate(x, c("one", "two", "three"))
tibble(x = c("a,b,c", "d,e", "f,g,i")) %>% separate(x, c("one", "two", "three"))
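The extra and fill arguments control what separate() does when a row has too many or too few pieces.
extra handles rows with too many pieces: by default separate() warns and drops the extra values; extra = "drop" drops them without a warning, and extra = "merge" keeps them in the last column.
fill handles rows with too few pieces: by default separate() warns and fills the missing columns on the right with NA; fill = "right" and fill = "left" fill with NA on the chosen side without a warning.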
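The options can be tried on the two toy datasets from the exercise. This is a sketch assuming tidyr, dplyr and tibble are installed; the object names (too_many, too_few, and so on) are illustrative.

```r
library(tidyr)
library(dplyr)
library(tibble)

too_many <- tibble(x = c("a,b,c", "d,e,f,g", "h,i,j"))  # row 2 has four pieces
too_few  <- tibble(x = c("a,b,c", "d,e", "f,g,i"))      # row 2 has two pieces

# extra: what to do with the surplus piece "g"
dropped <- too_many %>% separate(x, c("one", "two", "three"), extra = "drop")
merged  <- too_many %>% separate(x, c("one", "two", "three"), extra = "merge")

# fill: on which side to put the NA for the short row
right <- too_few %>% separate(x, c("one", "two", "three"), fill = "right")
left  <- too_few %>% separate(x, c("one", "two", "three"), fill = "left")

dropped$three[2]  # "f"
merged$three[2]   # "f,g"
right[2, ]        # d, e, NA
left[2, ]         # NA, d, e
```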
Missing values
There are two kinds of missing values:
Explicitly, i.e., flagged with NA.
Implicitly, i.e., simply not present in the data.
stocks <- tibble( year = c(2015, 2015, 2015, 2015, 2016, 2016, 2016), qtr = c( 1, 2, 3, 4, 2, 3, 4), return = c(1.88, 0.59, 0.35, NA, 0.92, 0.17, 2.66) )
stocks %>% spread(year, return) #implicit to explicit
na.rm = TRUE in gather() turns explicit missing values into implicit ones.
stocks %>% spread(year, return) %>% gather(year, return, `2015`:`2016`, na.rm = TRUE)
complete() turns implicit missing values into explicit ones.
complete() takes a set of columns, and finds all unique combinations.
It then ensures the original dataset contains all those values,
filling in explicit NAs where necessary.
stocks %>% complete(year, qtr)
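Counting rows before and after shows what complete() adds. A minimal sketch using the stocks data from the notes, assuming tidyr, dplyr and tibble are installed; completed is an illustrative name.

```r
library(tidyr)
library(dplyr)
library(tibble)

stocks <- tibble(
  year   = c(2015, 2015, 2015, 2015, 2016, 2016, 2016),
  qtr    = c(   1,    2,    3,    4,    2,    3,    4),
  return = c(1.88, 0.59, 0.35,   NA, 0.92, 0.17, 2.66)
)

# 2 years x 4 quarters = 8 combinations, but only 7 rows are present.
completed <- stocks %>% complete(year, qtr)

nrow(stocks)     # 7
nrow(completed)  # 8

# Both missing returns are now explicit: 2015 Q4 (already NA) and 2016 Q1 (added).
completed %>% filter(is.na(return))
```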
fill() takes a set of columns where you want missing values
to be replaced by the most recent non-missing value (last observation carried forward).
treatment %>% fill(person)
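The notes do not define treatment, so the sketch below builds a small stand-in. The names and numbers are hypothetical and only show the shape of data that fill() is meant for: a value written once and left blank on the following rows. It assumes tidyr, dplyr and tibble are installed.

```r
library(tidyr)
library(dplyr)
library(tibble)

# Hypothetical data: the person is recorded only on the first row of each block.
treatment <- tribble(
  ~person,    ~treatment, ~response,
  "Person A", 1,          7,
  NA,         2,          10,
  NA,         3,          9,
  "Person B", 1,          4
)

# Each NA in person is replaced by the closest non-missing value above it.
filled <- treatment %>% fill(person)
filled$person
```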
Case study: WHO dataset
Gather the columns from new_sp_m014 to newrel_f65, which look like values rather than variables, into key and cases.
who1 <- who %>% gather( new_sp_m014:newrel_f65, key = "key", value = "cases", na.rm = TRUE )
Count the values of key to get a hint of their structure.
who1 %>% count(key)
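Make the codes consistent: replace "newrel" with "new_rel" in key so that every code has the same three underscore-separated parts.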
who2 <- who1 %>% mutate(key = stringr::str_replace(key, "newrel", "new_rel"))
Split each code at the underscores into new, type, and sexage.
who3 <- who2 %>% separate(key, c("new", "type", "sexage"), sep = "_")
who3 %>% count(new)
Drop the columns that are constant or redundant: new, iso2, and iso3.
who4 <- who3 %>% select(-new, -iso2, -iso3)
Split sexage after the first character into sex and age.
who5 <- who4 %>% separate(sexage, c("sex", "age"), sep = 1)
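The key-cleaning steps above can be followed on just two of the codes, which makes it easier to see why "newrel" has to be rewritten first. This is a sketch on a hypothetical two-row tibble (codes), assuming tidyr, dplyr, tibble and stringr are installed.

```r
library(tidyr)
library(dplyr)
library(tibble)

# Two codes taken from the column names of who; note the missing underscore in "newrel".
codes <- tibble(key = c("new_sp_m014", "newrel_f65"))

cleaned <- codes %>%
  # Make both codes follow the same new_type_sexage pattern.
  mutate(key = stringr::str_replace(key, "newrel", "new_rel")) %>%
  # Split at each underscore.
  separate(key, c("new", "type", "sexage"), sep = "_") %>%
  # Split sexage after its first character.
  separate(sexage, c("sex", "age"), sep = 1)

cleaned
```

Without the str_replace() step the second code would only have two pieces, so the three-column split would not line up.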
The whole pipeline in one script:
who %>% gather(code, value, new_sp_m014:newrel_f65, na.rm = TRUE) %>% mutate( code = stringr::str_replace(code, "newrel", "new_rel") ) %>% separate(code, c("new", "var", "sexage")) %>% select(-new, -iso2, -iso3) %>% separate(sexage, c("sex", "age"), sep = 1)
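In current tidyr, gather() is superseded by pivot_longer(). As a hedged alternative, not part of the original notes, the same pipeline might be written as below; it assumes a tidyr version that provides pivot_longer() (1.0.0 or later) plus dplyr and stringr, and tidy_who is an illustrative name.

```r
library(tidyr)
library(dplyr)

tidy_who <- who %>%
  # Same reshaping as gather(..., na.rm = TRUE), using the newer interface.
  pivot_longer(
    cols = new_sp_m014:newrel_f65,
    names_to = "code",
    values_to = "value",
    values_drop_na = TRUE
  ) %>%
  mutate(code = stringr::str_replace(code, "newrel", "new_rel")) %>%
  separate(code, c("new", "var", "sexage")) %>%
  select(-new, -iso2, -iso3) %>%
  separate(sexage, c("sex", "age"), sep = 1)

names(tidy_who)
```

The rows come out in a different order than with gather(), but the columns and the number of rows should match.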