Open daramireh opened 2 years ago
library(tidyverse)
vignette("tibble")
library(tibble)
Coercing a data frame to a tibble
as_tibble(iris)
Creating a new tibble from individual vectors
tibble(x = 1:5, y = 1, z = x^2 + y)
Using tribble() to create a new tibble row by row
tribble(
  ~x, ~y, ~z,
  "a", 2, 3.6,
  "b", 1, 8.5
)
Differences between a data frame and a tibble
Printing: a tibble prints only the first 10 rows
tibble(
  a = lubridate::now() + runif(1e3) * 86400,
  b = lubridate::today() + runif(1e3) * 30,
  c = 1:1e3,
  d = runif(1e3),
  e = sample(letters, 1e3, replace = TRUE)
)
Controlling the number of rows and the width when printing
nycflights13::flights %>% print(n = 10, width = Inf)
Exercises
2. Compare and contrast the following operations on a
data.frame and equivalent tibble. What is different? Why
might the default data frame behaviors cause you frustration?
Data frame
df <- data.frame(abc = 1, xyz = "a")
df$x
df[, "xyz"]
df[, c("abc", "xyz")]
Tibble
tb <- tibble(abc = 1, xyz = "a")
tb[1]
tb$abc
tb[, "xyz"]
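The exercise asks what is different, so here is a minimal sketch that runs the same two operations on both objects. The variable names (df_partial, tb_col, and so on) are only illustrative, and the exact class of the data frame column depends on the R version (character in R 4.0 or later, factor before that).

```r
library(tibble)

df <- data.frame(abc = 1, xyz = "a")
tb <- tibble(abc = 1, xyz = "a")

# `$` on a data frame partially matches names: df$x silently returns xyz.
df_partial <- df$x

# A tibble never partially matches: tb$x warns and returns NULL.
tb_partial <- suppressWarnings(tb$x)

# `[` with a single column drops a data frame down to a vector ...
df_col <- df[, "xyz"]

# ... but a tibble always returns another tibble.
tb_col <- tb[, "xyz"]

print(df_partial)
print(tb_partial)
print(class(df_col))
print(class(tb_col))
```

Partial matching and the silent drop to a vector are the two behaviors most likely to cause frustration: a typo can return the wrong column, and code that expects a data frame can receive a vector instead.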
4. Practice referring to nonsyntactic names in the following data
frame by:
annoying <- tibble(
  `1` = 1:10,
  `2` = `1` * 2 + rnorm(length(`1`))
)
a. Extracting the variable called 1.
annoying["1"]
b. Plotting a scatterplot of 1 versus 2.
ggplot(data = annoying, aes(x = `1`, y = `2`)) +
  geom_point()
c. Creating a new column called 3, which is 2 divided by 1.
annoying <- annoying %>% mutate(`3` = `2` / `1`)
d. Renaming the columns to one, two, and three:
rename(annoying, "one" = "1", "two" = "2", "three" = "3")
5. What does tibble::enframe() do?
It converts a named vector (or list) into a two-column tibble of names and values.
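A short sketch of enframe() in use may help here. The named vector scores is a made-up example, not something from the book; deframe() is the tibble function that reverses the conversion.

```r
library(tibble)

# Hypothetical named vector, used only for illustration.
scores <- c(a = 1, b = 2, c = 3)

# enframe() puts the names in one column and the values in another.
scores_tbl <- enframe(scores, name = "name", value = "value")
print(scores_tbl)

# deframe() turns the two columns back into a named vector.
scores_back <- deframe(scores_tbl)
print(scores_back)
```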
Chapter 9: Tidy data
table1 is an example of tidy data
table1
Transforming data with dplyr
Compute rate per 10,000
table1 %>% mutate(rate = cases / population * 10000)
Compute cases per year
table1 %>% count(year, wt = cases)
Plotting
ggplot(table1, aes(year, cases)) +
  geom_line(aes(group = country), color = "grey50") +
  geom_point(aes(color = country))
Gathering data
table4a has a problem: two of its column names, 1999 and 2000, are values of the year variable rather than variables. gather() solves it, and it needs three parameters:
The set of columns that represent values, not variables. In this
example, those are the columns 1999 and 2000.
The name of the variable whose values form the column names.
I call that the key, and here it is year.
The name of the variable whose values are spread over the cells.
I call that value, and here it’s the number of cases.
table4a %>% gather("1999", "2000", key = "year", value = "cases")
We can use gather() to tidy table4b in a similar fashion
table4b %>% gather(`1999`, `2000`, key = "year", value = "population")
Combine the tidied table4a and table4b into one tibble
tidy4a <- table4a %>% gather(`1999`, `2000`, key = "year", value = "cases")
tidy4b <- table4b %>% gather(`1999`, `2000`, key = "year", value = "population")
left_join(tidy4a, tidy4b)
Spreading: it is the opposite of gathering
You use it when an observation is scattered across multiple rows.
To use spread(), first look at how the data set is laid out and
identify the column that holds the variable names (the key) and the column that holds the values.
In table2, the key is type and the value is count.
spread(table2, key = type, value = count)
As you might have guessed from the common key and value arguments,
spread() and gather() are complements. gather() makes
wide tables narrower and longer; spread() makes long tables
shorter and wider.
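To make the "longer" and "wider" wording concrete, the sketch below checks the dimensions before and after each step. The tibble wide is a hypothetical two-country table invented for this illustration, not one of the tidyr example tables.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per country, one column per year.
wide <- tibble(
  country = c("A", "B"),
  `1999` = c(10, 20),
  `2000` = c(30, 40)
)

# gather() stacks the two year columns: more rows, same number of columns here.
long <- gather(wide, `1999`, `2000`, key = "year", value = "cases")
print(dim(wide))
print(dim(long))

# spread() undoes the stacking and restores one column per year.
back <- spread(long, key = year, value = cases)
print(dim(back))
```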
Exercises
1. Why are gather() and spread() not perfectly symmetrical?
Carefully consider the following example:
stocks <- tibble(
  year   = c(2015, 2015, 2016, 2016),
  half   = c(   1,    2,    1,    2),
  return = c(1.88, 0.59, 0.92, 0.17)
)
stocks
stocks %>%
  spread(year, return) %>%
  gather("year", "return", `2015`:`2016`)
Both spread() and gather() have a convert argument. What does it do?
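The two tibbles are not identical. spread() puts half first, so the round trip returns the columns in the order half, year, return. More importantly, gather() builds the year column from column names, which are always strings, so year comes back as a character column instead of a number. convert = TRUE asks spread() or gather() to guess a more specific type for the new columns.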
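One way to see what the round trip in exercise 1 changes, and what convert does, is to inspect the type of the year column. This is a minimal sketch; the intermediate names wide, round_trip, and converted are only illustrative.

```r
library(tidyr)
library(tibble)

stocks <- tibble(
  year   = c(2015, 2015, 2016, 2016),
  half   = c(1, 2, 1, 2),
  return = c(1.88, 0.59, 0.92, 0.17)
)

wide <- spread(stocks, year, return)

# Default: the key column is built from column names, so it is character.
round_trip <- gather(wide, "year", "return", `2015`:`2016`)
print(class(round_trip$year))

# convert = TRUE runs type conversion on the key column.
converted <- gather(wide, "year", "return", `2015`:`2016`, convert = TRUE)
print(class(converted$year))
```

Even with convert = TRUE the result is not guaranteed to match the original type exactly, since the conversion is a guess based on the strings.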
2. Why does this code fail?
table4a %>% gather(1999, 2000, key = "year", value = "cases")
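The problem is not the key or value argument. 1999 and 2000 are nonsyntactic names, so they must be wrapped in backticks (or quoted). Written as bare numbers, gather() reads them as column positions, and table4a has no 1999th or 2000th column.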
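The failure in exercise 2 can be checked directly by running the call with and without backticks. This sketch wraps the failing call in tryCatch() so the script keeps running; the names bare and quoted are only illustrative.

```r
library(tidyr)

# Bare numbers are treated as column positions, so this call errors.
bare <- tryCatch(
  gather(table4a, 1999, 2000, key = "year", value = "cases"),
  error = function(e) "error"
)
print(bare)

# Backticks mark 1999 and 2000 as column names, so this call works.
quoted <- gather(table4a, `1999`, `2000`, key = "year", value = "cases")
print(quoted)
```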
3. Why does spreading this tibble fail? How could you add a new
column to fix the problem?
people <- tribble(
  ~name,             ~key,    ~value,
  #-----------------|--------|------
  "Phillip Woods",   "age",       45,
  "Phillip Woods",   "height",   186,
  "Phillip Woods",   "age",       50,
  "Jessica Cordero", "age",       37,
  "Jessica Cordero", "height",   156
)
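Spreading fails because Phillip Woods has two age rows, so name and key together do not identify a single value. Dropping the duplicate row lets spread() run, at the cost of losing an observation:
people %>%
  filter(value < 50 | value > 50) %>%
  spread(key = key, value = value)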
people2 <- tribble(
  ~name,             ~key,    ~value,
  #-----------------|--------|------
  "Phillip Woods",   "age",       45,
  "Phillip Woods",   "height",   186,
  "Jessica Cordero", "age",       37,
  "Jessica Cordero", "height",   156
)
spread(people2, key = key, value = value)
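Exercise 3 also asks how a new column could fix the problem without deleting data. One possible approach, sketched below, numbers the repeated name and key pairs so that every row becomes uniquely identified. The column name obs and the variable names are assumptions made for this illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

people <- tribble(
  ~name,             ~key,    ~value,
  "Phillip Woods",   "age",       45,
  "Phillip Woods",   "height",   186,
  "Phillip Woods",   "age",       50,
  "Jessica Cordero", "age",       37,
  "Jessica Cordero", "height",   156
)

# Number repeated (name, key) pairs so each row has a unique identifier.
people_id <- people %>%
  group_by(name, key) %>%
  mutate(obs = row_number()) %>%
  ungroup()

# With obs in place, spread() keeps both of Phillip Woods's age values.
people_wide <- spread(people_id, key = key, value = value)
print(people_wide)
```

The second Phillip Woods row has no matching height, so spread() fills that cell with NA instead of failing.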