The range of years in gapminder is 1952-2007. To find this, use the code:
print(range(gapminder$year))
gapminder %>%
  print(n = 20) %>%
  summary()
This statement can be recreated in base R using the following line:
summary(print(gapminder, n=20))
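As a quick check that the two forms really are equivalent, here is a sketch that assumes the dplyr and gapminder packages are installed. The magrittr pipe %>% (re-exported by dplyr) rewrites x %>% f(y) as f(x, y), so the pipeline unrolls from the inside out. Because print() returns its input invisibly, both versions summarise the full data set. The names piped and nested are illustrative only.

```r
library(dplyr)      # re-exports the magrittr pipe %>%
library(gapminder)

# Pipe form: print the first 20 rows, then summarise the whole data set
piped <- gapminder %>% print(n = 20) %>% summary()

# Nested form: the innermost call runs first, just like the first pipe step
nested <- summary(print(gapminder, n = 20))

# print() hands back its input unchanged, so both summaries describe gapminder
print(identical(piped, nested))
```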
To find the population-weighted average life expectancy for each continent, we first filter to the last year of the data set, 2007. We then group by continent and use summarise() with weighted.mean() (from the stats package that ships with R), weighting lifeExp by pop:
weighted_life_exp <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(avg_life_Exp = weighted.mean(lifeExp, pop))
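To make clear what weighted.mean() computes, this sketch (assuming dplyr and gapminder are installed) writes the same weighted average out by hand as sum(lifeExp * pop) / sum(pop) and puts the unweighted mean next to it for comparison. The column names weighted, by_hand and unweighted are illustrative only.

```r
library(dplyr)
library(gapminder)

check_2007 <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(
    weighted   = weighted.mean(lifeExp, pop),
    # Same quantity written out; as.numeric() avoids integer overflow when
    # summing the integer pop column (Asia's 2007 total exceeds 2^31 - 1)
    by_hand    = sum(lifeExp * pop) / sum(as.numeric(pop)),
    # Ignores population, so small countries count as much as large ones
    unweighted = mean(lifeExp)
  )

print(check_2007)
```

The weighted and by_hand columns should agree. The unweighted column shows how much the answer would shift if the population weighting were left out.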
We first filter the data set to include only China. We then use select() to extract only the year and lifeExp columns. The result is then passed to ggplot() to create a plot of life expectancy over time.
china_life_exp <- gapminder %>%
  filter(country == 'China') %>%
  select(lifeExp, year)
ch_life_exp_time <- ggplot(china_life_exp, aes(year, lifeExp)) +
  geom_point() +
  labs(x = 'Year', y = 'Life Expectancy (yrs)', title = 'Life Expectancy in China') +
  theme_bw()
Summarise to find the maximum population of each country:
max_pop <- gapminder %>%
  group_by(country) %>%
  summarise(maxpop = max(pop))
The plot is filtered to the year 1967, with gdpPercap on the x-axis and lifeExp on the y-axis. The x-axis is on a logarithmic scale. The points are colored by continent and sized by population.
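There are two ways to get a logarithmic x-axis, and they differ only in how the axis is labelled. This sketch assumes dplyr, ggplot2 and gapminder are installed, and the object names gapminder_1967, p_scale and p_data are illustrative. scale_x_log10() transforms the scale, so tick labels stay in GDP-per-capita units. Applying log10() inside aes() transforms the data, so the ticks show log10 values instead.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

gapminder_1967 <- gapminder %>% filter(year == 1967)

# Transform the scale: points sit on a log10 axis,
# but tick labels stay in the original GDP-per-capita units
p_scale <- ggplot(gapminder_1967, aes(gdpPercap, lifeExp)) +
  geom_point() +
  scale_x_log10()

# Transform the data instead: same point positions,
# but tick labels are log10 values (3 corresponds to a GDP per capita of 1000)
p_data <- ggplot(gapminder_1967, aes(log10(gdpPercap), lifeExp)) +
  geom_point()
```

The scale version is the one that matches the target plot's readable dollar-valued ticks.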
plot_recreate <- gapminder %>%
  filter(year == 1967) %>%
  ggplot(aes(gdpPercap, lifeExp, color = continent, size = pop)) +
  geom_point() +
  scale_x_log10() +
  labs(x = 'GDP per capita', y = 'life expectancy', title = 'Year 1967', subtitle = 'Gapminder Dataset') +
  theme_bw()
@stephpenn1 is the grader on this one, but 👏 @hmoore28. One comment, though: the extra credit pipeline is "Write a pipeline that computes the year of max population for each country."
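One possible way to answer the extra credit question, sketched under the assumption that dplyr and gapminder are installed. The pipeline groups by country, keeps the row(s) where pop equals that country's maximum, and returns the year alongside it. The name year_of_max_pop is illustrative. If a country reached the same maximum in two survey years, both rows would be kept.

```r
library(dplyr)
library(gapminder)

year_of_max_pop <- gapminder %>%
  group_by(country) %>%
  # keep only the row(s) where this country's population peaks
  filter(pop == max(pop)) %>%
  ungroup() %>%
  # report the country, the year of the peak and the peak population
  select(country, year, pop)

print(year_of_max_pop)
```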
Question 1
print(range(gapminder$year))
The range is 1952 to 2007, a span of 2007 - 1952 = 55 years.
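You can get the span directly instead of subtracting by hand. In this minimal sketch, which assumes the gapminder package is installed, range() returns the first and last year and diff() subtracts them. The names years and span are illustrative.

```r
library(gapminder)

years <- range(gapminder$year)  # c(first year, last year)
span  <- diff(years)            # last year minus first year
print(years)
print(span)
```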
Question 2
summary(print(gapminder, n = 20))
Question 3
gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(avg_life_Exp = weighted.mean(lifeExp, pop)) -> avg_life_exp
print(avg_life_exp)
# A tibble: 5 x 2
  continent avg_life_Exp
  <fct>            <dbl>
1 Africa            54.6
2 Americas          75.4
3 Asia              69.4
4 Europe            77.9
5 Oceania           81.1
Question 4
gapminder %>%
  filter(country == "China") %>%
  select(year, lifeExp) -> china_life
print(china_life)
Plotting Challenge
The range of years in this dataset is 1952-2007.
print(range(gapminder$year))
The base R version of the pipeline that prints 20 rows of gapminder and then summarises it is:
summary(print(gapminder, n = 20))
The average life expectancy for each continent in the last year, weighted by country population, is found by filtering the data to the last year in the dataset (2007), grouping by continent, and then applying the weighted.mean() function.
life_expectancy <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(weighted_life_exp = weighted.mean(lifeExp, pop))
Pipeline that picks out China and returns only the year and lifeExp columns:
china_life_exp_over_time <- gapminder %>%
  filter(country == "China") %>%
  select(year, lifeExp)
Plot of life expectancy over time:
plot_china_life_exp <- china_life_exp_over_time %>%
  ggplot(aes(year, lifeExp)) +
  geom_point() +
  labs(title = "China's Life Expectancy over Time", y = "Life Expectancy")
print(plot_china_life_exp)
help!
gdp_vs_life_exp <- gapminder %>%
  filter(year == 1967) %>%
  ggplot(aes(gdpPercap, lifeExp, color = continent, size = pop)) +
  geom_point() +
  theme_bw() +
  scale_x_log10() +
  labs(title = "Year 1967", subtitle = "Gapminder Dataset", x = "GDP per capita", y = "life expectancy")
print(gdp_vs_life_exp)
Please post your answers in one comment below. And do take advantage of the formatting tools available when writing comments (https://help.github.com/en/articles/basic-writing-and-formatting-syntax) for readability. Have fun and slack us on the #suli-rstats channel if you need help!
Packages Needed:
dplyr
tidyr
ggplot2
gapminder - data package of life expectancy, GDP per capita, and population for 142 countries
Question 1:
What is the range (hint, hint) of the years in this dataset?
Question 2:
How would you do this in base R (without pipelines)? Think about how functions are structured (function(argument))
Question 3:
Write a pipeline that prints the average life expectancy for each continent in the last year of the dataset. Note that to do this correctly, you'll need to weight by country populations. Paste the resulting tibble/data frame. Also note that you can put ``` on either side of text to format it as code in a comment.
Question 4:
Write a pipeline that picks out China and returns only the year and lifeExp columns, then plot the life expectancy over time. Check out the select() function in dplyr.
Extra Credit Pipeline:
Write a pipeline that computes the year of max population for each country.
Plotting Challenge:
Using the full gapminder dataset, reproduce this plot. Note: the x-axis looks like it's been scaled.
Hint: First look at what data is being shown. Has it been filtered? What variables are plotted and how?