Tangjiahui26 / DataAnalysisWithR

Homework For STAT545
http://stat545.com

hw03 ready for grading #5

Open Tangjiahui26 opened 7 years ago

Tangjiahui26 commented 7 years ago

@vincenzocoia @gvdr @ksedivyhaley @JoeyBernhardt @mynamedaike @pgonzaleze @derekcho

SHA: commit 43900164c7fffe38c234031220f63a4c8c51446c
Homework03 folder
hw03-Tang-Jiahui.md
hw03-Tang-Jiahui-I want to do more.Rmd

mylinhthibodeau commented 7 years ago

Dear @Tangjiahui26,

You did an amazing job with the homework. I really like that you detailed your process so well; it is easy to follow your train of thought.

I think you deserve full marks because it shows that you put a lot of time and effort into your homework. Since you mentioned some remaining questions about the best approach to certain problems, I thought I could share some of the tips I found.

Task 2

Here, I think you did a good job, but a histogram may not be the best way to plot summary statistics. Moreover, for the bars to be coloured, you need to map fill to a variable inside aes(), such as fill = continent. For example, to plot how many gdpPercap entries there are per continent, I would write:

library(gapminder)
library(tidyverse)
# geom_bar() counts rows per continent by default; stat is not an aesthetic, so keep it out of aes()
gapminder %>%
  ggplot(aes(x = continent)) +
  geom_bar(aes(fill = continent))

For plots of summary statistics, I would recommend using a point for the mean and a line representing the minimum-maximum span.

gapminder %>%
  ggplot(aes(x = continent, y = gdpPercap)) +
  # point at the mean, line from min to max
  # (ggplot2 >= 3.3.0; older versions use fun.y, fun.ymin, fun.ymax)
  stat_summary(fun = mean, fun.min = min, fun.max = max,
               colour = "red")

Or a boxplot can also represent summary statistics quite well.

gapminder %>%
  group_by(continent) %>%
  ggplot(aes(x=continent, y=log10(gdpPercap))) +
  geom_boxplot(aes(colour = continent))

Or you can compute a population-weighted mean of gdpPercap this way.

gapminder %>%
  group_by(year, continent) %>%
  summarize(gdpPercap_wtmean = weighted.mean(gdpPercap, pop)) %>%
  ggplot(aes(x=year, y= gdpPercap_wtmean)) +
  geom_point(aes(colour = continent))
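
To see what the weighting changes, here is a minimal sketch with made-up numbers (the two gdpPercap values and populations below are hypothetical, not taken from gapminder). It only illustrates how weighted.mean() differs from mean():

```r
# Hypothetical toy values, not gapminder data
gdp <- c(1000, 5000)   # GDP per capita of two imaginary countries
pop <- c(9e6, 1e6)     # their populations, used as weights

unweighted <- mean(gdp)                # every country counts equally
weighted   <- weighted.mean(gdp, pop)  # sum(gdp * pop) / sum(pop)

print(unweighted)
print(weighted)
```

The larger country pulls the weighted mean towards its own value, which is why the weighted version describes the average person in a continent rather than the average country.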

In the last part of your homework, it is unclear what you are trying to plot.

It looks like you tried to plot the max, min, mean and median of your Gdp_z scores, but a density plot is not the right choice for this type of data. If you would like to plot your Gdp_z scores against LifeExp_z for each country, you could do:

T5 <- gapminder %>% 
  filter(year > 1980, continent == "Asia") %>% 
  group_by(country, year) %>% 
  summarise(mean_LifeExp = mean(lifeExp), 
            mean_gdpPercap = mean(gdpPercap),
            mean_gdp = mean(gdpPercap * pop)) %>% 
  mutate(LifeExp_z = ((mean_LifeExp - mean(mean_LifeExp)) / sd(mean_LifeExp)),
         GdpPercap_z = ((mean_gdpPercap - mean(mean_gdpPercap)) / sd(mean_gdpPercap)),
         Gdp_z = ((mean_gdp - mean(mean_gdp)) / sd(mean_gdp))) %>%
  arrange(country) %>%
  select(country, year, Gdp_z, LifeExp_z)

# alpha is a fixed setting here, so it goes outside aes()
T5 %>% ggplot(aes(x=Gdp_z, y=LifeExp_z, colour=country)) +
  geom_path(alpha = 0.3) +
  theme(text = element_text(size=12))

Also, great job on trying something new in the "I want to do more" section. So overall, you did a great job and I think you are learning a lot and taking full advantage of the opportunities in this class.

I hope you find my comments helpful. I put quite a bit of thought and time into them, since I used a different dataset for my own homework and had to rethink some of the problems in the Gapminder frame.

I wish you good luck, and keep up the good work! Warm regards, My Linh

arthursunbao commented 7 years ago

Hi Jiahui,

Excellent post for your homework, well done!

You used several techniques that were not covered in class, such as faceting in Question 1, which also has a very nice plot. I like that.

For Question 2, you used summarise() and histograms to produce a very nice plot with 5 subplots, which is fantastic. You also used theme() to arrange the plot nicely.

For Question 3, you calculated the mean in three different ways, which is fantastic.

For Question 4, I like your use of facet_wrap(~ continent, scales = "free_y") and geom_smooth(lwd = 1, se = FALSE, span = 5), which shows your research skills in finding the right functions to make the plots beautiful. So is plot 4.

As mylinhthibodeau has already said, I have nothing else to add.

Nice work again!

Regards, Jason

pgonzaleze commented 7 years ago

Hi @Tangjiahui26, here are some comments about your homework:
At least three tasks: Yes
A table and figure for each task: Yes
Comments on tables/figures: Yes
Reflections on Process: Yes
Bonus (side-by-side layout, new table packages): Yes

Pedro G.