abishekarun / STAT545-hw-rajendran-arun

hw04 ready for grading #4

Closed abishekarun closed 6 years ago

abishekarun commented 6 years ago

Final markdown file

hw04 folder

Tangjiahui26 commented 6 years ago

Peer Review:

Hi, @abishekarun! You did excellent work on this homework and went above and beyond the requirements of the assignment. You only needed to pick one data reshaping prompt and one join prompt, but you chose all of them, which is worthy of praise.

  1. For activity1 of the data reshaping part, you made a cheatsheet with rich content. gather() and spread() were applied in three different examples. You also explored other tidyr functions that offer similar features.

    For instance, you used the regex parameter of extract() to pull the numbers out of category into a new column, and complete() to add rows that make the missing id and score combinations explicit (I think this function can be seen as a wrapper around left_join()). You also covered expand(), unnest() and so on.

    A small suggestion for this part: try to add more explanation of the code. This would make it easier to read and would clear up doubts.

  2. In activity2 of data reshaping, you made a tibble with one row per year and lifeExp columns for China, India and Myanmar. You also wrote a function and then called it, which I think is an effective way to simplify code; maybe I can do the same in my next homework. In addition, grid.arrange() was used to put multiple graphs on the same page. If you want, you can also try ggplot2.multiplot() from the easyGgplot2 package by calling ggplot2.multiplot(Plot1, Plot2, Plot3, Plot4, cols=2)

  3. I did not choose activity3-5 in the first part, but I learned a lot by reading your code. You did really well in activity5. However, I have some doubts about activity3 and activity4.

Activity3: In your code, I am not clear about the purpose of complete(nesting(continent,year)). It seems that you get the same result without it, as shown below.

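library(gapminder)
library(tidyverse)

# Mean lifeExp per continent and year, spread to one column per continent
a <- gapminder %>%
  group_by(continent, year) %>%
  summarize(mean_lifeExp = mean(lifeExp)) %>%
  spread(continent, mean_lifeExp)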
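The point above can be checked on a small example. The sketch below is not the homework's code: it assumes dplyr and tidyr are installed and uses a made-up `toy` tibble in place of gapminder. Because nesting() keeps only the combinations that already occur in the data, completing a summary on it should add no rows, so both pipelines are expected to agree.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for gapminder (hypothetical values, for illustration only)
toy <- tibble(
  continent = c("Asia", "Asia", "Asia", "Europe", "Europe", "Europe"),
  year      = c(1952, 1952, 1957, 1952, 1957, 1957),
  lifeExp   = c(40, 44, 46, 64, 66, 70)
)

# One mean per continent-year pair
means <- toy %>%
  group_by(continent, year) %>%
  summarize(mean_lifeExp = mean(lifeExp), .groups = "drop")

# With the extra step: nesting() only lists pairs already present
with_complete <- means %>%
  complete(nesting(continent, year)) %>%
  spread(continent, mean_lifeExp)

# Without it
without_complete <- means %>%
  spread(continent, mean_lifeExp)

# Compare the two wide tables
all.equal(as.data.frame(with_complete), as.data.frame(without_complete))
```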

Activity4: In this section, you considered all the continents, but there is a problem in your results: after 1982, the min and max values are wrong. So mutate(variable=ifelse(!duplicated(year),"min","max")) does not seem to work reliably, and you may need to handle this in a different way.
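One possible explanation, sketched below under assumptions: the homework's full pipeline is not shown in this thread, so the `extremes` tibble here is a hypothetical reconstruction holding only the two extreme rows per year, ordered alphabetically by country. The 1982 values are the ones quoted in this thread; the 1952 values are illustrative. Labelling by position calls the first row of each year "min" regardless of its value, while labelling by value compares each row with its year's actual minimum.

```r
library(dplyr)

# Hypothetical input: two extreme rows per year, alphabetical by country
extremes <- tibble(
  country = c("Afghanistan", "Norway", "Japan", "Sierra Leone"),
  year    = c(1952, 1952, 1982, 1982),
  lifeExp = c(28.801, 72.67, 77.11, 38.445)
)

# Position-based: the first row seen for a year is labelled "min"
by_position <- extremes %>%
  mutate(variable = ifelse(!duplicated(year), "min", "max"))

# Value-based: label depends on the row's value within its year
by_value <- extremes %>%
  group_by(year) %>%
  mutate(variable = ifelse(lifeExp == min(lifeExp), "min", "max")) %>%
  ungroup()

# Japan (77.11) is "min" in by_position but "max" in by_value
by_position
by_value
```

Another option is to compute min(lifeExp) and max(lifeExp) directly in summarize(), which avoids relying on row order at all.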

  4. As for the join part, you did very well in each activity. If you want, you can make it better by proofreading the whole document, because some parts are not displayed well in your .md file.

  5. A uniform format is good practice. It is very useful for me to know that you can adjust tables by creating a function and using kable_style().

Overall, I think your homework was really well done and hope you can keep it up! Regards, Jiahui Tang

abishekarun commented 6 years ago

Hi @Tangjiahui26, thank you so much for such an elaborate peer review. I will definitely look into the multiplot function and use it next time. I will also add more comments starting with my next homework.

In the third activity, I used complete() with nesting() since it gives every combination and is also useful for handling missing values, as it has a fill parameter. As you mentioned, I don't think it is required in this activity, although it would be helpful in other cases where we might encounter missing values.
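A minimal sketch of the fill parameter mentioned above, assuming tidyr and dplyr are installed. The `toy` tibble and its values are made up for illustration and are not from the homework. Passing bare column names to complete() produces every combination of them, and fill supplies a value for the rows that had to be added.

```r
library(dplyr)
library(tidyr)

# Toy data: Oceania has no 1957 row (hypothetical values)
toy <- tibble(
  continent    = c("Asia", "Asia", "Oceania"),
  year         = c(1952, 1957, 1952),
  mean_lifeExp = c(46, 49, 69)
)

# Every continent x year combination; the added row gets NA
all_combos <- toy %>%
  complete(continent, year)

# Same expansion, but the added row gets the value given in fill
filled <- toy %>%
  complete(continent, year, fill = list(mean_lifeExp = 0))

all_combos
filled
```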

As for the fourth activity, I checked once again and found that the values match the normal gapminder data filtered by year. If possible, can you confirm it once more, since I am getting the same values for years after 1987 as well? I used the mutate function to add a column that marks each row as the max or min value, and then reshaped the data to yield the final table. Let me know where the mistake is. Thank you once again.

Tangjiahui26 commented 6 years ago

@abishekarun Thank you for your explanation. But why is the max value in 1982 Sierra Leone (38.445), while the min value is Japan (77.11)? Maybe I made a mistake, but the result does not seem quite right.

abishekarun commented 6 years ago

@Tangjiahui26 Thank you for pointing that out. I just checked that part. Sorry for not noticing it earlier. I have corrected it now. Kindly check and let me know if you find any other issue. Thanks once again :)

farihakhan commented 6 years ago

Great job on creating a cheatsheet for activity 1. It's really thorough and goes over a lot of material with good examples. I think it was smart to include it in a separate rmd file! It's really cool how you created a function to format your kables. I hadn't thought of that, and it definitely seems more efficient than formatting each kable individually. I also like how you used regex here to extract data. Maybe including a table of common regex patterns would add more value to your cheatsheet. One thing I would recommend is giving your variables more descriptive names; I find that it is easier to keep track of them this way.

I was really impressed that you did all of the activities suggested for this assignment, and they all look good. All of the data is summarized nicely in tables and graphs throughout the assignment, along with nice embedded links to your other rmd files. Honestly, I don't have much advice for you on this assignment. I'm really impressed and think it's done excellently. If there is one thing I would change, it would be the column name "release_year" in your tables, to remove the special characters.

Overall I think you did a really great job for this assignment!!

ksedivyhaley commented 6 years ago

Reshaping activity: Yes (several)
Join activity: Yes (several)
Reflect on process: Yes
Bonus (merge/match): Yes

Comments:

Your mark will be distributed later. If you would like more feedback, please feel free to message me on Slack.

ksedivyhaley commented 6 years ago

Also, great conversation @Tangjiahui26 and @abishekarun!