abishekarun / STAT545-hw-rajendran-arun


hw07 ready for grading #7

Open abishekarun opened 6 years ago

abishekarun commented 6 years ago

Readme file

hw07 folder

Rendered HTML Report

xinmiaow commented 6 years ago

Hi @abishekarun,

It’s nice to review your homework 7. Your repo is well organized. Collecting all the datasets and figures in a folder is a good idea, and I could find your R scripts easily. In addition, you explain your workflow clearly in the summary section, which made your work easy to follow.

00_download_data.R

01_exploratory_analysis.R

02_statistical_analysis.R

03_report.Rmd

Makefile

master.R

Overall, I think you did a great job. I really enjoy reviewing your homework.

wswade2 commented 6 years ago

Hi @abishekarun,

I found your process description to be meticulous. I could get a very good idea of how to create an automated pipeline just from the description that you provided. I also found using make to be somewhat problematic as a Windows user.

Your analysis of the gapminder data set in file 3 is likewise very thorough and furnished with lots of clean-looking plots and tables. I had to look back at a different file to view the code that you used to generate the plots. I suppose this fits the spirit of the assignment, where the idea is for one document to rely on code from a separate document.

Overall this assignment was done very cleanly and thoroughly. Your homework repo was well-organized and it was easy to find the files I needed to view. Great job!

Wade

abishekarun commented 6 years ago

Hi @xinmiaow, thank you so much for your review, and for letting me know about the read_delim() function. I would like to explain a couple of points.

  1. I tried a couple of ways to get plots into the Rmd file. I saved plots as variables and called them to display the histogram, frequency, and density plots in the Rmd. For the scatterplot of Oceania, I embedded the saved image file, since that is another way to include plots.
  2. Although I saved the scatterplots of lifeExp vs. year faceted by country for each continent, I only showed this plot for Oceania, as I felt the others were neither visually pleasing nor easy to interpret.
  3. I tried to use a common metric for evaluating the regression results, so I normalized the R² values and residual standard error values for each continent. After that, I found that Africa had the worst linear regression fit and the Americas had the best. Therefore, for the chosen threshold values, we don't get the desired number of countries for these two continents.
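A minimal sketch of the normalize-then-threshold idea in point 3, written in Python purely for illustration (the homework itself is in R, where `summary(lm(...))` reports these values as `r.squared` and `sigma`). The exact normalization used in the homework isn't stated, so this assumes min-max scaling within one continent and an equal-weight combination of scaled R² and scaled residual standard error. `fit_line`, `min_max`, `score_countries`, `select`, the threshold, and the toy numbers are all hypothetical.

```python
from statistics import mean


def fit_line(years, life_exp):
    """Least-squares fit of life_exp ~ year; returns (R^2, residual standard error)."""
    n = len(years)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the residual standard error")
    x_bar, y_bar = mean(years), mean(life_exp)
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, life_exp))
    slope = sxy / sxx
    # Residuals from the fitted line, written in centred form to limit rounding error.
    residuals = [(y - y_bar) - slope * (x - x_bar) for x, y in zip(years, life_exp)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - y_bar) ** 2 for y in life_exp)
    r_squared = 1 - ss_res / ss_tot
    # n - 2 degrees of freedom (intercept and slope), as in R's summary.lm().
    rse = (ss_res / (n - 2)) ** 0.5
    return r_squared, rse


def min_max(values):
    """Rescale values to [0, 1]; all-equal input maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def score_countries(fits):
    """fits maps country -> (r_squared, rse) within ONE continent.
    Returns country -> score in [0, 1]; higher means a better linear fit."""
    countries = list(fits)
    r2_scaled = min_max([fits[c][0] for c in countries])
    rse_scaled = min_max([fits[c][1] for c in countries])
    # Assumed equal weighting: high R^2 is good, large residual error is bad.
    return {c: (r2 + (1 - rse)) / 2
            for c, r2, rse in zip(countries, r2_scaled, rse_scaled)}


def select(scores, threshold):
    """Countries whose score clears a fixed cut-off (the count can vary by continent)."""
    return sorted(c for c, s in scores.items() if s >= threshold)


# Toy numbers for three hypothetical countries in one continent (not gapminder data).
years = [1952, 1957, 1962, 1967]
toy = {
    "A": [40, 42, 44, 46],
    "B": [40, 43, 42, 46],
    "C": [40, 45, 41, 47],
}
fits = {country: fit_line(years, life_exp) for country, life_exp in toy.items()}
scores = score_countries(fits)
print(select(scores, 0.5))  # ['A', 'B']
```

With these toy numbers a cut-off of 0.5 keeps A and B, while 0.9 keeps only A. Because the count above a fixed cut-off depends on how each continent's fits are spread out, a threshold does not guarantee the same number of countries per continent, which is consistent with what happened for Africa and the Americas.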

Thank you once again for an elaborate review.

Cheers

xinmiaow commented 6 years ago

Hi @abishekarun ,

  1. Thanks for pointing out how you call the lifeExp plots in the .Rmd file. It is good to learn that.

  2. Fair enough; I understand why you only include one scatterplot in your report.

  3. When choosing the countries, I created a variable called order, assigned it the rank of, for example, R squared, and then chose countries based on that order. Your metric, however, seems to be more complex than mine.
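For comparison, a minimal sketch of the rank-based selection described in point 3, again in Python for illustration only. `order` mirrors the variable mentioned above; `top_by_r_squared`, `n`, and the toy R² values are hypothetical. Because the cut is by rank rather than by a fixed threshold, every continent with at least `n` countries yields exactly `n` of them.

```python
def top_by_r_squared(r_squared, n):
    """Rank countries by R^2 (rank 1 = best fit) and keep the n best.

    r_squared maps country -> R^2 within one continent. Ties keep input order.
    """
    order = sorted(r_squared, key=r_squared.get, reverse=True)
    ranks = {country: rank for rank, country in enumerate(order, start=1)}
    return ranks, order[:n]


# Toy R^2 values for hypothetical countries (not gapminder results).
by_continent = {
    "Continent1": {"A": 0.95, "B": 0.80, "C": 0.99},
    "Continent2": {"D": 0.40, "E": 0.70},
}
for continent, r2 in by_continent.items():
    ranks, chosen = top_by_r_squared(r2, n=2)
    print(continent, chosen)
```

Continent2, whose toy fits are all weak, still contributes two countries. The trade-off is that rank-based selection can keep poorly fitting countries that a threshold would have excluded.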

Thanks for your response.