hw07 ready for grading

abishekarun commented 6 years ago

xinmiaow commented 6 years ago

Hi @abishekarun,

It was nice to review your homework 7. Your repo is well organized, and collecting all the datasets and figures in a folder is a good idea. I could find your R scripts easily. In addition, you explain your workflow clearly in the summary section, which made your work easy to follow.

00_download_data.R

Downloads the dataset successfully and names it gapminder.tsv.

01_exploratory_analysis.R

Reads gapminder.tsv with read.table. I did not run into the data cleaning problem myself; I don’t think data cleaning is necessary if you use the function read.delim.
It’s still good practice to handle character data, and it let me review the functions grep and gsub.
You create five figures, including a boxplot (summary of lifeExp in each continent), a histogram (lifeExp), a density plot (lifeExp), and a frequency plot (lifeExp).
You reorder the levels of continent by mean lifeExp in decreasing order, and then save the new dataset with saveRDS so that the reordered levels are preserved.
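
For anyone reading along, here is a minimal base R sketch of why saveRDS matters for the reordered levels. The toy data frame and its values are made up for illustration and only stand in for gapminder; this is not the code from the repo.

```r
# Toy data standing in for gapminder (values are made up for illustration).
gap <- data.frame(
  continent = factor(c("Africa", "Africa", "Asia", "Asia", "Europe", "Europe")),
  lifeExp   = c(50, 54, 60, 66, 75, 77)
)

# Reorder continent levels by mean lifeExp, highest mean first.
gap$continent <- reorder(gap$continent, -gap$lifeExp, FUN = mean)

# saveRDS() stores the R object as-is, so the level order survives.
rds_path <- tempfile(fileext = ".rds")
saveRDS(gap, rds_path)
gap_rds <- readRDS(rds_path)

# A plain-text round trip rebuilds the factor with alphabetical levels.
tsv_path <- tempfile(fileext = ".tsv")
write.table(gap, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)
gap_tsv <- read.delim(tsv_path, stringsAsFactors = TRUE)

levels(gap_rds$continent)
levels(gap_tsv$continent)
```

Comparing the two levels() calls shows the difference: the .rds copy keeps the reordered levels, while the .tsv copy falls back to alphabetical order.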

02_statistical_analysis.R

Reads the .rds dataset you saved in the previous R script.
Creates a lin_fit function that fits the linear model (lifeExp ~ year) and saves the intercept, slope, residual SD, residual variance, and R squared.
Drops all observations in Oceania, and then chooses the five best and five worst countries in each continent by the normalized R squared and normalized residual SD. I am somewhat confused by your metric. In addition, according to the results displayed in your report, you did not choose five countries for each continent; sometimes only two countries are chosen.
Creates a function to save ggplots. It’s great and can be reused in the future.
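
As a reference point, a helper of this kind might look roughly like the sketch below. It reuses the name lin_fit from the script, but the body, the output column names, and the toy data are my own assumptions, not the code from the repo.

```r
# Hypothetical re-creation of a per-country fit helper (not the repo's code).
# Fits lifeExp ~ year and returns the five quantities as a one-row data frame.
lin_fit <- function(dat) {
  fit <- lm(lifeExp ~ year, data = dat)
  s <- summary(fit)
  data.frame(
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    resid_sd  = s$sigma,        # residual standard deviation
    resid_var = s$sigma^2,      # residual variance
    r_squared = s$r.squared
  )
}

# Made-up data for one country: a 0.3 years-per-year trend plus small noise.
toy <- data.frame(year = seq(1952, 2007, by = 5))
toy$lifeExp <- 40 + 0.3 * (toy$year - 1952) + rep(c(0.5, -0.5), 6)

fit_toy <- lin_fit(toy)
fit_toy
```

Returning a one-row data frame makes it easy to apply the helper per country and bind the rows into a single table of fit results.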

03_report.Rmd

Nice report. However, I think you also need to include the scatterplots of lifeExp over the years for each country, wrapped by continent.

Makefile

Clear and organized with a logical flow.
You could try adding some figures to your report by treating the figures as inputs, instead of sourcing your R script.

master.R

You also created an R script that pipelines everything together. It was good to learn how to render a .pdf file.

Overall, I think you did a great job. I really enjoy reviewing your homework.

wswade2 commented 6 years ago

Hi @abishekarun,

I found your process description to be meticulous. I could get a very good idea of how to create an automated pipeline just from the description that you provided. I also found using make to be somewhat problematic as a Windows user.

Your analysis of the gapminder data set in file 3 is likewise very thorough and furnished with lots of clean-looking plots and tables. I had to look back to a different file to view the code that you used to generate the plots. I suppose this fits with the spirit of the assignment, where the idea is to use code that relies on code from a separate document.

Overall this assignment was done very cleanly and thoroughly. Your homework repo was well-organized and it was easy to find the files I needed to view. Great job!

Wade

abishekarun commented 6 years ago

Hi @xinmiaow, Thank you so much for your review. Thank you for letting me know about the function read_delim(). I would like to explain a couple of points.

I tried a couple of ways to get plots into the Rmd file. I saved plots as variables and called them to get the histogram, frequency, and density plots in the Rmd. For the scatterplot of Oceania, I embedded the saved image file, as that is another way to include plots.
Although I saved the scatterplots of lifeExp vs. year faceted by country for each continent, I showed this plot only for Oceania because I felt the others were not visually pleasing or easy to interpret.
I tried to use a common metric for comparing the regression results, so I had to normalize the R squared values and residual standard error values for each continent. After that I found that Africa had the worst linear fit and the Americas had the best. Therefore, for the chosen threshold values, we don't get the desired number of countries for these two continents.
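
To illustrate the last point, here is a rough sketch of how a fixed threshold on a normalized score can select different numbers of countries per continent. The data are made up, and min-max scaling and the 0.9 cutoff are assumptions chosen for illustration; the scripts may normalize and threshold differently.

```r
# Made-up per-country fit results, for illustration only.
fits <- data.frame(
  continent = c("Africa", "Africa", "Africa", "Americas", "Americas", "Americas"),
  country   = c("A1", "A2", "A3", "B1", "B2", "B3"),
  r_squared = c(0.40, 0.65, 0.90, 0.93, 0.97, 0.99),
  stringsAsFactors = FALSE
)

# Min-max scaling to [0, 1] across all countries (an assumed normalization).
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
fits$r2_norm <- min_max(fits$r_squared)

# A fixed cutoff (0.9 here, chosen arbitrarily) keeps whoever clears it,
# so the number of "best" countries varies by continent.
best <- fits[fits$r2_norm >= 0.9, ]
best_counts <- table(factor(best$continent, levels = unique(fits$continent)))
best_counts
```

With these toy numbers one continent contributes no countries at all, which is the same kind of imbalance that shows up in the report.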

Thank you once again for an elaborate review.

Cheers

xinmiaow commented 6 years ago

Hi @abishekarun ,

Thanks for pointing out how you call the lifeExp plots in the .Rmd file. It is good to learn that.
Fair enough; I understand why you include only one scatterplot in your report.
What I did when choosing the countries was to create a variable called order, assign it the rank of, for example, R squared, and then choose countries based on that order. Your metric seems more complex than mine.
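
Roughly, the rank-based idea looks like the sketch below. The data are made up and the names fits, n_keep, and worst_rank are hypothetical; only the variable name order comes from what I described above.

```r
# Made-up per-country fit results, for illustration only.
fits <- data.frame(
  continent = rep(c("Africa", "Americas"), each = 4),
  country   = c("A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4"),
  r_squared = c(0.40, 0.65, 0.90, 0.55, 0.93, 0.97, 0.99, 0.80),
  stringsAsFactors = FALSE
)

# Rank within each continent: order 1 is the highest R squared.
fits$order <- ave(-fits$r_squared, fits$continent, FUN = rank)

# Keep the top n_keep countries per continent.
n_keep <- 2
best <- fits[fits$order <= n_keep, ]

# The same idea with ascending ranks gives the worst fits.
worst_rank <- ave(fits$r_squared, fits$continent, FUN = rank)
worst <- fits[worst_rank <= n_keep, ]

best
worst
```

Because the selection is by rank inside each continent, every continent yields exactly n_keep countries (as long as it has at least that many and there are no ties at the cutoff).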

Thanks for your response though.

abishekarun / STAT545-hw-rajendran-arun

hw07 ready for grading #7