Open abishekarun opened 6 years ago
Hi @abishekarun,
It’s nice to review your homework 7. Your repo is well organized. Collecting all the datasets and figures in a folder is a good idea, and I could find your R scripts easily. In addition, you explain your workflow clearly in the summary section, which makes your work easy to follow.
00_download_data.R
gapminder.tsv.
01_exploratory_analysis.R
Read gapminder.tsv by read.table. I did not run into the data cleaning problem. I don’t think data cleaning is necessary if you use the function read.delim.
It’s still good practice to handle character data. I reviewed the functions grep and gsub.
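To illustrate the read.delim point, here is a minimal sketch on a tiny made-up TSV (the rows and values are hypothetical, not the real gapminder data). By default read.table treats both " and ' as quote characters, which is one likely source of trouble with names such as Cote d'Ivoire, while read.delim only treats " as a quote.

```r
# Toy tab-separated data; the values are invented for illustration only.
tsv_lines <- c(
  "country\tcontinent\tyear\tlifeExp",
  "Cote d'Ivoire\tAfrica\t1952\t40.5",
  "Korea, Dem. Rep.\tAsia\t1952\t50.1"
)
tmp <- tempfile(fileext = ".tsv")
writeLines(tsv_lines, tmp)

# read.delim defaults to sep = "\t" and quote = "\"", so the apostrophe
# in the country name is read as an ordinary character.
gap_delim <- read.delim(tmp, stringsAsFactors = FALSE)

# grep finds the rows whose country name contains an apostrophe;
# gsub removes the apostrophe if a cleaned name is wanted.
has_apostrophe <- grep("'", gap_delim$country)
cleaned <- gsub("'", "", gap_delim$country)

print(gap_delim)
print(cleaned)
```

Passing quote = "\"" and comment.char = "" to read.table should have a similar effect, if you prefer to keep read.table.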
You create five figures, including a boxplot (summary of lifeExp in each continent), a histogram (lifeExp), a density plot (lifeExp), and a frequency plot (lifeExp).
You reorder the levels of continent by the mean of lifeExp in decreasing order, and then save the new dataset with saveRDS so that the reordered levels are preserved.
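A small sketch of why saveRDS matters here, using a made-up data frame (the values are hypothetical): the level order survives a round trip through .rds, but not through a plain text file.

```r
# Toy data; the lifeExp values are invented for illustration only.
gap <- data.frame(
  continent = factor(c("Africa", "Africa", "Asia", "Asia", "Europe", "Europe")),
  lifeExp   = c(40, 50, 55, 65, 70, 76)
)

# reorder() sorts levels by increasing FUN(X), so negating lifeExp
# gives levels in decreasing order of mean lifeExp.
gap$continent <- reorder(gap$continent, -gap$lifeExp, FUN = mean)

# Round trip through .rds: the factor level order is kept.
rds_path <- tempfile(fileext = ".rds")
saveRDS(gap, rds_path)
restored <- readRDS(rds_path)

# Round trip through .csv: levels fall back to alphabetical order.
csv_path <- tempfile(fileext = ".csv")
write.csv(gap, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path, stringsAsFactors = TRUE)

print(levels(restored$continent))
print(levels(from_csv$continent))
```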
02_statistical_analysis.R
Read the .rds dataset you saved in the previous R script.
Create the lin_fit function, which fits the linear model (lifeExp ~ year) and saves the intercept, slope, residual SD, residual variance and R squared.
Drop all observations in Oceania, and then choose the five best and five worst countries in each continent by the normalized R squared and normalized residual SD. I am a bit confused about your metric. In addition, according to the results displayed in your report, you did not choose five countries for each continent. Sometimes only two countries are chosen.
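For reference, a function of this kind might look roughly like the sketch below. This is not the author's actual lin_fit; the column names in the result and the use of summary()$sigma as the residual SD are assumptions, and the toy data are made up.

```r
# Hypothetical version of lin_fit: fit lifeExp ~ year on one data frame
# and return the quantities mentioned above as a one-row data frame.
lin_fit <- function(df) {
  fit <- lm(lifeExp ~ year, data = df)
  s <- summary(fit)
  data.frame(
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    resid_sd  = s$sigma,     # residual standard error (assumed meaning of "residual SD")
    resid_var = s$sigma^2,
    r_squared = s$r.squared
  )
}

# Toy data for a single country; values are invented for illustration.
toy <- data.frame(
  year    = c(1952, 1957, 1962, 1967),
  lifeExp = c(40, 43, 44, 47)
)
result <- lin_fit(toy)
print(result)
```

Applied per country (for example with split() and lapply()), this gives one row of fit statistics per country.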
Create a function to save ggplots. It’s great and can be used in the future as well.
03_report.Rmd
Makefile
Clear and organized with a logical flow.
You could try adding some figures to your report by treating the figures as inputs, instead of sourcing your R script.
master.R
Overall, I think you did a great job. I really enjoyed reviewing your homework.
Hi @abishekarun,
I found your process description to be meticulous. I could get a very good idea of how to create an automated pipeline just from the description that you provided. I also found using make to be somewhat problematic as a Windows user.
Your analysis of the gapminder data set in file 3 is likewise very thorough and furnished with lots of clean-looking plots and tables. I had to look back to a different file to view the code that you used to generate the plot. I suppose this fits with the spirit of the assignment, where the idea is to use code that relies on code from a separate document.
Overall this assignment was done very cleanly and thoroughly. Your homework repo was well-organized and it was easy to find the files I needed to view. Great job!
Wade
Hi @xinmiaow, Thank you so much for your review. Thank you for letting me know about the function read_delim(). I would like to explain a couple of points.
Thank you once again for the detailed review.
Cheers
Hi @abishekarun ,
Thanks for pointing out how you call the lifeExp plots in the .Rmd file. It is good to learn that.
Fair enough, I understand why you only include one scatterplot in your report.
What I did when choosing the countries was to create a variable called order, assign it the rank of, for example, R squared, and then choose countries based on that order. Your metric seems to be more complex than mine.
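Roughly, the order-based selection described above could be sketched as follows. The data frame, country names, R squared values and the n_pick parameter are all made up for illustration.

```r
# Toy per-country fit results; names and values are hypothetical.
fits <- data.frame(
  continent = rep(c("Africa", "Asia"), each = 4),
  country   = c("A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4"),
  r_squared = c(0.90, 0.20, 0.75, 0.50, 0.99, 0.60, 0.85, 0.30),
  stringsAsFactors = FALSE
)
n_pick <- 1  # how many best and worst countries to keep per continent

# Rank within each continent: order 1 is the highest R squared.
fits$order <- ave(-fits$r_squared, fits$continent, FUN = rank)
group_size <- ave(fits$r_squared, fits$continent, FUN = length)

best  <- fits[fits$order <= n_pick, ]
worst <- fits[fits$order > group_size - n_pick, ]

print(best)
print(worst)
```

Setting n_pick to 5 would give the five best and five worst countries per continent, provided each continent has at least ten countries.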
Thanks for your response though.
Readme file
hw07 folder
Rendered Html Report