STAT547-UBC-2019-20 / group_07


Assignment 3 Peer Review #7

Closed jacobgerlofs closed 4 years ago

jacobgerlofs commented 4 years ago

Assignment 3 Peer Review.

To organize and guide this review, I've taken each review question from the assignment guideline, turned it into a section header, and responded to the question below it. Enjoy!

Code Quality, Structure, and Reproducibility

Is the repository organized in a useful and sensible way?

Yes! The repo is well organized and matches the requirements for the Milestone projects, which makes it intuitive. You have data, docs, scripts, and images folders plus a README, and all files are located in the right place.

Is there a Usage section in the README?

Yes! There is a USAGE section that allows me to follow the necessary steps to run all your scripts. I think this section could be improved by using the code formatting that .md files allow. This would make it easier to identify, copy, and paste code from the README into RStudio. The syntax is three backticks to open the code section, and then three more backticks to close it. The result would look like this:

print("Hello world")

This is a small detail, but I think it makes navigation and usage a lot easier!

Do the scripts run to completion as described in the README?

Yes! I followed all the instructions, and all the scripts ran smoothly for me without any errors.

Do the scripts produce the expected output?

Yes! The data was successfully imported and cleaned, and I got all the intended exploratory analysis outputs (plots, etc.).

Are the scripts and functions appropriately documented?

Yes! Every section of your code has concise comments that provide context for what each function is doing. Great work, lots of people neglect this. Also, I noticed your group does a great job of implementing 'defensive programming': you include functions/statements that check whether previous steps have executed correctly before proceeding, paired with appropriate and useful error messages if something doesn't work. This is great! I should adopt this as a good practice in my own milestone project.
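For anyone reading along, here is a minimal sketch of the kind of defensive check I mean. It is not taken from your scripts: the function name `load_clean_data`, the column names, and the temporary file are all made up for illustration.

```r
# Hypothetical helper: read a CSV only after checking what it depends on.
load_clean_data <- function(path, required_cols) {
  # Fail early with a clear message if an earlier step did not produce the file.
  if (!file.exists(path)) {
    stop("Input file not found: ", path, ". Run the data-loading script first.")
  }
  dat <- read.csv(path, stringsAsFactors = FALSE)

  # Check that the expected columns exist before any analysis relies on them.
  missing_cols <- setdiff(required_cols, names(dat))
  if (length(missing_cols) > 0) {
    stop("Missing expected column(s): ", paste(missing_cols, collapse = ", "))
  }
  dat
}

# Usage with a temporary file standing in for real project data.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(year = c(2001, 2005), score = c(0.2, 0.7)),
          tmp, row.names = FALSE)
dat <- load_clean_data(tmp, required_cols = c("year", "score"))
```

The point is that each failure stops the script at the step that caused it, with a message that says what to fix.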

Does it take a long time for the analysis to run? Are there tasks that could be vectorized to make things faster?

No! The analysis ran quickly without problems. I didn't find any obvious places where tasks could be vectorized. Vectorization can be tough, and I'm not good at it!
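In case it is useful as a reference, this is roughly what I would look for when checking for vectorization opportunities. It is a sketch on simulated numbers, not code from your repo: an element-by-element loop next to the equivalent vectorized expression.

```r
# Simulated numeric vector; not taken from the project's data.
set.seed(1)
x <- runif(2000)

# Loop version: standardizes one element at a time and
# recomputes the mean and sd on every iteration.
scaled_loop <- numeric(length(x))
for (i in seq_along(x)) {
  scaled_loop[i] <- (x[i] - mean(x)) / sd(x)
}

# Vectorized version: one expression applied to the whole vector.
scaled_vec <- (x - mean(x)) / sd(x)

# Both approaches give the same values.
same_result <- isTRUE(all.equal(scaled_loop, scaled_vec))
```

A `for` loop that fills a vector one index at a time is usually the first thing worth trying to rewrite this way.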

Data Visualization and Research Questions

Are the research questions appropriate and well-chosen?

I think a little more context about the data file you are using would be helpful. I was a little confused after reading your description and felt the need to check out the link you provided to get some more context. As a result, I was a little unsure of what your research goals were with this data file. These could all be outlined in your README. I didn't write this section for my group's Milestone project (my more diligent partner did), and I think they did a really good job introducing our data and our goals. If you'd like, you can check out our repo here.

Are the visualizations effectively displayed? How can they be improved?

The plots are good, but I also think they could be improved. For instance, in your 'readability' and 'sentiment' plots, there is an extremely high density of data points in one area. To mitigate this, you could specify an alpha value when you call geom_point, which changes the transparency of the data points and lets you see where they overlap. You can also set the position argument to "jitter" to stagger your data points a little, which helps with data crowding. I would suggest trying to add these arguments, which would look something like this...

geom_point(position = "jitter", alpha = .12)
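To show that call in context, here is a self-contained sketch. It assumes ggplot2 is installed and uses simulated data; the data frame `sim` and its columns `year` and `readability` are placeholders, not the names in your scripts.

```r
library(ggplot2)

# Simulated, heavily overlapping points; column names are placeholders.
set.seed(42)
sim <- data.frame(
  year = sample(2000:2019, 5000, replace = TRUE),
  readability = rnorm(5000, mean = 50, sd = 5)
)

# Low alpha reveals density through overlap; jitter spreads points
# that share the same year so they no longer stack in one column.
p <- ggplot(sim, aes(x = year, y = readability)) +
  geom_point(position = "jitter", alpha = 0.12)

# Variant with explicit control: jitter horizontally only, so the
# y values are plotted exactly as recorded.
p_controlled <- ggplot(sim, aes(x = year, y = readability)) +
  geom_point(position = position_jitter(width = 0.3, height = 0), alpha = 0.12)
```

Printing `p` or `p_controlled` draws the plot. The second version may be preferable if the vertical values need to stay exact.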

Are there other, more effective visualizations that could or should be used? Does the

You have covered some interesting things with your analysis. Most of it is focused on examining change across time, but there don't seem to be meaningful trends in this case, which could be largely driven by the massive shift in available data after the year 2000. I would be interested in seeing changes over perhaps just the past 20 years. Also, the correlogram you produce is awesome, and it could also guide research questions. There are some strong correlations between variables that you don't explore, and I would be interested in seeing these explored.
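Both suggestions could be sketched along these lines. This uses simulated data, and the column names and the 0.6 cutoff are my own assumptions for illustration, not values from your analysis.

```r
# Simulated stand-in for the project's data; all column names are placeholders.
set.seed(7)
n <- 300
sim <- data.frame(year = sample(1950:2019, n, replace = TRUE))
sim$readability <- rnorm(n)
sim$sentiment <- 0.8 * sim$readability + rnorm(n, sd = 0.4)
sim$length <- rnorm(n)

# Keep only the most recent 20 years present in the data.
recent <- sim[sim$year >= max(sim$year) - 19, ]

# Correlation matrix of the numeric variables, excluding year.
cors <- cor(recent[, c("readability", "sentiment", "length")])

# List each variable pair once (upper triangle) where |r| exceeds the cutoff.
idx <- which(upper.tri(cors) & abs(cors) > 0.6, arr.ind = TRUE)
strong_pairs <- data.frame(
  var1 = rownames(cors)[idx[, "row"]],
  var2 = colnames(cors)[idx[, "col"]],
  r = cors[idx]
)
```

Each row of `strong_pairs` is a candidate for its own scatterplot or follow-up research question.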

Summary

Great work! As far as code quality goes (which for this course is really the more important bit), your milestone project is great. I didn't run into any issues; it is fully functional and reproducible.

There are some small details that could be improved, though they aren't crucial in the context of the course objectives. These include research objectives, data visualization, and README readability.

lucymosquera commented 4 years ago

Thanks @jacobgerlofs for the detailed feedback, we really appreciate it!