ds4owd-001 / project-adnanwijayarso

This report attempts to analyze if there's any correlation between drinking water access level and GDP per capita in the ASEAN region compared to other SDG regions. This project report was prepared for the data science for openwashdata course.
https://ds4owd-001.github.io/project-adnanwijayarso/

Project Feedback #5

Open mianzg opened 6 months ago

mianzg commented 6 months ago

@adnanwijayarso Congrats again on completing your capstone project! We are happy that you went through the course with us and applied the skills in the project.

I am reviewing your project on "Drinking Water Service Level and Gross Domestic Product (GDP) per capita in ASEAN Countries: A Comparative Study". Note that my review does not assess the written content of the report, but I did enjoy reading it and learned new things.

Please find the detailed feedback in the next section.

mianzg commented 6 months ago

This is a very nice report that goes beyond everything listed in the Required Items. Awesome!

The figures and tables are clear and informative, and the report is easy to follow. I would like to give some more detailed feedback, since your skills look intermediate.

Technical

  1. Put "Loading the packages" and "Importing datasets" into separate code chunks.
  2. You brought up a really interesting case of renaming a lot of columns that follow a pattern. In this case I think it would make more sense to use colnames() with a vector of names to set them all at once. The remaining task is then to create the new column names. I researched a bit, and this is my approach:
    prefix <- c("rural_", "urban_", "national_")
    types <- c("basic", "limited", "unimproved", "surfacewater")
    combinations <- dplyr::arrange(expand.grid(prefix = prefix, types = types), prefix) # this will create all possible combinations of the two vectors
    paste0(combinations$prefix, combinations$types)
  3. Consider using "urban_percent" or "urban_perc". The name "% urban" is not machine-friendly.
  4. You applied many relocate functions. I understand that you would like to have the columns positioned in a certain way, but is it really needed? I have rarely seen so many of these operations when they don't affect the analysis. Otherwise, consider doing it at the very end.
  5. If you like, you may start reading about how to write R functions to reduce repeated code: https://style.tidyverse.org/functions.html
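To make point 2 concrete, here is a minimal base R sketch of setting all the column names in one step with colnames(). It builds the same names as the approach above with rep() instead of expand.grid(). The data frame `jmp_toy` and its 12 placeholder columns are assumptions for illustration, not the actual project data.

```r
# Hypothetical toy data frame standing in for the imported dataset;
# its 12 placeholder columns are assumptions for illustration only.
jmp_toy <- as.data.frame(matrix(0, nrow = 2, ncol = 12))

prefix <- c("rural_", "urban_", "national_")
types <- c("basic", "limited", "unimproved", "surfacewater")

# rep(each = ...) repeats every prefix once per type, while `types` is
# recycled, giving rural_basic, rural_limited, ..., national_surfacewater.
new_names <- paste0(rep(prefix, each = length(types)), types)

# Assign all names in one step; the order of `new_names` must match
# the order of the columns in the data frame.
colnames(jmp_toy) <- new_names
```

The approach only works if the generated order matches the column order in the data, so it is worth comparing the old and new names side by side before assigning.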

mianzg commented 6 months ago

@adnanwijayarso We would like to extend this project into either an openwashdata blog post or a data package if you are interested! The latter usually requires an original dataset, but your selection and combination of existing data really showcases an interesting and unique perspective. That means we would develop an R data package of which you would be an author. You and others could then use a tidy version of the data directly in R. Please let us know if you are interested in a discussion! @margauxgo @larnsce

adnanwijayarso commented 4 months ago

Hello @mianzg, thank you for the detailed feedback! Sorry, it's been a while since you posted it.

  1. I will revise the code based on points 1 and 3 in your feedback.
  2. On point 2, I didn't know I could rename several columns at once using a vector, that sounds neat! I'll try to work your approach into my code.
  3. On point 4, I guess it's just an old habit of mine since I worked a lot in Excel tables to display data in reports - positioning data columns a certain way helps my colleagues interpret it.
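
One way to keep the preferred column layout while following points 3 and 4 is to rename the awkward column and then reorder once, at the end. The following is only a sketch in base R: the data frame `water_toy`, its column names other than "% urban", and all values are made up for illustration.

```r
# Hypothetical toy data; column names and values are illustrative assumptions.
water_toy <- data.frame(
  rural_basic = c(60, 75),
  `% urban` = c(55, 40),
  year = c(2020, 2020),
  country = c("A", "B"),
  check.names = FALSE  # keep "% urban" as-is so it can be renamed explicitly
)

# Point 3: replace the awkward name with a machine-friendly one.
names(water_toy)[names(water_toy) == "% urban"] <- "urban_percent"

# Point 4: decide the display order once, at the very end.
front <- c("country", "year")
water_toy <- water_toy[, c(front, setdiff(names(water_toy), front))]
```

With dplyr loaded, a single `relocate(water_toy, country, year)` call should give the same layout, since relocate() moves the selected columns to the front by default.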

Also, I'm excited for this project to be developed further, thank you for the offer! Let's set up a discussion. Let me know what I need to do on my end!