UBC-MDS / data-analysis-review-2021

Submission: Group 16: Beijing Air Quality Analysis #3

Open MacyChan opened 2 years ago

MacyChan commented 2 years ago

Submitting authors: Jacqueline Chong, Junrong Zhu, Macy Chan, Vadim Taskaev

Repository: https://github.com/UBC-MDS/DSCI_522_Beijing_Air_Quality

Report link: https://ubc-mds.github.io/DSCI_522_Beijing_Air_Quality/

Abstract/executive summary: This analysis project aims to answer whether the levels of PM2.5 air pollution in Beijing, China, have improved between 2013 and 2017. To do so, we performed a difference-in-medians hypothesis test between two intervals, time_A (March 2013 - February 2015) and time_B (March 2015 - February 2017), and concluded that no statistically significant decrease in PM2.5 particulate measurements can be detected.
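The thread does not show how the difference-in-medians test was computed, so the following is only a minimal sketch of one common approach, a one-sided permutation test. The function name `median_diff_permutation_test`, the number of permutations, the seed, and the synthetic log-normal data are all assumptions made for illustration; none of them come from the project repository.

```python
import random
import statistics


def median_diff_permutation_test(time_a, time_b, n_perm=2000, seed=522):
    """One-sided permutation test for H1: median(time_b) < median(time_a).

    Returns (observed_difference, p_value), where the difference is
    median(time_b) - median(time_a). Hypothetical helper, not project code.
    """
    rng = random.Random(seed)
    observed = statistics.median(time_b) - statistics.median(time_a)
    pooled = list(time_a) + list(time_b)
    n_a = len(time_a)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Under H0 the interval labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = statistics.median(pooled[n_a:]) - statistics.median(pooled[:n_a])
        if diff <= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value


# Synthetic, right-skewed stand-ins for PM2.5 readings (NOT the real data).
data_rng = random.Random(0)
time_a = [data_rng.lognormvariate(4.0, 0.8) for _ in range(300)]
time_b = [data_rng.lognormvariate(4.0, 0.8) for _ in range(300)]

observed, p_value = median_diff_permutation_test(time_a, time_b)
print(f"difference in medians: {observed:.2f}, one-sided p-value: {p_value:.4f}")
```

With this one-sided alternative, a large p-value means the shuffled labels produce a decrease at least as big as the observed one most of the time, which is the situation the abstract describes.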

Editor: @flor14

Reviewers: Michelle Wang, Siqi Tao, Hu Jiwei and Wang Shi Yan

michelle-wms commented 2 years ago

Reviewer: Michelle Wang @michelle-wms

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1 hour

Review Comments:

This is a really interesting topic, and the report was overall well written and analyzed, giving some great conclusions about Beijing's air pollution improvement over time (albeit slightly disappointing ones; one would have thought the air quality would be better by now). Some detailed comments can be found below.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

Jacq4nn commented 2 years ago

Hi @michelle-wms. Thank you for your comments.

Thanks again Michelle for your prompt response!

Cheers!

sy25wang commented 2 years ago

Data analysis review checklist

Reviewer: Shi Yan Wang @sy25wang

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: One and a half hours

Review Comments:

Overall, I think the project is well structured, with a clearly stated research question. The README.md provides good background and clear instructions on the project. All files are placed in reasonable locations and are easy to identify. I see that the team also numbers files sequentially in the doc folder, which is a great practice to follow. The final report is easy to follow and is supported with visualizations.

Some potential improvements may be as follows:

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

hjw0703 commented 2 years ago

Data analysis review checklist

Reviewer: hjw0703

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1 hour

Review Comments:

Overall, I think it is a nice report with a good structure and convincing content. The only things that I can think of that might improve the quality are the following:

  1. The question here is: "Does PM2.5 measurement in Beijing, China collected between 2013 and 2017 show any sign of improvement?" Why do you divide the period into two groups of two years each rather than four groups of one year each? The mean PM2.5 dropped in 2016 but increased in 2017, and on average it is the same as in 2013-2015, as shown in Table 4 in the EDA part. I think you could probably just compare 2016 and 2017.
  2. As the data is shown to be skewed in distribution, I suggest using a logarithmic scale for the hypothesis test, because the t-statistic might not work well with severely skewed data.
  3. Instead of mentioning the Environmental Kuznets Curve hypothesis economic model in the conclusion, I think you could say something about how you could improve the inference in the future, for example if you had more recent data on Beijing's air quality.
  4. PM2.5 is not the only thing that matters for air quality; if you have time, you could look into other features in the data set, like PM10.
  5. Maybe you could explain how you obtained the p-value in the results and discussion part: is it from a Student's t-statistic or from a bootstrap sampling distribution? You could also give the exact values of these statistics, for example whether the p-value equals 1 or 0.99.
  6. If you choose median as the test statistic, you could include median in the EDA part as opposed to mean.
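Comments 2 and 6 both turn on how skewness affects the mean and the median. The sketch below illustrates the point with synthetic log-normal values, which are an assumption standing in for PM2.5 readings and are not the project's data. It compares the two summaries on the raw and log scales.

```python
import math
import random
import statistics

rng = random.Random(16)
# Synthetic right-skewed values standing in for PM2.5 readings (NOT the real data).
# An odd sample size makes the median a single observed value.
pm25 = [rng.lognormvariate(4.0, 0.9) for _ in range(1001)]

mean_raw = statistics.mean(pm25)
median_raw = statistics.median(pm25)

log_pm25 = [math.log(x) for x in pm25]
mean_log = statistics.mean(log_pm25)
median_log = statistics.median(log_pm25)

print(f"raw scale: mean={mean_raw:.1f}, median={median_raw:.1f}")
print(f"log scale: mean={mean_log:.2f}, median={median_log:.2f}")

# The log is monotone, so the median maps across scales; the mean does not.
print("median preserved:", math.isclose(math.exp(median_log), median_raw))
print("mean preserved:  ", math.isclose(math.exp(mean_log), mean_raw))
```

On the raw scale the long right tail pulls the mean above the median, while on the log scale the two sit close together. This is one reason a median-based test and a t-test on log-transformed values are both reasonable responses to skewed data.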

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

SiqiTao commented 2 years ago

Data analysis review checklist

Reviewer: SiqiTao

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1 hour

Review Comments:

This is a very meaningful topic! The content is well structured and the code is organized. Some of my thoughts and suggestions on this project are listed below:

  1. In the methodology section of the final report, the x-label of the boxplot in Figure 2 could be better specified. Also, there is a grey block to the right of the figure that I think should be removed.
  2. When talking about air quality, it would be more convincing to also consider pollutants other than PM2.5, as PM2.5 is not the only factor that affects air quality.
  3. Since air quality fluctuates across the months of a year, and this can introduce hidden periodic variation in the data, I would suggest looking into a certain range of months in each year (for example, October to February) instead of all 12 months.
  4. The length of the gap between the two time frames may influence the result as well (the two frames used in the project are consecutive, which looks a bit too close to me). Maybe you can try time frames with a bigger gap (e.g., 2013-2014 and 2016-2017), if sample size permits, and see if it makes a difference.
  5. The reason why median is chosen could be better explained (to my understanding, median was chosen over mean because the data is heavily skewed), and it should also be included in the EDA.
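Suggestions 3 and 4 can be combined into one comparison: restrict each frame to the same seasonal window and leave a gap between the frames. The sketch below is a hedged illustration only. The helper `seasonal_median`, the `HEATING_MONTHS` set, and the synthetic daily readings are assumptions and do not reflect the project's code or data.

```python
import random
import statistics
from datetime import date, timedelta

# October to February, following the example given in comment 3.
HEATING_MONTHS = {10, 11, 12, 1, 2}


def seasonal_median(readings, start, end, months=HEATING_MONTHS):
    """Median of readings dated in [start, end) whose month is in `months`."""
    values = [v for d, v in readings if start <= d < end and d.month in months]
    return statistics.median(values)


# Synthetic daily readings (NOT the real data): higher in the colder months.
rng = random.Random(3)
readings = []
day = date(2013, 3, 1)
while day < date(2017, 3, 1):
    base = 90.0 if day.month in HEATING_MONTHS else 60.0
    readings.append((day, base * rng.lognormvariate(0.0, 0.3)))
    day += timedelta(days=1)

# Non-adjacent frames, as suggested in comment 4.
early = seasonal_median(readings, date(2013, 3, 1), date(2014, 3, 1))
late = seasonal_median(readings, date(2016, 3, 1), date(2017, 3, 1))
print(f"Oct-Feb median, Mar 2013 - Feb 2014: {early:.1f}")
print(f"Oct-Feb median, Mar 2016 - Feb 2017: {late:.1f}")
```

Comparing like-for-like months removes the seasonal cycle from the contrast, at the cost of a smaller sample in each frame.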

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

vtaskaev1 commented 2 years ago

Thank you michelle-wms, sy25wang, hjw0703, and SiqiTao for your peer review feedback.

We have made changes in relation to the following four comments:

  1. Based on michelle-wms's last comment, we made changes to the Results and Discussion section of the Final Report to ensure the scope of the implications of our findings remains in line with our project analysis objective. More specifically, we restated the implications of our analysis as relating China's economic development to its environmental policies, rather than directly relating it to the Kuznets Curve, which we mention briefly throughout our project. (commit: 35b4e95)
  2. Based on SiqiTao's first comment, we made changes to the formatting of Figure 2 in the Methodology section of the Final Report. (commit: 1628a64)
  3. Based on sy25wang's first comment, we adjusted the formatting of Figure 2 of the EDA document to replace the numeric x-axis labels with their month-name equivalents, resulting in improved readability. (commit: c5e3bfb)
  4. Finally, based on hjw0703's fifth comment, we modified the rounding of the p-value result as stated in the Results and Discussion section of the Final Report. This is a subtle change, where we now round the p-value to 7 decimal places, but a meaningful one for improved interpretability, as the previous rounding to fewer decimals (to a value of 1.0) raised unnecessary suspicion. (commit: 0bdf7a0)
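The two formatting changes in items 3 and 4 can be illustrated with a short sketch. The p-value below is hypothetical and was chosen only to show the rounding effect; the report's actual value is not given in this thread. The use of `calendar.month_abbr` is likewise an assumption about how numeric months might be relabelled, not a description of the commit.

```python
import calendar

# Hypothetical p-value, chosen only to illustrate the rounding issue.
p_value = 0.9999412

for digits in (1, 3, 7):
    print(f"{digits} decimal places: {p_value:.{digits}f}")

# Numeric month indices 1..12 mapped to abbreviated names for axis labels.
month_labels = [calendar.month_abbr[m] for m in range(1, 13)]
print(month_labels)
```

At one or three decimal places this value prints as 1.0 or 1.000, which reads as an exact 1. Seven decimal places shows that it is merely very close to 1.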