UBC-MDS / data-analysis-review-2023


Submission: GROUP_11: SAVE THE EARTH #10

Open Aishwarya120111 opened 7 months ago

Aishwarya120111 commented 7 months ago

Submitting authors: @tonyshumlh, @Jing-19, @Aishwarya120111, @WeilinHan8

Repository: https://github.com/UBC-MDS/DSCI_522_Group-11_Save-The-Earth

Report link: https://ubc-mds.github.io/DSCI_522_Group-11_Save-The-Earth/save_the_earth_model.html

Abstract/executive summary:

Here we attempt to build a prediction model employing the k-nearest neighbors algorithm, designed to leverage energy consumption and energy generation measurements to predict CO2 emissions of a country. Understanding the correlation between consumption of various energy types and CO2 emission is critical for formulating policies aimed at reducing emissions and mitigating climate change impacts [Allen et al., 2018]. Our model’s performance on the unseen test dataset is quite commendable, as reflected by an $R^2$ of 0.97.

However, the model’s effectiveness lies in its ability to identify instances in the training dataset that closely resemble the data it is trying to predict. This means that when it encounters scenarios not represented in its training data, such as substantial shifts in energy usage or the introduction of new types of clean energy, its predictions may not be as accurate. Consequently, to tackle these potential limitations, it is advisable to continue research efforts to further enhance the model’s predictive capabilities.
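The limitation described above can be illustrated with a minimal sketch. It uses made-up one-dimensional toy data, not the project's data or code, and `knn_predict` and `r_squared` are hypothetical helpers written only for this illustration. Because a k-nearest neighbors regressor averages the targets of nearby training points, its predictions flatten out once a query falls outside the range seen during training.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to `query`."""
    # Rank training indices by absolute distance to the query (1-D feature).
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy data (an assumption for illustration): the target is exactly 2 * feature.
train_x = [float(x) for x in range(1, 11)]
train_y = [2.0 * x for x in train_x]

# Queries inside the training range versus far outside it.
inside_x = [2.0, 5.0, 8.0]
outside_x = [12.0, 16.0, 20.0]

inside_pred = [knn_predict(train_x, train_y, x) for x in inside_x]
outside_pred = [knn_predict(train_x, train_y, x) for x in outside_x]

r2_inside = r_squared([2.0 * x for x in inside_x], inside_pred)
r2_outside = r_squared([2.0 * x for x in outside_x], outside_pred)

print("inside range :", inside_pred, "R^2 =", r2_inside)
print("outside range:", outside_pred, "R^2 =", r2_outside)
```

In this toy setting every out-of-range query receives the same prediction, because the same three training points are always the nearest ones. A high test $R^2$ therefore only describes performance on data that resembles the training set.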

The data set used in this project is from the World Bank via GAPMINDER.ORG, an independent Swedish foundation with no political, religious or economic affiliations.

Editor: @ttimbers

Reviewers: Sophia Zhao, Allan Lee, Dan Zhang, Bill Wan

wqxxzd commented 7 months ago

Data analysis review checklist

Reviewer: wqxxzd (Dan Zhang)

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1 Hour

Review Comments:


I found the report to be impressively well-written and organized, which made it quite easy for me to understand and follow. There isn't much to improve, but a few minor adjustments could be made.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

zth96 commented 7 months ago

Data analysis review checklist

Reviewer: zth96 (Sophia Zhao)

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 Hours

Review Comments:

This is easy for me to follow, understand, and reproduce with your instructions. I find your report to be well-constructed, allowing someone like me, who has never learned about this topic, to understand what you are trying to achieve. Also, I really appreciate that you have included simple instructions for installing your environment in the environment.yaml file. I think this is considerate for those who are new to this.

Overall, you all did a fabulous job! I don't think there is anything major to improve, other than a few minor possible adjustments in the contributing.md that might make your lives easier in the future:

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

billwan96 commented 7 months ago

Data analysis review checklist

Reviewer: billwan96

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 hours

Review Comments:

In general, the instructions for running the analysis and the flow of the analysis were very easy to follow. The background of the project is also well explained and meaningful. There are also detailed explanations of each step of the pipeline. Great job!

Some minor suggestions for improvement:

  1. The docstrings of the functions in the scripts could be more detailed; in particular, they could state the expected data type of each parameter.
  2. The message of the EDA plots is not easy to follow. For example, the bar plot for each country is difficult to understand without reading the accompanying description.
  3. The title of the first plot is not easy to understand; consider a descriptive title rather than the raw column name.
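As a rough sketch of suggestions 1 and 3 above: the function below is hypothetical (it is not taken from the project's scripts), and the column name and unit are invented for illustration. It shows a docstring that states each parameter's data type, and one simple way to derive a readable plot title from a raw column name.

```python
def readable_title(column_name: str, unit: str = "") -> str:
    """Turn a raw column name into a human-readable plot title.

    Parameters
    ----------
    column_name : str
        Raw column name with underscores, e.g. "energy_use_per_person"
        (a hypothetical name used only for this example).
    unit : str, optional
        Unit of measurement appended in parentheses. Defaults to an
        empty string, in which case no unit is shown.

    Returns
    -------
    str
        The title with underscores replaced by spaces and the first
        letter capitalized.
    """
    title = column_name.replace("_", " ").strip().capitalize()
    return f"{title} ({unit})" if unit else title


print(readable_title("energy_use_per_person", unit="kg oil equivalent"))
```

The returned string can then be passed to whatever plotting library the scripts use as the chart title.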

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

allan8392 commented 7 months ago

Data analysis review checklist

Reviewer: allan8392 (Allan Lee)

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 hours

Review Comments:

Overall everything looks good and I do not see any major problems.

  1. It took me a long time to understand your data. It would help to provide a table explaining your features and to print the head of the dataframe you feed into your ML pipeline.

  2. The instruction to launch Docker is incorrect. You wrote 'docker compose up jupyter-lab'; it should be 'docker compose up'.

  3. Figure 1 does not have a title.
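The first suggestion above could look something like the sketch below. It assumes pandas is available; the column names, descriptions, and values are made up for illustration and are not the project's actual features.

```python
import pandas as pd

# Made-up values and hypothetical column names, for illustration only.
features = pd.DataFrame(
    {
        "energy_use_per_person": [1.2, 3.4, 0.8, 5.1, 2.2, 4.0],
        "electricity_generation_per_person": [0.9, 2.8, 0.5, 4.7, 1.9, 3.6],
    }
)

# A small table explaining each feature, kept next to the data it describes.
feature_descriptions = pd.DataFrame(
    {
        "feature": list(features.columns),
        "description": [
            "Energy consumed per person (hypothetical description)",
            "Electricity generated per person (hypothetical description)",
        ],
    }
)

# Show the explanation table, then the first rows that enter the pipeline.
print(feature_descriptions.to_string(index=False))
preview = features.head()  # first five rows by default
print(preview)
```

Placing both outputs just before the modelling step lets a reader check what each column means and what the model actually receives.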

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.