Submission: GROUP_11: SAVE THE EARTH

Submitting authors: @tonyshumlh, @Jing-19, @Aishwarya120111, @WeilinHan8

Repository: https://github.com/UBC-MDS/DSCI_522_Group-11_Save-The-Earth

Report link: https://ubc-mds.github.io/DSCI_522_Group-11_Save-The-Earth/save_the_earth_model.html

Abstract/executive summary: Here we attempt to build a prediction model employing the k-nearest neighbors algorithm that uses a country's energy consumption and energy generation measurements to predict its CO2 emissions. Understanding the correlation between consumption of various energy types and CO2 emissions is critical for formulating policies aimed at reducing emissions and mitigating climate change impacts [Allen et al., 2018]. Our model’s performance on the unseen test dataset is strong, as reflected by an $R^2$ of 0.97.

However, the model’s effectiveness lies in its ability to identify instances in the training dataset that closely resemble the data it is trying to predict. This means that when it encounters scenarios not represented in its training data, such as substantial shifts in energy usage or the introduction of new types of clean energy, its predictions may not be as accurate. Consequently, to tackle these potential limitations, it is advisable to continue research efforts to further enhance the model’s predictive capabilities.
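The limitation described above can be illustrated with a minimal sketch. It assumes hypothetical toy data with a single feature and uses a hand-rolled nearest-neighbour average rather than the project's actual pipeline; the names `knn_predict`, `train_x`, and `train_y` are illustrative only.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to `query`."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k


# Hypothetical toy data: the target grows linearly with the feature (y = 2x).
train_x = [float(x) for x in range(1, 11)]  # feature values 1..10
train_y = [2.0 * x for x in train_x]

# A query inside the training range is predicted well (true value is 10).
inside = knn_predict(train_x, train_y, 5.0)

# A query far outside the range is pinned to the nearest training targets,
# so the prediction stays near 18 even though the true value would be 200.
outside = knn_predict(train_x, train_y, 100.0)

print(inside, outside)
```

Because the prediction is always an average of training targets, it can never exceed the largest target seen during training, which is why large shifts in energy usage may be predicted poorly.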

The data set used in this project is from the World Bank via GAPMINDER.ORG, an independent Swedish foundation with no political, religious or economic affiliations (the link can be found here).

Editor: @ttimbers

Reviewers: Sophia Zhao, Allan Lee, Dan Zhang, Bill Wan

[x] I agree to abide by MDS's Code of Conduct during the review process and in maintaining my package should it be accepted.

Data analysis review checklist

Reviewer: wqxxzd (Dan Zhang)

Conflict of interest

[x] As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the MDS code of conduct.

General checks

[x] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

[x] Installation instructions: Is there a clearly stated list of dependencies?
[x] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[x] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

[x] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
[x] Style guidelines: Does the code adhere to well known language style guides?
[x] Modularity: Is the code suitably abstracted into scripts and functions?
[x] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

[x] Data: Is the raw data archived somewhere? Is it accessible?
[x] Computational methods: Is all the source code required for the data analysis available?
[x] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
[x] Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

[x] Authors: Does the report include a list of authors with their affiliations?
[x] What is the question: Do the authors clearly state the research question being asked?
[x] Importance: Do the authors clearly state the importance for this research question?
[x] Background: Do the authors provide sufficient background information so that readers can understand the report?
[x] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
[x] Results: Do the authors clearly communicate their findings through writing, tables and figures?
[x] Conclusions: Are the conclusions presented by the authors correct?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
[x] Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 1 Hour

Review Comments:

Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.

I found the report to be impressively well-written and organized, which made it quite easy for me to understand and follow. There isn't much to improve, but a few minor adjustments could be made.

GitHub Repo:
- The naming of the data folder is a bit different from usual. It might be good to make this consistent with other naming conventions.
- The hyperlink labeled 'Install' redirects to docker.com, which could potentially cause a bit of confusion.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

Data analysis review checklist

Reviewer: zth96 (Sophia Zhao)

Conflict of interest

[x] As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the MDS code of conduct.

General checks

[x] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

[x] Installation instructions: Is there a clearly stated list of dependencies?
[x] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[x] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

[x] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
[x] Style guidelines: Does the code adhere to well known language style guides?
[x] Modularity: Is the code suitably abstracted into scripts and functions?
[x] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

[x] Data: Is the raw data archived somewhere? Is it accessible?
[x] Computational methods: Is all the source code required for the data analysis available?
[x] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
[x] Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

[x] Authors: Does the report include a list of authors with their affiliations?
[x] What is the question: Do the authors clearly state the research question being asked?
[x] Importance: Do the authors clearly state the importance for this research question?
[x] Background: Do the authors provide sufficient background information so that readers can understand the report?
[x] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
[x] Results: Do the authors clearly communicate their findings through writing, tables and figures?
[x] Conclusions: Are the conclusions presented by the authors correct?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
[x] Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 1.5 Hours

Review Comments:

This is easy for me to follow, understand, and reproduce with your instructions. I find your report to be well-constructed, allowing someone like me, who has never learned about this topic, to understand what you are trying to achieve. Also, I really appreciate that you have included simple instructions for installing your environment in the environment.yaml file. I think this is considerate for those who are new to this.

Overall, you all did a fabulous job! I don't think there is anything major to improve, other than a few minor possible adjustments in the contributing.md that might make your lives easier in the future:

- Perhaps ask contributors to follow your specific coding standards or style guide to maintain consistency with your work.
- Maybe encourage contributors to write tests for their new code or bug fixes so that you can identify issues right away.
- If possible, consider providing detailed steps on how to raise an issue with sufficient information and context, so that you don't waste time trying to understand short and poorly written issues.
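As one possible illustration of the testing suggestion, the contributing guide could point to a small example like the sketch below. The helper `co2_per_capita` is hypothetical and not taken from the repository; the `test_` prefix follows the naming convention that pytest uses for test discovery, and the tests are written with plain `assert` so they also run without pytest.

```python
def co2_per_capita(total_emissions, population):
    """Return emissions per person (hypothetical helper, for illustration only)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total_emissions / population


def test_co2_per_capita_returns_ratio():
    # Typical case: 100 units of emissions shared by 50 people.
    assert co2_per_capita(100.0, 50) == 2.0


def test_co2_per_capita_rejects_nonpositive_population():
    # Invalid input should fail loudly rather than return a misleading number.
    try:
        co2_per_capita(100.0, 0)
    except ValueError:
        return  # expected behaviour
    raise AssertionError("expected ValueError for a non-positive population")
```

Asking for one test of normal behaviour and one of invalid input per contribution keeps the bar low while still catching regressions early.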

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

Data analysis review checklist

Reviewer: billwan96

Conflict of interest

[x] As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the MDS code of conduct.

General checks

[x] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

[x] Installation instructions: Is there a clearly stated list of dependencies?
[x] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[x] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

[x] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
[x] Style guidelines: Does the code adhere to well known language style guides?
[x] Modularity: Is the code suitably abstracted into scripts and functions?
[x] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

[x] Data: Is the raw data archived somewhere? Is it accessible?
[x] Computational methods: Is all the source code required for the data analysis available?
[x] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
[x] Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

[x] Authors: Does the report include a list of authors with their affiliations?
[x] What is the question: Do the authors clearly state the research question being asked?
[x] Importance: Do the authors clearly state the importance for this research question?
[x] Background: Do the authors provide sufficient background information so that readers can understand the report?
[x] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
[x] Results: Do the authors clearly communicate their findings through writing, tables and figures?
[x] Conclusions: Are the conclusions presented by the authors correct?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
[x] Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 1.5 hours

Review Comments:

In general, the instructions for running the analysis and the flow of the analysis were very easy to follow. The background of the project is also well explained and meaningful. There are also detailed explanations of each step of the pipeline. Great job!

Some minor suggestions for improvement:

- The docstrings of the functions in the scripts could be more detailed; in particular, they could state the expected data type of each parameter.
- The message of the EDA plots is not easy to follow. For example, the bar plot for each country might be difficult to understand without reading the descriptions.
- The title of the first plot is not easily understandable; consider a descriptive title rather than just the column name.
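As a sketch of what a more detailed docstring could look like, the example below uses the NumPy docstring layout with an explicit type for every parameter and for the return value. The function `scale_values` is hypothetical and not taken from the repository's scripts.

```python
def scale_values(values, factor=1.0):
    """Multiply every value in a list by a constant factor.

    Parameters
    ----------
    values : list of float
        Measurements to rescale, for example energy use per country.
    factor : int or float, optional
        Multiplier applied to each element. Defaults to 1.0.

    Returns
    -------
    list of float
        A new list with each element multiplied by `factor`.

    Raises
    ------
    TypeError
        If `factor` is not a number.
    """
    if not isinstance(factor, (int, float)):
        raise TypeError("factor must be a number")
    return [v * factor for v in values]
```

Stating the types up front lets a reader call the function correctly without opening its body, and the `Raises` section documents how invalid input is handled.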

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

Data analysis review checklist

Reviewer: allan8392 (Allan Lee)

Conflict of interest

[x] As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the MDS code of conduct.

General checks

[x] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

[x] Installation instructions: Is there a clearly stated list of dependencies?
[x] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[x] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

[x] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
[x] Style guidelines: Does the code adhere to well known language style guides?
[x] Modularity: Is the code suitably abstracted into scripts and functions?
[x] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

[x] Data: Is the raw data archived somewhere? Is it accessible?
[x] Computational methods: Is all the source code required for the data analysis available?
[x] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
[x] Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

[x] Authors: Does the report include a list of authors with their affiliations?
[x] What is the question: Do the authors clearly state the research question being asked?
[x] Importance: Do the authors clearly state the importance for this research question?
[x] Background: Do the authors provide sufficient background information so that readers can understand the report?
[x] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
[x] Results: Do the authors clearly communicate their findings through writing, tables and figures?
[x] Conclusions: Are the conclusions presented by the authors correct?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
[x] Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 1.5 hours

Review Comments:

Overall everything looks good and I do not see any major problems.

- It took me a long time to understand your data. It would be helpful to provide a table explaining your features and to print the head of the dataframe you feed into your ML pipeline.
- The instruction to launch Docker is incorrect. You wrote 'docker compose up jupyter-lab'; it should be 'docker compose up'.
- Figure 1 does not have a title.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.
