Open ttimbers opened 7 months ago
Overall, I believe your project is well done! I'm just nit-picking with my criticisms since there aren't any apparent major problems!
- `src` folder in the root with an `analysis-scripts` and a `functions` folder to add clarity to project organization.
- `README.md`, including Docker Desktop!
- `$ docker-compose run --rm final-analysis-env make clean`. It turns out that I forgot to start up Docker Desktop before running the command, which is why I ran into the issue! You may want to include that as a step in the Usage section of your `README.md`.
- Figure 1. It would be nice if the image was bigger or if the graphs used a bigger font!
- `test_cleaning_data.py`, `test_eda.py`, `test_impute_split.py`, and `test_linear_regression.py`. It would be really nice to do the same for each test in `test_datareading.py`!
- `create_scatter_plots()` and `impute_split()`. They provide valuable information about those functions. It may be extra helpful if some examples were added in the docstrings!
- `clean_data()`, `reading_data()`, `split_xy_columns()`, and `plot_rmse()` would benefit from NumPy style docstrings too!

This was derived from the JOSE review checklist and the ROpenSci review checklist.
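To illustrate the docstring suggestions above, here is a minimal sketch of a NumPy style docstring with an Examples section, plus a one-line docstring on a test. The function `split_xy`, its signature, and the column names are invented for illustration; they are not the repository's actual `split_xy_columns()` and would need adapting to the real code.

```python
def split_xy(rows, target):
    """Split records into explanatory values and target values.

    Hypothetical helper written only to show the docstring layout.

    Parameters
    ----------
    rows : list of dict
        One dictionary per observation, keyed by column name.
    target : str
        Name of the column to predict.

    Returns
    -------
    tuple of (list of dict, list)
        The records without the target column, and the target values.

    Examples
    --------
    >>> split_xy([{"gdp": 1.5, "output": 20.0}], "output")
    ([{'gdp': 1.5}], [20.0])
    """
    # Drop the target key from every record to build X.
    x = [{k: v for k, v in row.items() if k != target} for row in rows]
    # Collect the target value from every record to build y.
    y = [row[target] for row in rows]
    return x, y


def test_split_xy_separates_target():
    """Check that the target column is removed from X and returned as y."""
    x, y = split_xy([{"gdp": 1.5, "output": 20.0}], "output")
    assert x == [{"gdp": 1.5}]
    assert y == [20.0]
```

An Examples section written this way can also be checked with Python's built-in `doctest` module, which keeps the example from drifting out of date.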
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
First of all, good job on your analysis!
Some things I would pay attention to.
Great work on your analysis, especially the clarity of explaining all components + motivation of your analysis in the final report! Here are some issues/areas of improvement I was able to identify, and my suggestions for each:
In the PDF rendering of your report, Figure 1 is cut off horizontally from the fourth column. When generating the visualization, you could manually set the number of columns to 2 or 3 so that it fits the standard 8.5 x 11 inch page size in the PDF rendering. Figure 1 is also missing a title. The axis labels are descriptive, but a title such as “Renewable Electricity Output vs Explanatory Variables of Interest” would be a good idea.
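One possible way to cap the number of columns is sketched below. It assumes the figure is a grid of scatter plots drawn with matplotlib; the function names, the `data` structure, and the figure width are my own assumptions, not taken from the repository.

```python
import math


def grid_shape(n_plots, max_cols=3):
    """Return (n_rows, n_cols) for n_plots panels using at most max_cols columns."""
    if n_plots < 1 or max_cols < 1:
        raise ValueError("n_plots and max_cols must be positive")
    n_cols = min(n_plots, max_cols)
    n_rows = math.ceil(n_plots / n_cols)
    return n_rows, n_cols


def plot_scatter_grid(data, target, features, max_cols=3, title=None):
    """Draw one scatter plot per feature against target on a capped-width grid.

    `data` is assumed to map column names to equal-length sequences
    (a pandas DataFrame also works). All names here are hypothetical.
    """
    # Imported here so grid_shape() can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    n_rows, n_cols = grid_shape(len(features), max_cols)
    # 7.5 inches wide leaves room for margins on an 8.5 x 11 inch page.
    fig, axes = plt.subplots(
        n_rows, n_cols, figsize=(7.5, 3 * n_rows), squeeze=False
    )
    flat_axes = axes.ravel()
    for ax, feature in zip(flat_axes, features):
        ax.scatter(data[feature], data[target], s=8)
        ax.set_xlabel(feature)
        ax.set_ylabel(target)
    # Hide any unused panels in the last row.
    for ax in flat_axes[len(features):]:
        ax.set_visible(False)
    if title:
        fig.suptitle(title)
    fig.tight_layout()
    return fig
```

Passing the suggested title through `title=` would address the missing figure title in the same call.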
The text in Table 1 does not render correctly in the PDF rendering and, like Figure 1, is cut off horizontally. Numbers and letters overlap, making the table difficult to read. Also, Table 1 takes up 2.5 pages, which I believe is not intended, given that the HTML rendering has a horizontal scroll feature that the PDF lacks. Since the analysis chooses 2015 as the most recent year (mentioned in Methods and Results, Step 3), it might be a good idea to remove the years after 2015 from the table to help it fit in the PDF rendering.
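As a rough sketch of the trimming idea, assuming Table 1 stores each year as its own column (I have not confirmed the table's layout, and the helper name is hypothetical), the columns to keep could be selected like this:

```python
def columns_through_year(columns, last_year=2015):
    """Keep non-year columns and year columns up to last_year, in order."""
    kept = []
    for name in columns:
        label = str(name).strip()
        # A column counts as a year column only if its label is a 4-digit number.
        is_year = len(label) == 4 and label.isdigit()
        if not is_year or int(label) <= last_year:
            kept.append(name)
    return kept
```

With a pandas DataFrame named `table`, this could be applied as `table[columns_through_year(table.columns)]` before rendering. If the years are stored as rows instead, a row filter on the year column would be the equivalent step.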
You can cut down on the repetition in your Dockerfile. What you have right now works perfectly fine, but doing the following might help with redundancy and neatness! I've included a side-by-side comparison so you can see the difference.

Instead of writing `RUN conda install [insert package]` for each package, you can write:

    RUN conda install --yes \
        [insert package] \
        [insert package] \
        [insert package]

Instead of writing `RUN apt-get update [insert package]` several times, you can write:

    RUN apt-get update && apt-get install -y [insert package] \
        [insert package] \
        [insert package]
Submitting authors: Caden Chan, Neha Menon, Peter Chen & Tak Sripratak
Repository: https://github.com/DSCI-310-2024/DSCI310-Group14/tree/v3.0.0
Abstract/executive summary:
As a complex issue, climate change doesn't have a singular cause, though the burning of fossil fuels is a large source of greenhouse gases and has caused detrimental effects. Our analysis here attempts to explore whether a subset of renewable-energy-related World Development Indicators, along with a simple linear regression model, can be used to predict the renewable electricity outputs of countries throughout the world. Our analysis created a model with a Root Mean Squared Error (RMSE) score of 23.74. Our model was able to predict most cases accurately, though some predictions have low accuracy and are not close to the actual values. Our model also predicted a negative renewable electricity output for some countries, which demonstrates the need for a more complex analysis using advanced machine learning methods. By creating an advanced machine learning model, the capabilities of countries to produce more renewable electricity based on their other World Development Indicators can be calculated and used to influence country-specific and global goals and targets.
Editor: @ttimbers
Reviewer: Hanyu Dai, Sana Shams, Daniel Lima, Stephanie Ta