UBC-MDS / data-analysis-review-2022


Submission: GROUP 4: energy_efficiency_analysis #19

Open Suraporn opened 1 year ago

Suraporn commented 1 year ago

Submitting authors: @Suraporn @YHuUBC @MNBhat

Repository: https://github.com/UBC-MDS/energy_efficiency_analysis

Report link: https://github.com/UBC-MDS/energy_efficiency_analysis/blob/main/doc/energy_report_rmd.Rmd

Abstract/executive summary: Nowadays, building towers or other structures is not difficult if you can afford it, but building one that is both memorable and energy efficient is another story. When planning new towers or skyscrapers, it would be valuable to know exactly which building parameters relate to their energy efficiency. With that knowledge, we could design a building that is not only magnificent and memorable but also renowned for its energy efficiency.

In this project, we aim to answer the following questions:

Editor: @flor14

Reviewers: Ziyi Chen, Caroline Tang, Shirley Zhang

carolinetang77 commented 1 year ago

Data analysis review checklist

Reviewer: Caroline Tang (@carolinetang77)

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2 hours

Review Comments:

Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.

  1. Designing energy-efficient buildings is an interesting topic! It's fascinating to think about how these different factors might affect a building's energy usage. However, I don't have a background in engineering or physics, so it would have been helpful to explain what the different features (e.g. 'Glazing Area' and 'Glazing Area Distribution') and targets mean (e.g. What is 'heating load'? Is a high heating load good or bad?).
  2. The directory organization looks great! I especially like the models folder with the subfolders for each model used.
  3. The overall analysis looks good, but unfortunately I wasn't able to recreate it in an automated way. Specifically, I couldn't replicate the environment from the YAML file, likely because of differences between operating systems. When exporting your conda environment file, try `conda env export --from-history` to avoid this issue. The other commands in the Usage section seem to work fine, though.
  4. The scripts are very well documented and easy to read/follow! Great work on that!
  5. There are some grammatical errors in the final report, particularly in the steps of the EDA.
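As a minimal sketch of why `--from-history` helps with point 3: a plain `conda env export` typically writes each package as `name=version=build`, and those build strings are often specific to one operating system. The hypothetical helper below flags such specs in a dependency list, assuming the list has already been loaded from the YAML file (for example with PyYAML's `yaml.safe_load`). The function name and example specs are illustrative and are not taken from the project.

```python
def pinned_build_specs(dependencies):
    """Return conda specs that pin a build string (name=version=build).

    Assumption: entries are conda match specs such as "numpy=1.23.4=py310h5f9d8c6_0";
    nested entries like {"pip": [...]} are skipped in this sketch.
    """
    flagged = []
    for spec in dependencies:
        if not isinstance(spec, str):
            continue  # e.g. the {"pip": [...]} sub-list
        parts = spec.split("=")
        # Three non-empty fields means name=version=build; "pandas==1.5" yields an
        # empty middle field and ">=" yields only two fields, so neither is flagged.
        if len(parts) >= 3 and all(parts[:3]):
            flagged.append(spec)
    return flagged


# Hypothetical dependency list, similar in shape to an exported environment file.
deps = ["python=3.10", "numpy=1.23.4=py310h5f9d8c6_0", "pandas", {"pip": ["some-package==1.0"]}]
print(pinned_build_specs(deps))  # ['numpy=1.23.4=py310h5f9d8c6_0']
```

Specs flagged this way are the ones most likely to fail on another operating system; `--from-history` avoids them by exporting only the packages that were explicitly requested.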

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

lennonay commented 1 year ago

Data analysis review checklist

Reviewer: Lennon Au-Yeung @lennonay

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 3 hours

Review Comments:

  1. For the Download data section in the report markdown, it might be better to wrap the command in backticks so that users can easily copy and paste it.

  2. For the EDA sections, perhaps the authors could comment on the distributions of the different features so that readers can follow the rationale behind data transformations such as scaling.

  3. It was nice that all the models used are saved in separate folders, and the overall file organization was very clear.

  4. It took me some time to understand Figure 5. Perhaps it would be clearer to sort the observations by their heating load, or to show the absolute error for each observation.

  5. In the report's markdown file, it would be better to make sure that all bullet points start with a capital letter for consistency.
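One way to act on point 4, assuming Figure 5 compares actual and predicted heating load for each test observation, is to compute the absolute error per observation and sort by the actual value before plotting. The helper below is a hypothetical, plotting-library-agnostic sketch; the numbers in the example are made up and are not results from the report.

```python
def absolute_errors_sorted(y_true, y_pred):
    """Return (actual, predicted, absolute_error) tuples sorted by the actual value."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    rows = [(t, p, abs(t - p)) for t, p in zip(y_true, y_pred)]
    # Sorting by the actual heating load gives the x-axis a meaningful order,
    # so any trend in the error across the range of the target becomes visible.
    return sorted(rows, key=lambda row: row[0])


# Made-up heating load values, for illustration only.
rows = absolute_errors_sorted([15.0, 10.0, 30.0], [14.0, 12.5, 31.0])
print(rows)  # [(10.0, 12.5, 2.5), (15.0, 14.0, 1.0), (30.0, 31.0, 1.0)]
```

The resulting rows can be passed to any plotting library, with the actual value on the x-axis and the absolute error on the y-axis.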

Overall, I congratulate the authors on successfully building a model from start to finish. Well done!

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

zchen156 commented 1 year ago

Data analysis review checklist

Reviewer: Ziyi Chen (zchen156)

Conflict of interest

shlrley commented 1 year ago

Data analysis review checklist

Reviewer: Shirley Zhang @shlrley

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 3 hours

Review Comments:

Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.

This is such a fascinating topic and dataset, and I'm really impressed by what you guys have done to explore your research question! I especially liked that you went beyond the scope of our MDS courses and used the XGBoost model; I'd never heard of it before, but it looks super interesting.

  1. The 'Usage' section of the README.md is very clear and easy to follow. However, perhaps you could include a line showing exactly how to navigate to the cloned repository. For example, instead of "Navigate to your local repository", you could write `cd energy_efficiency_analysis`. This makes it much clearer what 'local repository' refers to and ensures the user starts in the right directory.

2) All of the scripts are very well commented and easy to follow. However, I noticed some redundancy in the script descriptions: each script first has `#` comments describing what it does, followed by a similar description inside a `"""` docstring. Perhaps the leading comments could be deleted to make the scripts more concise.

3) I liked how modular each script is, and I think all of the names are concise and easy to understand. However, there is one script whose name could be clearer: `download.py` is a bit confusing, since there are already `data_preprocess.py` and `download_data.py`. If I understand correctly, this script converts an Excel file to a CSV file. Perhaps it could be renamed to reflect this?

Furthermore, the description of this script's purpose (inside the `"""` docstring) does not seem to match what the script actually does.

4) Inside the `eda_script_plots_update.py` and `model_predict.py` scripts, it may be better to move the various operations in the `main` function into separate functions outside of `main` (increasing modularization), for example, a separate function for creating the plots.

Furthermore, it would be best not to define a function inside the `main` function (e.g. the `save_chart` function from Joel Ostblom is defined inside `main` but could be defined outside it).

5) The report is super interesting, and I loved that you documented the steps to recreate the analyses so clearly! The plots were also very nice additions. I would love to see a bit more discussion and interpretation of the figures you created in the EDA stage, and how they help answer your research question (i.e. why did you choose to include these specific plots?).

6) It's great that you guys included a variety of models. Although the XGBoost model performs very well, it might still be interesting to try some hyperparameter optimization for some of your models.

7) The table headings are correctly placed above each table, but I believe you should move the figure titles under each figure.

8) There are a lot of limitations listed, which shows that you guys have thought a lot about your analysis and where your model might not generalize well.

9) I would love to see more justification for why certain methods were chosen, for example, more context on the XGBoost model, how you found it, and its relevance. The same goes for why you did not use hyperparameter optimization.
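To illustrate the hyperparameter optimization suggested in points 6 and 9, here is a minimal sketch of cross-validated grid search. It deliberately swaps in a toy one-feature k-nearest neighbors regressor, with k as the hyperparameter, so that it runs with the standard library only; it is not the project's model, and all function names are hypothetical. For the project's own estimators, scikit-learn's `GridSearchCV` or `RandomizedSearchCV` would play this role.

```python
import random
import statistics


def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return statistics.fmean(y for _, y in nearest)


def kfold_splits(n, n_folds, seed=0):
    """Yield (train_idx, valid_idx) index lists for shuffled k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    start = 0
    for fold in range(n_folds):
        # Spread any remainder over the first folds so sizes differ by at most one.
        size = n // n_folds + (1 if fold < n % n_folds else 0)
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size


def grid_search_k(data, k_grid, n_folds=5, seed=0):
    """Return (best_k, scores); scores maps each k to its mean CV absolute error."""
    scores = {}
    for k in k_grid:
        fold_errors = []
        # The same seed gives every candidate k identical folds, a fair comparison.
        for train_idx, valid_idx in kfold_splits(len(data), n_folds, seed):
            train = [data[i] for i in train_idx]
            errors = [abs(data[i][1] - knn_predict(train, data[i][0], k)) for i in valid_idx]
            fold_errors.append(statistics.fmean(errors))
        scores[k] = statistics.fmean(fold_errors)
    # Lower mean absolute error is better.
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Synthetic (feature, target) pairs, for illustration only.
data = [(float(x), 2.0 * x) for x in range(20)]
best_k, scores = grid_search_k(data, k_grid=[1, 3, 5], n_folds=4)
print(best_k, scores)
```

The key design point is that each candidate hyperparameter is scored only on held-out folds, so the chosen value is not tuned to the data it is evaluated on.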

Overall, congratulations on your project so far, I'm excited to see the final product!

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

YHuUBC commented 1 year ago

Dear reviewers,

We appreciate your constructive feedback. Based on your feedback, we made the following changes:

  1. We created tests for the scripts: https://github.com/UBC-MDS/energy_efficiency_analysis/commit/504119d20e558f64d59f44e663a493081d93a5f2
  2. We moved sub-functions out of the main function in the scripts: https://github.com/UBC-MDS/energy_efficiency_analysis/commit/4583620c17e492d41e06117fa3157b7fac95788c
  3. We revised the CONTRIBUTING file: https://github.com/UBC-MDS/energy_efficiency_analysis/commit/3a557d87a0aca1bd1da4f13b48becb1091a6abc3
  4. We recreated a reproducible environment: https://github.com/UBC-MDS/energy_efficiency_analysis/commit/2ec5a79068d6a8c7dd2c3c67e15817aa25a506b3
  5. We added figure and table captions: https://github.com/UBC-MDS/energy_efficiency_analysis/commit/9331f4d5aa876f867bf265ce09525de1310267a8
  6. We broke long scripts into shorter functions: https://github.com/UBC-MDS/energy_efficiency_analysis/commit/4583620c17e492d41e06117fa3157b7fac95788c
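As a sketch of the structure described in items 1, 2 and 6 (and in Shirley's point 4), helpers defined at module level rather than inside `main` can be imported and tested directly. The function names below are hypothetical and are not taken from the repository.

```python
import statistics


def mean_absolute_error(y_true, y_pred):
    """Return the mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return statistics.fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def main(y_true, y_pred):
    # main() only orchestrates; the computation lives in a module-level helper.
    mae = mean_absolute_error(y_true, y_pred)
    print(f"MAE: {mae:.3f}")
    return mae


def test_mean_absolute_error():
    # This test is possible because the helper is not nested inside main().
    assert mean_absolute_error([1.0, 2.0], [2.0, 2.0]) == 0.5


def test_mean_absolute_error_length_mismatch():
    try:
        mean_absolute_error([1.0], [1.0, 2.0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")


if __name__ == "__main__":
    main([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
```

In a real project, the `test_` functions would usually live in a separate file (for example, under a `tests/` folder) that imports the helper, so a test runner such as pytest can collect them.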

Thank you and we greatly appreciate your feedback.