DSCI-310 / data-analysis-review-2021


Submission: 6: New Taipei City Real Estate Value Prediction #6

ttimbers opened this issue 2 years ago

ttimbers commented 2 years ago

Submitting authors: @asmdrk @AaronMKk @ZiyueChloeZhang @mcloses

Repository: https://github.com/DSCI-310/DSCI-310-Group-6

Abstract/executive summary: In this project, we build a regression model that estimates the price per unit area of houses in the Sindian District of New Taipei City, given the transaction date, the age of the house, the distance to the nearest MRT station, the number of convenience stores, the latitude, and the longitude. Predictors are chosen through forward selection. Of the two models we build, we use an ANOVA test to pick the model with interaction terms as the final model. RMSE is used as the evaluation metric for this model.

Editor: @ttimbers

Reviewers: @eahn01 @YuYT98 @luckyberen @mahdiheydar

mahdiheydar commented 2 years ago

Data analysis review checklist

Reviewer: mahdiheydar

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1 hour

Review Comments:

Really great job! Your report was very easy to follow and concise. There are only a couple of minor pointers I can offer:

Again, great job. I wish you all the best on your final project and exams!

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

eahn01 commented 2 years ago

Data analysis review checklist

Reviewer: eahn01

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1 hour

Review Comments:

Interesting project! Here are some things you guys did well or could improve on:

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

SunWeihao1226 commented 2 years ago

Data analysis review checklist

Reviewer: SunWeihao1226

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 hours

Review Comments:

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

YuYT98 commented 2 years ago

Data analysis review checklist

Reviewer: YuYT98

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2 hours

Review Comments:


1. Overall the project is great, and it is easy for the reader to follow the analysis process. The Docker instructions are clear, the Docker Makefile works well, and the Docker image can be pulled successfully. The files are mostly well organized.
2. Makefile issues

The Makefile does not work properly: the data folder does not exist, so the data cannot be generated.

[Screenshot: Screen Shot 2022-04-07 at 12 54 25 PM]
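One way to make this work on a fresh clone is to have the data-download step create the folder before writing to it. A minimal sketch, assuming a Python download script and a hypothetical data/raw output location (not necessarily the repo's actual layout):

```python
# Minimal sketch: create the output folder on a fresh clone before the
# download/write step, so `make` does not fail when data/ is absent.
# The data/raw location is an assumption about the project layout.
from pathlib import Path

out_dir = Path("data/raw")
out_dir.mkdir(parents=True, exist_ok=True)
# ...the script would then download or write the raw data into out_dir...
```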
3. Path name

Because the Makefile does not run, I could only view your report through the "Prediction_of_Real_Estate_Value.ipynb" file. When loading the data, it would be better to use a relative path instead of an absolute path, since the current path will not work on any machine other than the project author's.

[Screenshot: Screen Shot 2022-04-07 at 12 50 36 PM]
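As a rough sketch of the suggestion, the notebook could read the file with a path relative to the project root; the file name and folder below are hypothetical examples, not the repo's actual layout:

```python
# Hypothetical example: read the data with a path relative to the repository
# root instead of an absolute path tied to one machine.
import pandas as pd

real_estate = pd.read_csv("data/raw/real_estate.csv")
```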
4. Add more explanations to methods

The method of selecting predictors with forward selection is great and robust, and the visualizations make the results easy to follow. Since I have a statistics background, it was fine for me to understand methods such as "forward selection". It would be even better to add some text explaining the "forward selection" technique and the "Mallows' Cp" metric, so that readers without the relevant background can more easily understand the whole process.
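To illustrate what such an explanation would accompany, here is a minimal sketch of forward selection scored by Mallows' Cp. It assumes a Python/pandas workflow with scikit-learn and uses hypothetical names (`df`, `target`); it is not the authors' implementation:

```python
# Illustrative sketch (not the authors' code): forward selection scored by
# Mallows' Cp, assuming a pandas DataFrame `df` with a numeric target column.
import numpy as np
from sklearn.linear_model import LinearRegression

def mallows_cp(X, y, sigma2_full):
    """Cp = SSE_p / sigma2_full - n + 2p, where p counts the intercept."""
    n, k = X.shape
    model = LinearRegression().fit(X, y)
    sse = np.sum((y - model.predict(X)) ** 2)
    return sse / sigma2_full - n + 2 * (k + 1)

def forward_select(df, target):
    y = df[target]
    candidates = [c for c in df.columns if c != target]
    # sigma^2 is estimated from the full model containing every candidate.
    full = LinearRegression().fit(df[candidates], y)
    sse_full = np.sum((y - full.predict(df[candidates])) ** 2)
    sigma2_full = sse_full / (len(df) - len(candidates) - 1)

    selected, best_cp = [], np.inf
    while candidates:
        scores = {c: mallows_cp(df[selected + [c]], y, sigma2_full)
                  for c in candidates}
        best = min(scores, key=scores.get)
        if scores[best] >= best_cp:      # stop once Cp no longer improves
            break
        best_cp = scores[best]
        selected.append(best)
        candidates.remove(best)
    return selected, best_cp
```

The stopping rule here (stop when Cp no longer decreases) is only one common choice; a short paragraph in the report explaining whichever rule you used would be enough for readers without a statistics background.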

5. Function input clarification

In the split-data function's documentation, it looks like the input can only be a data frame, not a dataset file (i.e., a CSV or Excel file), so the description of the "dataset" input should be corrected accordingly rather than stating that it can be either a dataset file or a data frame.
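For instance, the documentation could state explicitly that a pandas DataFrame is expected. A hypothetical sketch (the real function's name, signature, and language may differ):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def split_data(dataset: pd.DataFrame, test_size: float = 0.25):
    """Split an already-loaded data frame into training and test sets.

    Parameters
    ----------
    dataset : pandas.DataFrame
        The data to split. A DataFrame is required; paths to CSV or
        Excel files are not accepted and should be read in beforehand.
    test_size : float, optional
        Proportion of rows assigned to the test set (default 0.25).

    Returns
    -------
    (pandas.DataFrame, pandas.DataFrame)
        The training and test portions of `dataset`.
    """
    train, test = train_test_split(dataset, test_size=test_size)
    return train, test
```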

6. Function documentations incomplete (very tiny thing to suggest)

The function documentation is not consistently formatted: some functions use "#"-style comments while others use a different format, and some function documentation is missing sections. These are minor issues, though.

7. Function name convention (very tiny thing to suggest)

It looks like all functions are meant to follow the snake_case naming convention, so all function names should use only lowercase letters.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

AaronMKk commented 2 years ago

Data analysis review checklist

Reviewer:

Conflict of interest

  • [x] As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

General checks

  • [x] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
  • [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

  • [x] Installation instructions: Is there a clearly stated list of dependencies?
  • [x] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
  • [x] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
  • [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

  • [x] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
  • [x] Style guidelines: Does the code adhere to well known language style guides?
  • [x] Modularity: Is the code suitably abstracted into scripts and functions?
  • [x] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

  • [ ] Data: Is the raw data archived somewhere? Is it accessible?
  • [x] Computational methods: Is all the source code required for the data analysis available?
  • [x] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
  • [ ] Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

  • [x] Authors: Does the report include a list of authors with their affiliations?
  • [x] What is the question: Do the authors clearly state the research question being asked?
  • [x] Importance: Do the authors clearly state the importance for this research question?
  • [ ] Background: Do the authors provide sufficient background information so that readers can understand the report?
  • [ ] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
  • [x] Results: Do the authors clearly communicate their findings through writing, tables and figures?
  • [x] Conclusions: Are the conclusions presented by the authors correct?
  • [x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
  • [x] Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 2 hours


Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

Hi, did you download the Docker image? Three other group members and I can run the Makefile without errors in the Docker container.