Really great job! Your report was very easy to follow and concise. There are only a couple of minor pointers that I can offer:
Again, great job! I wish you guys all the best on your final project and exams!
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Interesting project! Here are some things you guys did well or could improve on:
Data analysis review checklist
Reviewer:
Conflict of interest
- [x] As the reviewer I confirm that I have no conflicts of interest for me to review this work.
Code of Conduct
- [x] I confirm that I read and will adhere to the MDS code of conduct.
General checks
- [x] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
- [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
Documentation
- [x] Installation instructions: Is there a clearly stated list of dependencies?
- [x] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
- [x] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
- [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Code quality
- [x] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
- [x] Style guidelines: Does the code adhere to well known language style guides?
- [x] Modularity: Is the code suitably abstracted into scripts and functions?
- [x] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?
Reproducibility
- [ ] Data: Is the raw data archived somewhere? Is it accessible?
- [x] Computational methods: Is all the source code required for the data analysis available?
- [x] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
- [ ] Automation: Can someone other than the authors easily reproduce the entire data analysis?
Analysis report
- [x] Authors: Does the report include a list of authors with their affiliations?
- [x] What is the question: Do the authors clearly state the research question being asked?
- [x] Importance: Do the authors clearly state the importance for this research question?
- [ ] Background: Do the authors provide sufficient background information so that readers can understand the report?
- [ ] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
- [x] Results: Do the authors clearly communicate their findings through writing, tables and figures?
- [x] Conclusions: Are the conclusions presented by the authors correct?
- [x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
- [x] Writing quality: Is the writing of good quality, concise, engaging?
Estimated hours spent reviewing: 2hrs
Review Comments:
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
1. Overall the project is great, and it is easy for the reader to follow the analysis process. The Docker instructions are clear, the Docker Makefile works well, and the Docker image can be pulled successfully. Files are mostly well organized.
2. Makefile issues
The Makefile does not work properly: the data folder does not exist, so the data cannot be generated.
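A sketch of one possible fix, assuming the download script lives at `src/download_data.R` and writes to `data/raw/` (both names are my guesses, not the project's actual layout): the Makefile target can create the missing folder itself with `mkdir -p` before running the script.

```make
# Hypothetical rule: create the output folder before downloading,
# so a fresh clone does not fail on the missing data/ directory.
# (Recipe lines must be indented with a tab.)
data/raw/real_estate.csv : src/download_data.R
	mkdir -p data/raw
	Rscript src/download_data.R --out=data/raw/real_estate.csv
```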
3. Path name
Because the Makefile does not run, I could only view your report through the "Prediction_of_Real_Estate_Value.ipynb" file. When loading the data, it would be better to use a relative path instead of an absolute path, since the current path does not work on any machine other than the project author's.
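For example (a minimal sketch, assuming the file sits at `data/raw/real_estate.csv` relative to the project root; the `here` package is one common way to make such paths portable):

```r
# Brittle: an absolute path only resolves on the original author's machine.
# real_estate <- read.csv("/Users/someone/project/data/real_estate.csv")

# Portable: resolve the path relative to the project root.
library(here)
real_estate <- read.csv(here("data", "raw", "real_estate.csv"))
```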
4. Add more explanation of the methods
The method of selecting predictors using forward selection is great and robust, and the visualizations let readers easily see the results. Since I have a statistics background, it was fine for me to follow methods such as forward selection. It would be even better to add some text explaining the forward selection technique, as well as the metric Mallows' Cp, so that people without the relevant background can more easily understand the whole process.
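For reference, such an explanation could note that forward selection starts from an intercept-only model and, at each step, adds the predictor that most improves the fit, while Mallows' Cp estimates out-of-sample prediction error while penalizing model size (a Cp close to the number of parameters suggests a well-specified model). A minimal sketch of how this could be shown in R, assuming the `leaps` package and hypothetical names (`real_estate`, `price_per_unit`) for the data and response:

```r
library(leaps)

# Forward selection: add one predictor at a time, best improvement first.
fits <- regsubsets(price_per_unit ~ ., data = real_estate,
                   method = "forward", nvmax = 6)

# Mallows' Cp for each model size; prefer a small Cp, close to the
# number of parameters in the model.
summary(fits)$cp
```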
5. Function input clarification
In the split data function's documentation, it looks like the input can only be a data frame, not a dataset file (i.e., a CSV or Excel file), so the description of the "dataset" input should be fixed accordingly, rather than suggesting it accepts either a dataset file or a data frame.
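For example, a consistently formatted roxygen-style header that pins down the input type (the `split_data` signature here is hypothetical, sketched only from the review comments):

```r
#' Split a data frame into training and testing sets.
#'
#' @param data A data frame of observations to split (not a path to a
#'   CSV or Excel file).
#' @param train_frac Proportion of rows assigned to the training set.
#' @return A named list with elements `train` and `test`.
split_data <- function(data, train_frac = 0.75) {
  idx <- sample(nrow(data), size = floor(train_frac * nrow(data)))
  list(train = data[idx, ], test = data[-idx, ])
}
```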
6. Function documentation incomplete (a very minor suggestion)
The function documentation is not consistently formatted: some functions use plain "#" comments while others use a different style, and some documentation blocks are missing parts. A consistent header format (like the sketch under point 5 above) would address this, but it does not hurt much.
7. Function naming convention (a very minor suggestion)
It looks like the functions follow the snake_case naming convention, so all function names are expected to use lowercase letters consistently.
Attribution
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Hi, did you download the Docker image? Three other group members and I can run the Makefile without error in the Docker container.
Submitting authors: @asmdrk @AaronMKk @ZiyueChloeZhang @mcloses
Repository: https://github.com/DSCI-310/DSCI-310-Group-6
Abstract/executive summary: In this project, we build a regression model that estimates the price per unit area of houses in the Sindian District of New Taipei City, given the transaction date, the age of the house, the distance to the nearest MRT station, the number of convenience stores, and the latitude and longitude. Predictors are chosen through forward selection. Of the two models we build, we use an ANOVA test to pick the model with interaction terms as the final model, and RMSE is used as its evaluation metric (a sketch of this comparison step appears below).
Editor: @ttimbers
Reviewer: @eahn01 @YuYT98 @luckyberen @mahdiheydar
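As referenced in the abstract above, a minimal sketch of the ANOVA comparison and RMSE evaluation it describes, assuming two `lm` fits in R (all variable and data names here are hypothetical):

```r
# Additive model vs. a model with an interaction term.
model_add <- lm(price_per_unit ~ house_age + dist_to_mrt, data = train)
model_int <- lm(price_per_unit ~ house_age * dist_to_mrt, data = train)

# ANOVA F-test: does the interaction significantly improve the fit?
anova(model_add, model_int)

# Evaluate the chosen model on held-out data with RMSE.
pred <- predict(model_int, newdata = test)
sqrt(mean((test$price_per_unit - pred)^2))
```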