Open weiranzhao97 opened 11 months ago
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
The analysis report is well-crafted, presenting information in an easy-to-read and understandable manner. The project's conclusions are clearly articulated, contributing to the overall coherence of the report. The Readme is commendable for its clarity and coverage of essential project aspects. The research question is effectively framed, enhancing the overall quality of the project.
The modularization of the code and the use of helper functions demonstrate a thoughtful approach to code organization. The choice of topic is engaging, and the report is enjoyable to read.
In terms of potential improvements, the points below may be of interest:
Readme:
Although the Creative Commons License is listed in the README, its text is not included in the license file itself.
The README does not state the Jupyter build command needed to generate the HTML report. Including this detail would make setup smoother for users.
Script Files:
While the script commands are easy to copy and run, incorporating default arguments to expedite the analysis process would be helpful. Adding a help parameter to the click calls in the scripts would provide users with valuable information on each argument's functionality.
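As a rough illustration of the suggestion above, a click command with default values and help text might look like the sketch below. The option names, default paths, and the command itself are hypothetical and are not taken from the project's scripts.

```python
import click


@click.command()
@click.option(
    "--input-path",
    default="data/raw/business_licences.csv",  # hypothetical default path
    show_default=True,
    help="Path to the raw CSV file to read.",
)
@click.option(
    "--output-dir",
    default="data/processed",  # hypothetical default directory
    show_default=True,
    help="Directory where the processed data is written.",
)
def main(input_path, output_dir):
    """Illustrative preprocessing command (not the project's actual script)."""
    click.echo(f"Reading {input_path}; writing to {output_dir}")


if __name__ == "__main__":
    main()
```

With defaults in place, the script runs without any arguments, and `--help` lists each option together with its description and default value.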
Report:
In the Dataset Description section, consider explaining the types of attributes present in the dataset to enhance understanding of the features used in modelling.
Explicitly state the research question in the report, and consider clarifying the rationale behind choosing a two-year window for predicting business survival. Additionally, consider expanding on why logistic regression was selected and whether other models were considered or tested (the text currently only implies this).
Specify what the significant trends and correlations found during the initial analysis were; currently the report only mentions that they were found. Two or three sentences describing them would be enough.
Share insights into the model building process, including whether hyperparameter tuning was performed, consideration of omitting certain features, and the impact of features on prediction quality.
Address the trade-offs made in the research question, particularly whether False Negatives or False Positives are considered more concerning for policymakers.
Mention the metric used for model training, whether accuracy, F1-score, precision, or recall. If F1-score was employed, make this explicit in the report maybe?
Consider adding polynomial features to the model, especially if a linear lens may limit the modelling.
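To make the last few points concrete, here is a minimal sketch of how polynomial features, hyperparameter tuning, and an explicit training metric could fit together in scikit-learn. It uses synthetic data from `make_classification` as a stand-in, since the project's actual features and pipeline are not reproduced here; the grid values and the choice of F1 as the scoring metric are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data; the project's real dataset is not used here.
X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Polynomial expansion, scaling, then logistic regression.
pipe = Pipeline(
    [
        ("poly", PolynomialFeatures(include_bias=False)),
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=2000)),
    ]
)

# Hypothetical grid: degree 1 is the purely linear baseline.
param_grid = {
    "poly__degree": [1, 2],
    "clf__C": [0.01, 0.1, 1.0, 10.0],
}

# scoring="f1" makes the tuning metric explicit (an assumed choice).
search = GridSearchCV(pipe, param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)

# Report false positives and false negatives separately on held-out data.
y_pred = search.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("best params:", search.best_params_)
print("test F1:", f1_score(y_test, y_pred))
print("false positives:", fp, "false negatives:", fn)
```

Comparing the selected `poly__degree` against the linear baseline shows whether the extra terms help, and reporting the false positive and false negative counts side by side supports the trade-off discussion for policymakers.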
This was derived from the JOSE review checklist and the ROpenSci review checklist.
root and one in src; consider removing one of them.
Submitting authors: @beth-ouyang, @arturoboquin, @Prabh95, @weiranzhao97
Repository: https://github.com/UBC-MDS/New_Businesses_Survival_Prediction
Report link: https://ubc-mds.github.io/New_Businesses_Survival_Prediction/report_business_survival_prediction.html
Abstract/executive summary: Our research focuses on predicting the success of new businesses in Vancouver by analyzing a variety of economic and demographic variables. We rely on data from the City business license registry (City of Vancouver, 2023) and additional sources such as Statistics Canada (2023) to evaluate how factors like location, industry type, and economic conditions influence the longevity of businesses.
Our methodology involves constructing a classification model using logistic regression. This model utilizes the mentioned datasets to determine the probability of a new business sustaining operations over a two-year period. The efficacy of our final model was validated through its performance on a distinct test dataset, achieving an accuracy rate of 0.77. Out of 23,817 test cases, the model accurately predicted the survival of 18,442 businesses.
Editor: @weiranzhao97
Reviewer: @jian3, @charlesxch, @kunya, @salva-u