Status: Open (opened by scout-mckee 12 months ago)
Overall, I think this project is very well done. The clarity of the README file made it easy to follow along with reproducing the analysis. I also thought that the narrative of the report was clear and that the necessary components were explained well. There are a couple of things I noticed that can be added:
General Checks
There are .DS_Store and .Rhistory files in your repo (you could add these to your .gitignore file). I believe I only saw the .Rhistory file in the root; however, I saw .DS_Store in the root as well as in the following folders: data/, report/, results/ (results/models/, results/tables/), src/, and tests/.
There are .ipynb files from old Milestones in your src/ and doc/ folders. I'm not sure of the exact protocol for archived files, but maybe you could add some documentation stating that these are old versions of your current analysis (or group them all in an archived folder).
Documentation
Code Quality
Tests
if __name__ == "__main__": pytest.main() appears in your test-get-feature-importance.py (and not the other tests). You could add this to all tests for consistency.
Automation
Analysis Report
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Submitting authors: @ella-irene @scout-mckee @angelachenmo @s-voon
Repository: https://github.com/UBC-MDS/diabetes_classification_model
Report link: https://ubc-mds.github.io/diabetes_classification_model/diabetes_classification_model_report.html
Note: The analysis takes a long time to run because of the size of the data set. To run the analysis with a smaller training set, use a larger value for --split-ratio in the first terminal command:
    python scripts/download_split_data.py \
        --id=891 \
        --write-to=data/raw \
        --random=123 \
        --split-data-to=data/processed \
        --split-ratio=0.35
Abstract/executive summary: In this project, we try to create models for predicting diabetes. We try several different models: logistic regression, k-nearest neighbours (k-NN), and a decision tree. We perform hyperparameter optimization for the decision tree and the k-NN model. We also use the logistic regression model to explore which features are most important for the classification.
Editor: @ttimbers
Reviewers: Weiran Zhao, Ian MacCarthy, Kiersten Gilberg, Rachel Bouwer
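To make the split-ratio note concrete, here is a small arithmetic sketch of why a larger ratio gives a smaller training set. It assumes that the ratio is the fraction of rows held out for testing, which is what the note implies but which the script itself would need to confirm. The helper split_sizes and the row count are hypothetical and chosen only for illustration.

```python
import math


def split_sizes(n_rows, split_ratio):
    """Return (n_train, n_test), treating split_ratio as the held-out fraction."""
    if not 0 < split_ratio < 1:
        raise ValueError("split_ratio must be strictly between 0 and 1")
    # Assumption: the test set is rounded up and the remainder is used for training.
    n_test = math.ceil(n_rows * split_ratio)
    return n_rows - n_test, n_test


# Hypothetical row count, not the size of the actual data set.
for ratio in (0.2, 0.35, 0.9):
    n_train, n_test = split_sizes(10_000, ratio)
    print(f"split_ratio={ratio}: {n_train} training rows, {n_test} test rows")
```

Under that assumption, raising the ratio from 0.35 toward 0.9 shrinks the training set and so shortens model fitting, at the cost of training on less data.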