UBC-MDS / data-analysis-review-2023


Submission: Group 2: diabetes classification model #2

Open scout-mckee opened 7 months ago

scout-mckee commented 7 months ago

Submitting authors: @ella-irene @scout-mckee @angelachenmo @s-voon

Repository: https://github.com/UBC-MDS/diabetes_classification_model

Report link: https://ubc-mds.github.io/diabetes_classification_model/diabetes_classification_model_report.html

Note: The analysis takes a long time to run because of the size of the data set. To run the analysis with a smaller training set, use a larger value for the split-ratio in the first terminal command:

    python scripts/download_split_data.py \
        --id=891 \
        --write-to=data/raw \
        --random=123 \
        --split-data-to=data/processed \
        --split-ratio=0.35

Abstract/executive summary: In this project, we build models for predicting diabetes. We try several models: logistic regression, k-nearest neighbours (k-nn), and a decision tree. We perform hyperparameter optimization for the decision tree and the k-nn model. We also use the logistic regression model to explore which features are most important for the classification.
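The note about the split-ratio can be illustrated with a small sketch. It assumes the ratio is the fraction of rows held out for testing, which would explain why a larger value leaves a smaller training set. The helper `split_rows` and the toy row count are hypothetical and are not taken from `scripts/download_split_data.py`, whose actual implementation is not shown here.

```python
import random


def split_rows(rows, split_ratio, seed=123):
    """Hypothetical helper: hold out `split_ratio` of the rows for testing.

    Assumption: the ratio is the test fraction, so the training set
    shrinks as the ratio grows.
    """
    if not 0 < split_ratio < 1:
        raise ValueError("split_ratio must be strictly between 0 and 1")
    shuffled = list(rows)
    # Seeded shuffle so the same split is reproduced on every run.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * split_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Toy stand-in for the data set: 1000 row indices.
rows = list(range(1000))
for ratio in (0.2, 0.35, 0.8):
    train, test = split_rows(rows, ratio)
    print(f"split_ratio={ratio}: {len(train)} training rows, {len(test)} test rows")
```

Under this assumption, raising the ratio from 0.35 to 0.8 cuts the training rows from 650 to 200 in the toy example, which is the kind of reduction the note relies on to shorten the run time.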

Editor: @ttimbers

Reviewers: Weiran Zhao, Ian MacCarthy, Kiersten Gilberg, Rachel Bouwer

rbouwer commented 7 months ago

Data analysis review checklist

Reviewer: @rbouwer

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2.5

Review Comments:

General Checks

Documentation

Code Quality

Tests

Automation

Analysis Report

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

Kierst01 commented 7 months ago

Data analysis review checklist

Reviewer: @Kierst01

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2

Review Comments:

Overall, I think this project is very well done. The clarity of the README file made it easy to follow along with reproducing the analysis. I also thought that the narrative of the report was clear and the necessary components were explained well. There are a couple of things I noticed that can be added:


weiranzhao97 commented 7 months ago

Data analysis review checklist

Reviewer: @weiranzhao97

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1 hr

Review Comments:


ianm99 commented 7 months ago

Data analysis review checklist

Reviewer: @ianm99

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2h

Review Comments:
