UBC-MDS / group29

Project Repo for Group 29 for DSCI 522
MIT License

Update READme #36

Closed rachelywong closed 3 years ago

rachelywong commented 3 years ago

@wiwang

sukh2929 commented 3 years ago

Updated the "Usage", "Selecting the Best Model", and "Results" sections of the README.

Usage
To replicate this analysis, clone this repository, install the dependencies below, and run the following code in your terminal:

python src/download.py --local_path=./data/raw --url=https://archive.ics.uci.edu/ml/machine-learning-databases/00296/dataset_diabetes.zip
python src/processingdata.py --input="data/raw/dataset_diabetes/diabetic_data.csv" --output="data/processed"
python src/eda2.py --input="data/processed/diabetes_with_race.csv" --output="reports/figures"
python src/script4 --input="data/processed/diabetes_clean.csv" --output=

Selecting the Best Model
The data contain a mix of categorical, numerical, and binary features, and we apply the appropriate transformation to each type before analysis. The target column, readmitted, shows no class imbalance. We compared an RBF SVM and a logistic regression (LR) model and chose between them based on their scores. We split the data into 80% training and 20% testing, re-fit the best model on the entire training set, and then evaluated its performance on the test set. The output can be found here: link
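The selection procedure described above could be sketched roughly as follows. This is only an illustration, not the project's script4: it assumes scikit-learn, uses synthetic numeric data in place of the processed diabetes data, and assumes that the "scores" are mean 5-fold cross-validation accuracies on the training split. The mixed-type feature transformations are left out.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the processed data (assumption: the real
# features would first go through the categorical/numeric/binary
# transformations mentioned in the text).
X, y = make_classification(n_samples=400, n_features=10, random_state=522)

# 80% training / 20% testing, as stated in the section.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=522
)

# The two candidate models named in the section.
candidates = {
    "RBF SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

# Assumption: compare candidates by mean cross-validation accuracy
# computed on the training split only.
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Re-fit the best model on the entire training set, then score it once
# on the held-out test set.
best_model = candidates[best_name].fit(X_train, y_train)
test_score = best_model.score(X_test, y_test)

print(f"best model: {best_name}")
print(f"test accuracy: {test_score:.3f}")
```

Keeping the test split out of the comparison step means the final test score is not influenced by the choice between the two models.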

Results
The written report can be found

Notes:
I will update "Usage" for script4 once it is ready.
I will recheck the links in the "Results" and "Selecting the Best Model" sections.
I have run the python file through the command line and it ran fine; I will test the others once they are fixed.
@jraza19 - If we are using feather in processingdata.py, then we have to add it to the dependency section of the README; otherwise, we can delete it if it is not used in the script.
@rachelywong - The final report is in .rmd, which is not displaying the data properly. Can we change it to .md?

Please let me know if I have missed anything.

rachelywong commented 3 years ago

@sukh2929 This looks really good!

Let's change it to be more similar to Tiffany's example (https://github.com/ttimbers/breast_cancer_predictor/tree/v2.0):

rachelywong commented 3 years ago

Regarding SCRIPT5, it has to be a .RMD file according to the milestone2 rubric. I will try to update SCRIPT5 with all the figures and data, and we can try again to add it to the READme

sukh2929 commented 3 years ago

Introduction
For this project, we are trying to answer the predictive question: Given a patient's demographics, medication history, and diabetes management during their hospital stay, can we predict whether they will be readmitted to the hospital? Analysis with machine learning models will identify the features most likely to predict patient readmission. This matters for the management of hospital care for diabetic patients, because it identifies areas where changes can be made to decrease patient readmission and reduce the burden on the healthcare system.

Importance: Due to Covid-19, it is critical to reduce the burden on the healthcare system and to keep readmission rates from rising, so that capacity remains for Covid cases. Our predictor looks at diabetes management and diagnosis during a patient's hospital stay to understand how much these affect readmission. This will allow us to create and improve patient safety protocols that better manage diabetic patients during their hospital stay, provide effective care, and prevent readmission during this critical time.

The data are submitted on behalf of the Center for Clinical and Translational Research, Virginia Commonwealth University, a recipient of NIH CTSA grant UL1 TR00058, and a recipient of the CERNER data. This dataset was collected from 1999-2008 among 130 US hospitals and integrated delivery networks. The dataset can be found here. Research based on these data was used to assess diabetic care during hospitalization and to determine whether patients were likely to be readmitted. The paper by Strack et al. (2014) can be found here and the descriptions for the features can be found here.

Report
The written report can be found here: link

Usage
To replicate the analysis, clone this GitHub repository, install the dependencies listed below, and run the following commands at the command line/terminal from the root directory of this project:

# download data
 python src/download.py --local_path=./data/raw --url=https://archive.ics.uci.edu/ml/machine-learning-databases/00296/dataset_diabetes.zip

# data processing
 python src/processingdata.py --input="data/raw/dataset_diabetes/diabetic_data.csv" --output="data/processed"

# eda analysis
 python src/eda2.py --input="data/processed/diabetes_with_race.csv" --output="reports/figures"

# tune-test model
 python src/script4 --input="data/processed/diabetes_clean.csv" --output=

# render report
 Rscript -e "rmarkdown::render('doc/script5.Rmd', output_format = 'github_document')"
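As a convenience, the first three steps above could be chained from a small driver script. The following is a minimal sketch, not part of the repository: the names `STEPS`, `build_commands`, and `run_pipeline` are hypothetical, and the tune-test and render steps are left out because the script4 command is still incomplete in this draft.

```python
import subprocess
import sys

DATA_URL = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "00296/dataset_diabetes.zip"
)

# (label, script arguments) pairs copied from the usage commands above.
STEPS = [
    ("download data",
     ["src/download.py", "--local_path=./data/raw", f"--url={DATA_URL}"]),
    ("data processing",
     ["src/processingdata.py",
      "--input=data/raw/dataset_diabetes/diabetic_data.csv",
      "--output=data/processed"]),
    ("eda analysis",
     ["src/eda2.py",
      "--input=data/processed/diabetes_with_race.csv",
      "--output=reports/figures"]),
]


def build_commands(python=sys.executable):
    """Return one argument list per step, ready for subprocess.run."""
    return [[python, *args] for _, args in STEPS]


def run_pipeline(dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for (label, _), cmd in zip(STEPS, build_commands()):
        print(f"# {label}")
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Run from the project root, the default dry run only prints the commands; passing `dry_run=False` would execute them in order.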

Dependencies

License
The materials used for this project are from an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. If reusing/referencing, please provide a link to this webpage.

References:
The paper detailing the data collection and research can be found here: https://www.hindawi.com/journals/bmri/2014/781670/
Feature descriptions about the data can be found here: https://www.hindawi.com/journals/bmri/2014/781670/tab1/
This dataset was taken from https://archive.ics.uci.edu/ml/datasets/diabetes+130-us+hospitals+for+years+1999-2008#

rachelywong commented 3 years ago

@sukh2929 Looks awesome!! I think for references we can just add these 2 citations though:

Beata Strack, Jonathan P. DeShazo, Chris Gennings, Juan L. Olmo, Sebastian Ventura, Krzysztof J. Cios, and John N. Clore, “Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records,” BioMed Research International, vol. 2014, Article ID 781670, 11 pages, 2014.

Dua, Dheeru, and Casey Graff. 2017. “UCI Machine Learning Repository.” University of California, Irvine, School of Information and Computer Sciences. http://archive.ics.uci.edu/ml.

sukh2929 commented 3 years ago

changes done!

jraza19 commented 3 years ago

Add dependencies:
pandas_profiling==2.9.0
altair-saver==0.5.0
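One way to confirm that an environment matches pins like these is a small version check. This is a sketch only: `PINNED` and `check_pins` are hypothetical names, and the check relies on the standard library's `importlib.metadata` rather than anything in this repository.

```python
from importlib import metadata

# Pins copied from the comment above.
PINNED = {"pandas_profiling": "2.9.0", "altair-saver": "0.5.0"}


def check_pins(pins, get_version=metadata.version):
    """Return {package: installed version or None} for every mismatch."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            problems[name] = found
    return problems


if __name__ == "__main__":
    mismatches = check_pins(PINNED)
    for name, found in mismatches.items():
        print(f"{name}: expected {PINNED[name]}, found {found}")
    if not mismatches:
        print("all pinned dependencies match")
```

An empty result means every pinned package is installed at the expected version.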

EDA code under usage needs to be: python src/eda2.py --input="data/raw/dataset_diabetes/diabetic_data.csv;data/processed/diabetes_with_race.csv" --output="reports/figures"

sukh2929 commented 3 years ago

changes done!