Open elenagan opened 2 years ago
Hello, Group 17. Congratulation on your work on this heart disease predictor. Below are my comments based on your project!
The src contains concisely the four files that were used for the pipeline of analysis. The structure is clear and no files are too deep from the root of the project.
Usages are properly documented; however, the instruction does not match the actual scripts in the repository (for example the instruction says download_data.py
whereas the src contains fetch_data.py
). If your code is under development, do not forget to update the Readme after modifications.
Yep. Functions are well-written and well-documented. The scripts are modular with helper functions.
The source code in src is clear on which file to call. I was able to execute until the analysis. But when I was trying to generate the report it returns the error pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 30000 ms exceeded.
I was not sure if this was only my machine so if others return a similar problem please note that.
The writing was coherent and concise. The eda was not too overwhelming and the result is clear. However, I notice that in your book.pdf
one of the tables is cut off because it was too long. I suggest removing some of the unnecessary contents like standard deviation to only reveal the meet (test/train scores)
The comments can be summarized in the points below:
make
only).Overall, the project is in a good shape toward completion. The scripts are very solid and the analysis was quite insightful. There are a few things I mentioned in the previous comments and if you have time you can consider addressing them.
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
Note: the whole evaluation is done using the Milestone 3 release version of the project repository (tag: 0.2.0, commit: 4ba76b9)
First of all, I really appreciate the level of detail and effort your team has put into it. The whole project is well constructed with comprehensive materials and extra components. I personally like the code documentation and the linkage between different modules.
I notice there are two functions, namely model
and test
in the src/model.py
. Since these two functions are from the two main processes of the machine learning pipeline, I would suggest splitting model training and model testing into two separate scripts. By decoupling these 2 processes, the user can determine whether we choose either to train or test the model. It can provide more flexibility for the user.
In terms of code modularity and readability, there is a save_chart
function defined within a function called eda
in the src/eda.py
. It is better to separate the save_chart
from the master function or create a utility folder for organizing the util
related function, like save_chart
, so that other scripts can also make use of the chart saving function. It can improve code readability and scalability.
In the readme file, the team mentioned using 4 machine learning models, however, 2 of the models are missing in the actual implementation. Perhaps it can include more models in the analysis if time permits. But personally, I think 2 models are enough for this project.
There is a minor issue when I run the make all
command. When the GitHub repository is in a path that contains spaces (e.g. C:\Users\abc\UBC MDS\DSCI 522 Workflows\git), some extra folders are generated. (see below)
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
mark_square
, mark_rect
, etc. functions to visualize the distribution over the values. Scatter plots do not seem to show the distribution effectively here.This was derived from the JOSE review checklist and the ROpenSci review checklist.
This was derived from the [JOSE review checklist](https://openjournals.readthedocs.io/en/jose/review_checklist.
Submitting authors: @Natalie-cho @Yurui-Feng @elenagan @tzoght
Repository: https://github.com/UBC-MDS/heart_disease_predictor Report link: https://github.com/UBC-MDS/heart_disease_predictor/blob/main/book.pdf Abstract/executive summary: Responsible for 16% of the world's total deaths in 2019, heart disease is the world's leading cause of death according to the World Health Organization. The development of heart disease can not be contributed to a single factor in isolation, making early detection difficult given so many risk factors.
The goal of this project is to predict the presence of heart disease based on common early signs and the Heart Disease UCI dataset from the UC Irvine Machine Learning Repository to answer the question: given common early signs and physiological indicators, can we predict the presence of cardiac disease based on symptoms such as chest pain, blood pressure, or resting ECG?
Responding to this question may aid in the early detection of heart disease and may help with earlier treatment, crucial to improving an individual's chances of survival.
Editor: @flor14 Reviewer: Luke Yang, Caesar Wong, Xinru Lu, Manvir Kohli