UBC-MDS / data-analysis-review-2023


Submission: GROUP 10: CardioPredict #24

MDSFusionist opened this issue 7 months ago

MDSFusionist commented 7 months ago

Submitting authors: @sandygross (Sandy Gross), @MDSFusionist (Doris Wang), @hema2022ubc (He Ma), @joeywwwu (Joey Wu)

Repository: https://github.com/UBC-MDS/CardioPredict

Report link: https://ubc-mds.github.io/CardioPredict/heart_analysis_report.html

Abstract/executive summary: Cardiovascular disease (CVD) remains a leading cause of mortality globally, necessitating accurate predictive tools for early detection and intervention. This study uses a practice dataset from the renowned Framingham Heart Study (FHS) comprising clinical, demographic, and behavioral variables from patients at risk of CVD. Our approach centers on hyperparameter optimization of the k-Nearest Neighbors (kNN) algorithm, supplemented by an oversampling technique to address class imbalance and improve model sensitivity. Despite modest accuracy (0.623) and recall (0.552), our model highlights cholesterol levels and smoking habits as substantial contributors to cardiovascular disease risk, alongside established factors such as age and systolic blood pressure. These insights pave the way for future investigations into the complex interplay of these risk factors, with the aim of refining the model's predictive accuracy and clinical utility.
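The pipeline described in the abstract (oversample the minority class, then tune k for kNN) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the CardioPredict code: all function names are hypothetical, distance is Euclidean, features are assumed to be standardized already, oversampling duplicates random minority-class rows in the training split only, and k is chosen by validation recall. The repository may well rely on a library such as scikit-learn instead; that is not confirmed here.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller
    class until every class matches the majority-class count.
    Apply to the training split only, so no duplicates leak into validation."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(rows)
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


def knn_predict(X_train, y_train, x, k):
    """Majority label among the k nearest training rows.
    Uses Euclidean distance and assumes features are already standardized."""
    nearest = sorted(zip((math.dist(row, x) for row in X_train), y_train))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def recall(y_true, y_pred, positive=1):
    """Share of truly positive cases the model flags: TP / (TP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


def tune_k(X_train, y_train, X_val, y_val, ks=(1, 3, 5, 7)):
    """Oversample the training split, then return the k with the highest
    validation recall together with the recall for every candidate k."""
    X_bal, y_bal = random_oversample(X_train, y_train)
    scores = {}
    for k in ks:
        preds = [knn_predict(X_bal, y_bal, x, k) for x in X_val]
        scores[k] = recall(y_val, preds)
    best_k = max(scores, key=scores.get)  # first k reaching the top recall
    return best_k, scores


# Toy usage: four negatives near the origin, one positive far away.
X_train = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
y_train = [0, 0, 0, 0, 1]
best_k, scores = tune_k(X_train, y_train, [(4, 4), (0, 0)], [1, 0])
```

Selecting k on recall alone favors settings that flag more patients as at risk, so precision or accuracy is usually tracked alongside it.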

Editor: @MDSFusionist (Doris Wang)

Reviewers: @carrieyanyi (Yan Carris), @Rachel0619 (Rachel LI), @shawnhu444 (Shawn Hu), @sungg888 (Ruocong Sun)

shawnhu444 commented 7 months ago

Data analysis review checklist

Reviewer: @shawnhu444

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing:

Review Comments:

This project exemplifies a highly standardized approach to data analysis and software development, aligning with best practices in several key areas:

  1. Code quality and documentation: The code is of high quality, its functionality is documented, and the project follows community guidelines. These aspects are vital for maintaining and scaling the software efficiently and for ensuring its long-term viability.

  2. Reproducibility: The submission ensures data accessibility and provides complete computational methods, documenting the details of each step and function. These features significantly enhance the reproducibility and reliability of the research, which are cornerstones of scientific rigor.

  3. Reporting: The report clearly articulates the research question, background, and functions, and is supported by high-quality writing and complete referencing. This sets an exemplary standard for communicating the analysis and its code.

In summary, this project stands out for its comprehensive documentation, code quality, reproducibility, and thorough analysis reporting.

sungg888 commented 7 months ago

Data analysis review checklist

Reviewer: sungg888

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing:

1.5 hours

Review Comments:

  1. Documentation: The project stands out for its detailed documentation, straightforward installation, and usage examples. The inclusion of community guidelines significantly aids collaboration and user support.
  2. Code Quality: The code is well organized, readable, and adheres to style guidelines, which enhances its maintainability. The project also has a clear, readable, and robust test suite that checks its reliability.
  3. Reproducibility: The project excels in reproducibility, providing accessible raw data, comprehensive computational methods, and clear steps in the README.
  4. Analysis Report: The analysis report is clear and detailed, effectively communicating the research question, methods, and results. The high-quality writing makes the report engaging and informative, and it covers all points in the rubric. Overall, this is high-quality, reproducible research that meets the standards of the review checklist.

Attribution

This was derived from the JOSE review checklist and the rOpenSci review checklist.

Rachel0619 commented 7 months ago

Data analysis review checklist

Reviewer: @Rachel0619

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1 hour

Review Comments:

Overall, the project is of high quality, and I'm deeply impressed by your hard work.

  1. I really appreciate the plotting part of the report. The figures are well made and organized.
  2. Low recall might be an issue in the context of early detection of and intervention in a disease (if this is the research question of this project): a recall of 0.552 means that roughly 45% of truly at-risk patients are not flagged. If you have time and want to push further, you might try other classification models to see whether they achieve higher recall, but the current version is good enough.
  3. The src folder is a bit messy at the moment, with different types of files mixed together.
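Trying other classification models, as suggested above, is easiest with a small harness that scores every candidate on the same held-out data. The sketch below is a minimal illustration under assumptions: each model is wrapped as a hypothetical `predict(row) -> 0 or 1` callable, and the threshold rules in the usage are stand-ins for fitted classifiers (for example logistic regression or a random forest), not anything taken from the CardioPredict repository.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[float, ...]


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """TP / (TP + FN): the share of truly positive cases that are flagged."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """TP / (TP + FP): the share of flagged cases that are truly positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if tp + fp else 0.0


def rank_by_recall(
    models: Dict[str, Callable[[Row], int]],
    X_val: Sequence[Row],
    y_val: Sequence[int],
) -> List[Tuple[str, float, float]]:
    """Score each candidate on the same validation set and sort by recall
    (highest first), reporting precision alongside each entry."""
    results = []
    for name, predict in models.items():
        preds = [predict(x) for x in X_val]
        results.append((name, recall(y_val, preds), precision(y_val, preds)))
    # sorted() is stable, so models tied on recall keep their input order.
    return sorted(results, key=lambda r: r[1], reverse=True)


# Toy usage with hypothetical stand-in predictors on a single feature.
X_val = [(0.9,), (0.7,), (0.2,), (0.6,)]
y_val = [1, 1, 0, 0]
candidates = {
    "always_positive": lambda x: 1,
    "threshold_0.8": lambda x: int(x[0] > 0.8),
    "threshold_0.5": lambda x: int(x[0] > 0.5),
}
ranking = rank_by_recall(candidates, X_val, y_val)
```

Because flagging every patient trivially reaches recall 1.0, precision is reported next to recall: on this toy data `always_positive` and `threshold_0.5` tie on recall, and only precision separates them.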


carrieyanyi commented 7 months ago

Data analysis review checklist

Reviewer: @carrieyanyi

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing:

2 hours

Review Comments:


I really enjoyed going through this project and learned a lot about cardiovascular disease. The overall structure of the project is well organized, and both the figures and the models are sufficiently described.
