DSCI-310 / data-analysis-review-2021


Submission: 5: Predicting Breast Cancer With Multiple Classification Algorithms #5

Open ttimbers opened 2 years ago

ttimbers commented 2 years ago

Submitting authors: @edile47 @clichyclin @nhantien @ClaudioETC

Repository: https://github.com/DSCI-310/DSCI-310-Group-5

Abstract/executive summary: The project seeks to solve the prediction problem of distinguishing benign from malignant tumors, motivated by the question "Is there a way to efficiently classify whether a tumor is malignant or benign with high accuracy, given a set of features observed from the tumor in its development stage?". We approached this problem with a predictive model. Our initial hypothesis was that such classification is possible but would carry a high error rate due to variation in tumor features. After performing EDA, including summary statistics, data cleaning, and visualization, we were able to spot clear distinctions between benign and malignant tumors in some features. We then tested multiple classification models and arrived at a K-Nearest-Neighbors model with tuned hyperparameters that achieved very good accuracy, recall, precision, and F1 score.
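The workflow the abstract describes (train/test split, hyperparameter tuning for K-Nearest-Neighbors, then evaluation on accuracy, precision, recall, and F1) can be sketched as follows. This is an illustrative sketch, not the authors' actual code: it stands in for their data with scikit-learn's bundled Wisconsin breast cancer dataset, and the specific split, scaler, and tuning grid are assumptions.

```python
# Illustrative sketch of the described approach, NOT the group's actual code.
# Uses scikit-learn's bundled Wisconsin breast cancer dataset as a stand-in.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits,
# so no information from the test set leaks into preprocessing.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Tune the number of neighbors by 5-fold cross-validation.
grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": range(1, 21)}, cv=5)
grid.fit(X_train_s, y_train)

# Report the four metrics mentioned in the abstract on the held-out test set.
pred = grid.predict(X_test_s)
for name, fn in [("accuracy", accuracy_score), ("precision", precision_score),
                 ("recall", recall_score), ("f1", f1_score)]:
    print(f"{name}: {fn(y_test, pred):.3f}")
```

Distance-based models like KNN are sensitive to feature scale, which is why a scaling step precedes the classifier here.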

Editor: @ttimbers

Reviewer: @TimothyZG @hmartin11 @poddarswakhar @nkoda

TimothyZG commented 2 years ago

Data analysis review checklist

Reviewer:

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2.5

Review Comments:

Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.

Overall, they clearly put a lot of work into the project; I'm especially impressed by the reliable workflow they created.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

poddarswakhar commented 2 years ago

Data analysis review checklist

Reviewer: @poddarswakhar

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.2 hours

Review Comments:


1.) Style guidelines: I believe there is room for improvement in the style of the script files. To be more specific, commenting parts of the code to briefly explain what each chunk is doing would make the code easier to understand and follow.

2.) For the data part, I couldn't find the source of the raw data. In the Makefile, the analysis simply reads the CSV from the directory; for some readers this might not be very transparent.

3.) In the analysis file I couldn't find the authors listed, so I couldn't check that box.

4.) Overall, well done! I really loved the analysis and the methodology used. The use of pipelines made some of the code simpler and avoided redundancy.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

hmartin11 commented 2 years ago

Data analysis review checklist

Reviewer:

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5

Review Comments:


- Overall, great job! Your report was interesting to read and easy to follow.
- The Docker instructions were easy to follow and worked well, making the project reproducible, which is a very important aspect!

nkoda commented 2 years ago

Data analysis review checklist

Reviewer: nkoda

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 hours

Review Comments:


Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.