Submission: Group 0: breast_cancer_predictor

flor14 commented 2 years ago

Submitting authors: Tiffany Timbers Repository: https://github.com/ttimbers/breast_cancer_predictor Report link: https://github.com/ttimbers/breast_cancer_predictor/blob/master/doc/breast_cancer_predict_report.html Abstract/executive summary: Here we attempt to build a classification model using the k-nearest neighbours algorithm which can use breast cancer tumour image measurements to predict whether a newly discovered breast cancer tumour is benign (i.e., is not harmful and does not require treatment) or malignant (i.e., is harmful and requires treatment intervention). Our final classifier performed fairly well on an unseen test data set, with Cohen’s Kappa score of 0.9 and an overall accuracy calculated to be 0.97. On the 142 test data cases, it correctly predicted 138. However it incorrectly predicted 4 cases, and importantly these cases were false negatives; predicting that a tumour is benign when in fact it is malignant. These kind of incorrect predictions could have a severly negative impact on a patients health outcome, thus we recommend continuing study to improve this prediction model before it is put into production in the clinic.

The data set that was used in this project is of digitized breast cancer image features created by Dr. William H. Wolberg, W. Nick Street, and Olvi L. Mangasarian at the University of Wisconsin, Madison (Street, Wolberg, and Mangasarian 1993). It was sourced from the UCI Machine Learning Repository (Dua and Graff 2017) and can be found here, specifically this file. Each row in the data set represents summary statistics from measurements of an image of a tumour sample, including the diagnosis (benign or malignant) and several other measurements (e.g., nucleus texture, perimeter, area, etc.). Diagnosis for each image was conducted by physicians.

Editor: @flor14 Reviewer: Mike Gelbart Alexi Rodríguez-Arelis Varada Kolhatkar

[x] I agree to abide by MDS's Code of Conduct during the review process and in maintaining my package should it be

flor14 commented 2 years ago

Data analysis review checklist

Reviewer: @mgelbart

Conflict of interest

[x] As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the MDS code of conduct.

General checks

[x] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

[x] Installation instructions: Is there a clearly stated list of dependencies?
[x] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[x] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

[x] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
[x] Style guidelides: Does the code adhere to well known language style guides?
[x] Modularity: Is the code suitably abstracted into scripts and functions?
[ ] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

[x] Data: Is the raw data archived somewhere? Is it accessible?
[x] Computational methods: Is all the source code required for the data analysis available?
[x] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
[ ] Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

[ ] Authors: Does the report include a list of authors with their affiliations?
[x] What is the question: Do the authors clearly state the research question being asked?
[x] Importance: Do the authors clearly state the importance for this research question?
[x] Background: Do the authors provide sufficient background information so that readers can understand the report?
[x] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
[x] Results: Do the authors clearly communicate their findings through writing, tables and figures?
[x] Conclusions: Are the conclusions presented by the authors correct?
[ ] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
[x] Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 1.5 hs

Review Comments:

Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.

I really like the project so far! I would include more background information or specify explicitly the question to make it easier to the reader to understand what is this about.
What do you think about using tidymodels instead of caret? Could be a good change considering that many improvements have been included in those packages.
I appreciate that the all the project is reproducible. The link of the dataset is not working for me, you should check that.
Information about the authors is missing. I think it is important to be added.
Have you considered citing the packages? You can use the function citation() to get that information directly from R

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

flor14 commented 2 years ago

Thank you for the comments! I really appreciate your feedback

Regarding comment number 4 in this issue, I have added the name of the authors in this commit, now is working: 54a965a
Regarding the issue, we have modified ....
....
.... (you complete four of them)

UBC-MDS / data-analysis-review-2021