Data analysis review checklist
Reviewer: jcairn02
Conflict of interest
Code of Conduct
General checks
[X] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
[X] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
Documentation
[X] Installation instructions: Is there a clearly stated list of dependencies?
[X] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[X] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[X] Community guidelines: Are there clear guidelines for third parties wishing to 1) contribute to the software, 2) report issues or problems with the software, and 3) seek support?
Code quality
[X] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
[X] Style guidelines: Does the code adhere to well-known language style guides?
[X] Modularity: Is the code suitably abstracted into scripts and functions?
[ ] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?
Reproducibility
[X] Data: Is the raw data archived somewhere? Is it accessible?
[X] Computational methods: Is all the source code required for the data analysis available?
[X] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
[X] Automation: Can someone other than the authors easily reproduce the entire data analysis?
Analysis report
[X] Authors: Does the report include a list of authors with their affiliations?
[X] What is the question: Do the authors clearly state the research question being asked?
[X] Importance: Do the authors clearly state the importance of this research question?
[X] Background: Do the authors provide sufficient background information so that readers can understand the report?
[ ] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
[X] Results: Do the authors clearly communicate their findings through writing, tables and figures?
[X] Conclusions: Are the conclusions presented by the authors correct?
[X] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
[X] Writing quality: Is the writing of good quality, concise, engaging?
Estimated hours spent reviewing: 1.5 hours
Review Comments:
Amazing job overall. Had to look hard to find issues.
Minor things:
I suggest moving the scripts to a /scripts folder instead of the /src folder. As is, /src ends up serving multiple purposes; /scripts is a clearer name. That’s how the great Tiffany Timbers organized her example repo: https://github.com/ttimbers/breast_cancer_predictor_py
The report organization could be improved. I recommend this structure:
summary of results -> introduction -> dataset info/parameters -> results/discussion (how the best model/hyperparameters were found) -> conclusion/analysis -> further improvements
I recommend limiting the pre-analysis section to the variables known in advance, rather than conclusions discovered in the results/discussion (i.e., the model chosen). Move that material to after the results/discussion and reword it so it reads as an exploration leading to a conclusion.
Finally, it is not clear how to run the tests. The README mentions them and links to the tests folder, but that folder has no README or other clear indication of how to run them. Without a more detailed guide, verifying them would have taken too much work. An example: https://github.com/ttimbers/breast_cancer_predictor_py/blob/main/tests/README.md
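For instance, the tests folder could gain a short README listing the exact commands to run. A minimal sketch of what that might say (the pytest layout and commands here are my assumptions, not taken from your repo):

```shell
# From the repository root, with the project's environment activated:

# install the test runner if it is not already a dependency
pip install pytest

# run the whole test suite with verbose output
pytest tests/ -v
```

Even two or three lines like this in tests/README.md would make the suite verifiable by reviewers.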
Great work tho! All issues were relatively minor. Keep it up!!
Attribution
This was derived from the JOSE review checklist and the ROpenSci review checklist.