[x] Repository: Is the source code for this data analysis available? Is the repository well-organized and easy to navigate?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
Comments:
The src directory concisely contains the four files used in the analysis pipeline. The structure is clear, and no file is nested too deeply below the project root.
Documentation
[x] Installation instructions: Is there a clearly stated list of dependencies?
[x] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[ ] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Comments:
The usage is properly documented; however, the instructions do not match the actual scripts in the repository (for example, the instructions say download_data.py, whereas src contains fetch_data.py). If your code is still under development, do not forget to update the README after making modifications.
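One way to catch this kind of drift is a small check that compares the script names mentioned in the README with the files actually present in src. The following is only a minimal sketch: the function name missing_scripts and the sample README text are hypothetical, and it assumes scripts are referenced by their plain .py file names.

```python
import re


def missing_scripts(readme_text, src_files):
    """Return the .py names mentioned in the README that are not in src."""
    # Collect every token that looks like a Python script name.
    mentioned = set(re.findall(r"\b[\w-]+\.py\b", readme_text))
    return sorted(mentioned - set(src_files))


# Illustrative inputs only; in a real check these could come from
# pathlib, e.g. Path("README.md").read_text() and
# [p.name for p in Path("src").glob("*.py")] (file names assumed).
readme_text = "python src/download_data.py\npython src/eda.py"
src_files = ["fetch_data.py", "eda.py"]

print(missing_scripts(readme_text, src_files))  # → ['download_data.py']
```

Running something like this before each release would flag a renamed script before the README goes stale.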
Code quality
[x] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
[x] Style guidelines: Does the code adhere to well known language style guides?
[x] Modularity: Is the code suitably abstracted into scripts and functions?
[x] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?
Comments:
Yep. Functions are well-written and well-documented. The scripts are modular with helper functions.
Reproducibility
[x] Data: Is the raw data archived somewhere? Is it accessible?
[x] Computational methods: Is all the source code required for the data analysis available?
[x] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
[ ] Automation: Can someone other than the authors easily reproduce the entire data analysis?
Comments:
It is clear which files in src to call, and I was able to run the scripts up to the analysis step. However, when I tried to generate the report, it failed with the error pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 30000 ms exceeded. I am not sure whether this is specific to my machine, so if other reviewers hit a similar problem, please make a note of it.
Analysis report
[x] Authors: Does the report include a list of authors with their affiliations?
[x] What is the question: Do the authors clearly state the research question being asked?
[x] Importance: Do the authors clearly state the importance for this research question?
[x] Background: Do the authors provide sufficient background information so that readers can understand the report?
[x] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
[x] Results: Do the authors clearly communicate their findings through writing, tables and figures?
[x] Conclusions: Are the conclusions presented by the authors correct?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
[x] Writing quality: Is the writing of good quality, concise, engaging?
Comments:
The writing was coherent and concise. The EDA was not overwhelming, and the results are clear. However, I noticed that in your book.pdf one of the tables is cut off because it is too long. I suggest removing some of the unnecessary content, such as the standard deviations, so that the table shows only the mean test/train scores.
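To illustrate the suggestion, here is a minimal sketch of reducing per-fold cross-validation results to mean train/test scores only. The model names and all numbers are made up for illustration, and the dictionary layout is an assumption modelled on the output of scikit-learn's cross_validate; your actual results object may be shaped differently.

```python
from statistics import mean

# Hypothetical per-fold results (illustrative values, not from the project).
cv_results = {
    "Logistic regression": {
        "fit_time": [0.02, 0.03, 0.02],
        "score_time": [0.01, 0.01, 0.01],
        "test_score": [0.80, 0.84, 0.82],
        "train_score": [0.86, 0.85, 0.87],
    },
    "Dummy classifier": {
        "fit_time": [0.00, 0.00, 0.00],
        "score_time": [0.00, 0.00, 0.00],
        "test_score": [0.50, 0.54, 0.52],
        "train_score": [0.52, 0.52, 0.52],
    },
}


def summarize_means(results, keep=("train_score", "test_score")):
    """Return {model: {metric: mean over folds}} for the kept metrics only."""
    return {
        model: {metric: round(mean(folds[metric]), 3) for metric in keep}
        for model, folds in results.items()
    }


summary = summarize_means(cv_results)

# Print a narrow table: one row per model, mean scores only.
print(f"{'model':<22}{'train':>8}{'test':>8}")
for model, scores in summary.items():
    print(f"{model:<22}{scores['train_score']:>8.3f}{scores['test_score']:>8.3f}")
```

Dropping the timing and standard deviation columns this way leaves two numeric columns per model, which should fit on the PDF page.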
Estimated hours spent reviewing: 1
Review Comments:
Overall, the project is in good shape and heading towards completion. The scripts are very solid, and the analysis was quite insightful. I mentioned a few things in the comments above; if you have time, consider addressing them.
General
Hello, Group 17. Congratulations on your work on this heart disease predictor. Below are my comments on your project!
Data analysis review checklist
Reviewer: @lukeyf
Conflict of interest
Code of Conduct
General checks
Attribution
This was derived from the JOSE review checklist and the ROpenSci review checklist.