[x] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
Documentation
[x] Installation instructions: Is there a clearly stated list of dependencies?
[x] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[x] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Code quality
[x] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
[x] Style guidelines: Does the code adhere to well known language style guides?
[x] Modularity: Is the code suitably abstracted into scripts and functions?
[ ] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?
Reproducibility
[x] Data: Is the raw data archived somewhere? Is it accessible?
[x] Computational methods: Is all the source code required for the data analysis available?
[x] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
[x] Automation: Can someone other than the authors easily reproduce the entire data analysis?
Analysis report
[x] Authors: Does the report include a list of authors with their affiliations?
[x] What is the question: Do the authors clearly state the research question being asked?
[x] Importance: Do the authors clearly state the importance of this research question?
[x] Background: Do the authors provide sufficient background information so that readers can understand the report?
[x] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
[x] Results: Do the authors clearly communicate their findings through writing, tables and figures?
[x] Conclusions: Are the conclusions presented by the authors correct?
[ ] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
[x] Writing quality: Is the writing of good quality, concise, engaging?
Estimated hours spent reviewing: 3.5
Review Comments:
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
1) Enhancing Readability and Efficiency
Repository Cleanup: I would consider removing unnecessary files/folders such as .local and .ipython, and possibly reorganizing or removing the eda_files folder, to make the repository more streamlined and easier to navigate.
I suggest splitting complex functions into simpler, single-responsibility functions for better maintainability. For example, rank_correlations could be divided into one function that flattens the correlation matrix and another that removes duplicate variable pairs (see the sketch below). I would also like to see more comprehensive testing, including cases of erroneous inputs, to ensure robust error handling.
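To make the suggestion concrete, here is a minimal sketch of how rank_correlations might be decomposed; only the function name comes from the repository, so the helper names and the internal pandas logic are my assumptions:

```python
import pandas as pd
import pytest


def flatten_corr_matrix(corr: pd.DataFrame) -> pd.DataFrame:
    """Flatten a square correlation matrix into (var_a, var_b, correlation) rows."""
    flat = corr.stack().reset_index()
    flat.columns = ["var_a", "var_b", "correlation"]
    return flat


def drop_self_and_mirrored_pairs(flat: pd.DataFrame) -> pd.DataFrame:
    """Remove self-correlations and mirrored duplicates such as (a, b) vs (b, a)."""
    flat = flat[flat["var_a"] != flat["var_b"]]
    pair_key = flat[["var_a", "var_b"]].apply(lambda row: tuple(sorted(row)), axis=1)
    return flat.loc[~pair_key.duplicated()]


def rank_correlations(corr: pd.DataFrame) -> pd.DataFrame:
    """Validate the input, then compose the two single-responsibility helpers."""
    if corr.shape[0] != corr.shape[1]:
        raise ValueError("expected a square correlation matrix")
    flat = drop_self_and_mirrored_pairs(flatten_corr_matrix(corr))
    order = flat["correlation"].abs().sort_values(ascending=False).index
    return flat.loc[order]


def test_rank_correlations_rejects_non_square_input():
    """An example of the erroneous-input tests suggested above."""
    bad = pd.DataFrame({"a": [1.0, 0.2]})  # not a correlation matrix
    with pytest.raises(ValueError):
        rank_correlations(bad)
```

Once the pieces are separated like this, each one can be tested in isolation, which is what makes the more comprehensive test coverage feasible.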
The documentation should also be updated to reflect the current repository structure. In particular, a detailed CONTRIBUTING guide and clear instructions for accessing and using the Docker container would make contributions and project setup much easier.
2) Prioritizing user experience, reproducibility, and visual clarity:
Consider including relevant visualizations in the README with descriptive captions to improve understanding and engagement. The project's visuals are strong; integrating and presenting them more deliberately would give them greater impact.
Docker Utilization and Guidance: Detailed Docker instructions are especially important for newcomers to Docker. They should cover setting up, accessing, and using the Docker environment step by step, so that all users, regardless of their familiarity with Docker, can reproduce the analysis and navigate the project.
3) Reproducibility and Hard-coded Values: The reports contain hard-coded values; replacing them with dynamic value references would improve reproducibility and make updates easier (see the sketch below).
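As an illustration, here is a minimal sketch of the pattern, assuming a Python analysis pipeline; the file name results/summary.json and the stored values are hypothetical:

```python
import json
from pathlib import Path

# Analysis script: persist computed values instead of typing them into the report.
results = {"n_observations": 1234, "test_accuracy": 0.87}  # hypothetical values
Path("results").mkdir(exist_ok=True)
Path("results/summary.json").write_text(json.dumps(results))

# Report (e.g., a notebook cell): load the stored values and interpolate them,
# so the text updates automatically whenever the analysis is re-run.
summary = json.loads(Path("results/summary.json").read_text())
print(f"The model reached {summary['test_accuracy']:.0%} accuracy "
      f"on {summary['n_observations']} observations.")
```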
Finally, please avoid redundant code by importing shared helpers instead of duplicating them (a sketch follows), and keep the issue tracker updated so it accurately reflects the project's status. Both help with project organization and make collaboration clearer and more efficient.
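For the redundant-code point, the fix is to define a helper once and import it everywhere; a hypothetical sketch (the module path src/preprocessing.py and the helper itself are my invention, not code from the repository):

```python
# src/preprocessing.py: the helper lives in exactly one place.
import pandas as pd


def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize column names so every script cleans data the same way."""
    return df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
```

Each analysis script then imports it rather than re-defining its own copy:

```python
import pandas as pd

from src.preprocessing import standardize_columns  # shared import, no duplication

df = standardize_columns(pd.read_csv("data/raw.csv"))  # path hypothetical
```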
Attribution
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Originally posted by @lucyliuyihong in https://github.com/DSCI-310-2024/data-analysis-review-2024/issues/9#issuecomment-2041982221