DSCI-310-2024 / data-analysis-review-2024


Submission: Group 4: Predicting the Shares of News Articles based on Social networks #4

Open ttimbers opened 6 months ago

ttimbers commented 6 months ago

Submitting authors: Amar Gill, Anshnoor Kaur, Hanyu Dai, Yanxin Liang

Repository: https://github.com/DSCI-310-2024/DSCI-310_predicting-shares_group-4/releases/tag/Milestone-3

Abstract/executive summary:

We use a dataset from the open-source UCI Machine Learning Repository to predict the number of shares of an article. We split the data into training and testing sets, then fit a full model and a reduced model to make the results more reliable and efficient.

Editor: @ttimbers

Reviewers: Riddha Tuladhar, Gurman Gill, Cassandra Zhang, Fiona Chang

FionaC124 commented 5 months ago

Data analysis review checklist

Reviewer: FionaC124

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.25

Review Comments:

  1. Some parts of your README could be more detailed in order for viewers to run your code with the least friction. For example, specific instructions for "Then using a CLI (Command line interface), set the release to be your current directory" or a link to Docker download. In the Makefile section, markdown styling could have been used to identify the "make all" and "make clean" commands more easily.

  2. In your CONTRIBUTING documentation, I felt that the process for submitting code is a little vague. For example, after I reach out to a group member about a new feature for the project, do I have to wait for their approval before starting work on a pull request? Or am I just reaching out to notify? Is there an issue that needs to be created to contribute?

  3. Some minor typos or mistakes that could be fixed but don't necessarily affect the project on a high level:

    • "traing" in README.md Overview
    • "Contrubuting" in CONTRIBUTING.md title
    • Title left as "Title" in report table of contents

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

gurmangill125 commented 5 months ago

Data analysis review checklist

Reviewer: gurmangill125

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1

Review Comments:

  1. The CONTRIBUTING documentation introduces the process for contributions but leaves room for ambiguity regarding the workflow for proposing new features or changes. It would be beneficial to clarify whether an informal notification is enough or whether formal approval is required before proceeding with contributions such as pull requests, and whether creating an issue is a prerequisite for contributing to the project. Clearer and more detailed documentation would be great.

  2. While the documentation is fundamentally fine, it contains minor errors that suggest room for more attention to detail. Specifically, there are a few spelling mistakes, for example "traing" for "training" in the README.md and "Contrubuting" in the CONTRIBUTING.md title. Additionally, the placeholder title left in the report's table of contents could easily be fixed for a more polished presentation.

  3. The README presents a decent overview of the project, but further details could significantly benefit people viewing the repository. For instance, the section describing the setup with Docker lacks comprehensive guidance for users unfamiliar with Docker or CLI operations. A step-by-step walkthrough, including screenshots or a direct link to Docker installation resources, could mitigate potential setup barriers. Similarly, in the Makefile section, adopting markdown formatting to distinctly highlight command lines like make all and make clean would streamline user interaction by making instructions more accessible and clearer to follow.


riddhat commented 5 months ago

Data analysis review checklist

Reviewer: riddhat

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2.5

Review Comments:

I like your project idea! I think your research question in particular is intriguing. I do, however, believe there is room for improvement:

  1. The README file is fairly ambiguous and appears to be missing some important information. Crucially, it does not clearly instruct the reader on how to run the analysis, nor does it mention how one might contribute to the project. In particular, the Makefile instructions for generating the analysis could be more explicit, maybe by providing code that the user can easily copy and paste. Furthermore, it would be nice to add instructions on how to install the dependencies, for example by providing a link to the Docker website.
  2. In the CODE_OF_CONDUCT.md file, the procedure for reporting and dealing with unacceptable behavior could be improved. In particular, it mentions that an individual should "report any issue to the person in charge" yet does not mention who this person in charge is or how they are chosen.
  3. In the R/ folder, clean_data.R and histograms.R should specify the package each function comes from, for the sake of reproducibility, so that the scripts still run as intended if a package added to the analysis later contains a function with the same name. In particular, histograms.R does not specify the package for any of the ggplot2 functions used (ggplot, aes, geom_histogram), and clean_data.R does not specify the package for the all_of function on line 30. The all_of function is particularly important as I believe that, within the tidyverse package that is imported in the analysis, the packages dplyr, tidyr, and tidyselect all export an all_of function.
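
To illustrate point 3, here is a minimal sketch of what explicit namespacing could look like. The helper names `make_histogram` and `select_columns`, their arguments, and the use of the built-in `mtcars` data are hypothetical and are not taken from the repository's histograms.R or clean_data.R. Note that `all_of` is defined in tidyselect and re-exported by dplyr and tidyr, so all three names resolve to the same function; qualifying it mainly documents where it comes from.

```r
# Hypothetical helper: histogram of one column, with every ggplot2
# function qualified by its package so no library() call is needed.
make_histogram <- function(data, column, bins = 30) {
  ggplot2::ggplot(data, ggplot2::aes(x = .data[[column]])) +
    ggplot2::geom_histogram(bins = bins) +
    ggplot2::labs(x = column, y = "Count")
}

# Hypothetical helper: keep only the named columns, with the
# selection helper qualified explicitly.
select_columns <- function(data, columns) {
  dplyr::select(data, tidyselect::all_of(columns))
}

# Example usage on a built-in dataset (an assumption for illustration).
p <- make_histogram(mtcars, "mpg", bins = 10)
small <- select_columns(mtcars, c("mpg", "cyl"))
```

Written this way, the functions behave the same regardless of which packages happen to be attached, or in which order, when the scripts are sourced.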


cass12345 commented 5 months ago

Data analysis review checklist

Reviewer:

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing:

Review Comments:
