UBC-MDS / software-review-2021


Submission: rtweetclean (R) #21

Open calsvein opened 3 years ago

calsvein commented 3 years ago



Submitting Author:

Repository: https://github.com/UBC-MDS/rtweetclean
Version submitted: 0.3
Editor: Tiffany Timbers (@ttimbers)
Reviewers: TBD

Archive: TBD
Version accepted: TBD


Package: rtweetclean
Title: Processor of data generated by the existing rtweet package
Version: 0.0.0.9000
Authors@R: c(
    person("Syad", "Khan", email = "syad@icogo.com", role = c("aut", "cre")),
    person(given = "Nash", family = "Makhija", email = "naurattan.makhija@gmail.com", role = "aut"),
    person(given = "Cal", family = "Schafer", role = "aut"),
    person(given = "Matthew", family = "Pin", role = "aut"))       
Description: rtweetclean provides functions that let users process the raw data from rtweet into a more understandable format by extracting and organizing the contents of a user's tweets.
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
Imports: 
    magrittr,
    dplyr,
    lubridate,
    ggplot2,
    tidyr,
    tidytext,
    rtweet,
    stringr,
    knitr
Suggests: 
    testthat,
    rmarkdown
VignetteBuilder: knitr
License: MIT + file LICENSE

Scope

The rtweet package extracts tweet data, but not in a format that is ready for analysis. rtweetclean provides functions that convert rtweet-extracted data into a machine-readable dataframe, perform feature engineering, and create summary statistics and basic visualizations.
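
To make the feature-engineering step concrete, here is a minimal base-R sketch. It is not rtweetclean's implementation: the input is a small hand-made data frame that only assumes rtweet-style `text` and `created_at` columns, and the function name `add_tweet_features` is hypothetical.

```r
# Hypothetical helper illustrating feature engineering on tweet data.
# Assumes rtweet-style `text` (character) and `created_at` (POSIXct)
# columns; this is NOT rtweetclean's own code.
add_tweet_features <- function(tweets) {
  if (!is.data.frame(tweets)) {
    stop("`tweets` must be a data frame.")
  }
  # Number of characters in each tweet
  tweets$n_chars <- nchar(tweets$text)
  # Number of hashtags: gregexpr() returns -1 when there is no match,
  # so counting positive positions gives the number of matches
  tweets$n_hashtags <- vapply(
    gregexpr("#\\w+", tweets$text),
    function(m) sum(m > 0),
    integer(1)
  )
  # Hour of day (0-23) the tweet was posted, in UTC
  tweets$hour <- as.integer(format(tweets$created_at, "%H", tz = "UTC"))
  tweets
}

# Usage on a toy dataset
tweets <- data.frame(
  text = c("Loving #rstats and #tidyverse", "No hashtags here"),
  created_at = as.POSIXct(c("2021-03-01 09:15:00", "2021-03-01 17:40:00"),
                          tz = "UTC"),
  stringsAsFactors = FALSE
)
features <- add_tweet_features(tweets)
print(features[, c("n_chars", "n_hashtags", "hour")])
```

Working from plain columns keeps the sketch independent of the Twitter API, so it runs without credentials.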

The package is intended strictly for users who already use the rtweet package and have a Twitter API key.

Not that I am aware of.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options

  • [ ] The package is novel and will be of interest to the broad readership of the journal.
  • [ ] The manuscript describing the package is no longer than 3000 words.
  • [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
  • (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
  • (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
  • (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

nphaterp commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Estimated hours spent reviewing: 1.5 hours


Review Comments

After installing and running your package, I can say that everything on my end is running smoothly. Good job!

My main feedback falls into two categories: repository quality and consistency between functions. Keep in mind that I am being picky here and that overall the project is very good.

Repository

  1. The link in your CONTRIBUTING.md that points to your repository's CODE_OF_CONDUCT.md is broken. (I believe you may have changed the file structure and forgotten to update the links.)

  2. I found the Fixing Typos section of CONTRIBUTING.md a little misleading at first. To me it suggests that I can edit your repository directly on GitHub to fix a typo, when in reality users must still fork the repo. Consider either making this explicit in that section or moving the Fixing Typos section to the bottom of the file, after forking the repo has been explained.

  3. I suggest listing all authors in LICENSE.md. Currently Syad is the only one named.

  4. Consider changing the file structure. Specifically, I struggled to find CONTRIBUTING.md because it was in the .github directory. If you make this change, you will need to update all links that reference the file at its current location.

Function Consistency

  1. I notice that some functions explicitly call return(), while others simply end with the object to be returned. Consider keeping this consistent between functions for a more professional feel to your code.

  2. I notice that although the roxygen2 style is followed in general, the content of the docstrings is slightly inconsistent. Specifically, some @param entries list the object and its type, while others list the object along with a brief explanation of it. Consider keeping this consistent between functions as well.

  3. Whilst reading through your docstrings, I made small edits that you may or may not want to adopt. The three things I did were:

    • Added a space after every roxygen2 comment marker (i.e. #')
    • Removed accidental double comment markers (i.e. #' #')
    • Referred to dataframe objects more consistently (i.e. changed data.frame to dataframe)
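
The consistency points above (one return style, @param entries that give both type and meaning, a space after every #') might look like this in practice. The function `count_words` and its arguments are hypothetical and not part of rtweetclean. It uses explicit return(); relying on R's implicit return of the last expression is equally valid, provided every function uses the same style.

```r
#' Count words in each tweet
#'
#' Hypothetical example (not part of rtweetclean) showing one consistent
#' documentation and return style.
#'
#' @param tweets_df A dataframe with one row per tweet.
#' @param text_col A character string naming the column of `tweets_df`
#'   that holds the tweet text.
#'
#' @return A dataframe: `tweets_df` with an added integer column `n_words`.
#' @export
count_words <- function(tweets_df, text_col = "text") {
  # Trim surrounding whitespace, split on runs of whitespace,
  # and count the non-empty pieces
  pieces <- strsplit(trimws(tweets_df[[text_col]]), "\\s+")
  tweets_df$n_words <- vapply(pieces, function(p) sum(nzchar(p)), integer(1))
  # Explicit return(), used the same way in every function
  return(tweets_df)
}

# Usage
df <- data.frame(text = c("one two  three", " single "),
                 stringsAsFactors = FALSE)
out <- count_words(df)
print(out)
```

Every @param line states the expected type first and then what the argument means, which addresses both styles the review points out.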

You can find my suggested changes here in a PR.

Overall great work. I look forward to hearing from you.

ttimbers commented 3 years ago

So nice of you to open a PR with your suggested changes, @nphaterp!

MarcSunUBC commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Estimated hours spent reviewing: 1


Review Comments

  1. The vignette document is very clear and easy to follow.
  2. I really liked your project idea; this package would be very helpful for people who want to train machine learning models for language processing problems!
  3. Would it be possible to scrape data as a Twitter guest (without a developer account), or to automate the account-creation process?
  4. The clean_df function handles exceptions for the raw_tweets_df parameter; could you add exception handling for the other parameters as well?
  5. In the docstrings, I struggled a little to understand the parameters; more detailed descriptions of them would probably help.
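
As a sketch of what checking the other parameters could look like: clean_df's full signature is not shown in this thread, so `validate_clean_args` and its `handle` and `n` arguments below are hypothetical stand-ins. Each argument is checked up front and fails with an informative stop() message.

```r
# Hypothetical argument checks; the argument names are illustrative only
# and are not taken from rtweetclean's clean_df().
validate_clean_args <- function(raw_tweets_df, handle, n = 100) {
  if (!is.data.frame(raw_tweets_df)) {
    stop("`raw_tweets_df` must be a data frame.", call. = FALSE)
  }
  # Length is checked before is.na() so `||` only ever sees scalars
  if (!is.character(handle) || length(handle) != 1 || is.na(handle)) {
    stop("`handle` must be a single, non-missing character string.",
         call. = FALSE)
  }
  if (!is.numeric(n) || length(n) != 1 || is.na(n) || n < 1) {
    stop("`n` must be a single number greater than or equal to 1.",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Usage: a non-character `handle` triggers an informative error
result <- tryCatch(
  validate_clean_args(data.frame(text = "hi"), handle = 42),
  error = function(e) conditionMessage(e)
)
print(result)
```

Validating every argument at the top of the function means users see a clear message about their input rather than an obscure error from deep inside dplyr or stringr.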