UBC-MDS / software-review-2022


Submission Group 21: textprepr (R) #35

Open mmaidana24318 opened 2 years ago

mmaidana24318 commented 2 years ago

Package Name: textprepr

One-Line Description of Package: Text preprocessing functions specifically designed for tweet data.

Submitting Author Name/ Github Handle:

Repository: https://github.com/UBC-MDS/textprepr

Version submitted: v1.0.0

Submission type: Standard

Editor: @arijc76, @joshsia, @mmaidana24318, @PhilsChan

Reviewers:

Package: textprepr
Title: Performs Pre-Processing of Tweets
Version: 0.0.0.9000
Authors@R: 
    c(person(given = "Arijeet",
             family = "Chatterjee",
             role = c("aut", "cre"),
             email = "arijc@student.ubc.ca"),
      person(given = "Joshua",
             family = "Sia",
             role = c("aut", "cre"),
             email = "joshuasia2000@gmail.com"),
      person(given = "Melisa",
             family = "Maidana",
             role = c("aut", "cre"),
             email = "placeholder@student.ubc.ca"),
      person(given = "Philson",
             family = "Chan",
             role = c("aut", "cre"),
             email = "philsonchan@gmail.com"))
Description: Functions which offer additional text preprocessing functionality
    specifically designed for tweets.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2
Suggests: 
    testthat (>= 3.0.0)
Config/testthat/edition: 3
Imports: 
    wordcloud,
    stringr,
    RColorBrewer,
    purrr,
    stopwords

Scope

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

LukeAC commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing:

1 hour

Review Comments

Unit tests run/verified via local instance of package repo.

Summary: Really cool package! I did not run into any issues with installation.

  1. I noticed the extract_ngram function might be missing some possible n-grams. For example:
    > textprepr::extract_ngram(c("one", "two", "three", "four"), n=2)
    [1] "one two" "two three" "three four"
    Is it expected that "four one" should also be a valid n-gram returned by this function?
  2. Could it potentially be worthwhile stripping numbers (in addition to punctuation) from tweet/text data?
  3. It would be cool to integrate this package with one of the MDS groups whose package focuses on querying Twitter for tweet data. That way we could get an idea for how a 'real' wordcloud would look with real data.
  4. Excellent function documentation and examples given for how to use the functions!
  5. Could be worthwhile to delete unused/old branches which are no longer active ¯\_(ツ)_/¯. Not required for any functional reason, it's just good practice to keep project branching well organized.
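
To make points 1 and 2 concrete, here is a minimal sketch in base R. Both `sliding_ngrams()` and `strip_digits()` are hypothetical helpers written for illustration, not textprepr's actual code. The first builds n-grams with a sliding window that does not wrap around, which is why a pair such as "four one" would not appear under that convention. The second shows one possible way to strip digits, assuming that removing every digit character is the desired behaviour.

```r
# Hypothetical helper (not textprepr's implementation): build word n-grams
# with a sliding window that stops at the last word and never wraps around.
sliding_ngrams <- function(words, n = 2) {
  stopifnot(is.character(words), n >= 1, length(words) >= n)
  # One n-gram starts at each position that still has n words available.
  starts <- seq_len(length(words) - n + 1)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

# Hypothetical helper: remove every digit character from each string.
strip_digits <- function(text) {
  gsub("[0-9]+", "", text)
}

sliding_ngrams(c("one", "two", "three", "four"), n = 2)
# [1] "one two"    "two three"  "three four"

strip_digits("a1b22c")
# [1] "abc"
```

Whether a wrap-around pair should be returned is a design decision for the authors; the sketch only illustrates the non-wrapping convention.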
Davidwang11 commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 2 hours


Review Comments

A cool and useful package!

Here are some suggestions:

  1. There are repeated if statements in the extract_ngram.R file:
    if (length(tweets) < n) {
      stop("length of ngrams should be less than number of words in vector of tweets")
    }
    if (!is.character(tweets)) {
      stop("input should be a character vector")
    }
  2. It would be better to add more explanatory comments in the body of each function.
  3. If something like "#s#AusOpen" is contained in the tweets, should the extract_hashtags() function return s#AusOpen or sAusOpen?
  4. It would be better to cover more input data types in the test cases in test-extract_ngram.R.
  5. It would be better to show some badges in the README.
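
One way to address point 1 is to gather the input checks in a single helper so that they are not repeated. The sketch below is only an illustration: `check_ngram_input()` is a hypothetical name, the error messages are my own wording, and checking the type before the length is an assumption about what the authors intend.

```r
# Hypothetical helper (not part of textprepr): validate the inputs of an
# n-gram function in one place, checking types before lengths.
check_ngram_input <- function(tweets, n) {
  if (!is.character(tweets)) {
    stop("input should be a character vector")
  }
  if (!is.numeric(n) || length(n) != 1 || n < 1) {
    stop("n should be a single positive number")
  }
  if (length(tweets) < n) {
    stop("n should not exceed the number of words in the input vector")
  }
  invisible(TRUE)
}

check_ngram_input(c("one", "two", "three"), n = 2)  # passes silently
```

Calling this helper once at the top of the function would keep each check in a single place and make it easier to test the error cases individually.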
lipcai commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 35 minutes


Review Comments

First of all congratulations on creating a wonderful and useful package. The team has done a great job and I really found the documentation, docstrings and examples to be very good. They easily guided me through the process of working with your package. I have added my recommendations below as minor changes I feel could make this already very good package slightly better:

  1. It would be better to include the code maintainer's email address in the Contributing.md file. People who are interested in contributing need to know who to contact.
  2. It would be useful to have visualization tools for plots. Since the stated goal is gaining insight into tweet data, a plot could be very helpful. This would require significantly more work to build and test, so it's understandable to keep things simple, but it's certainly an opportunity for improvement.
  3. Your README does not contain a coverage badge, which would be nice to display since it's obvious you put a lot of work into writing your tests and your coverage is quite good!
  4. Having brief examples in the README.md file can help people who want to quickly get an idea of what the package does. It would be better to show example output in the README file as well.
  5. The names of the functions are very informative.

In general, this is great work and I enjoyed using your package!

AraiYuno commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing:


Review Comments

  1. The extract_hashtags() function does not seem to handle special characters or multiple #s.
  2. No badges are shown in README.md. I believe showing the CI and CD badges would give users more confidence in the package, because missing badges could imply a lack of maintenance.
  3. Great work separating the 4 different functions into 4 different files for both the implementation and the unit tests. It does not matter whether you have one file or 4, but it is important to follow the same pattern consistently.
  4. I am not able to find example usage for some functions. I can only find example usage for the remove_punct() function on the documentation website.
  5. Great unit tests in test-generate_cloud.R. They test not only the input parameters but also the behaviour of the function. This would prevent someone from accidentally modifying the behaviour of the function in the wrong way!
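
To illustrate point 1, here is a minimal base R sketch of one possible policy for repeated `#` characters. `extract_hashtags_sketch()` is a hypothetical function, not the package's extract_hashtags(), and it assumes that every `#` followed by letters, digits, or underscores starts a new hashtag. The authors may well prefer a different rule.

```r
# Hypothetical sketch (not textprepr's extract_hashtags()): treat every "#"
# that is followed by letters, digits, or underscores as starting a hashtag.
extract_hashtags_sketch <- function(tweet) {
  stopifnot(is.character(tweet), length(tweet) == 1)
  # Find all non-overlapping matches of "#" plus one or more word characters.
  matches <- regmatches(tweet, gregexpr("#[A-Za-z0-9_]+", tweet))[[1]]
  # Drop the leading "#" from each match.
  sub("^#", "", matches)
}

extract_hashtags_sketch("Great match #s#AusOpen ##tennis!")
# [1] "s"       "AusOpen" "tennis"
```

Under this rule "#s#AusOpen" yields two hashtags, and a doubled "##" contributes only the tag that follows the second "#".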

Team 21 is rocking! Great work :>