Submission: tweetr (R) - GitHub Issues

name: tweetr
about: Use this template to submit software for review

Submitting Author: Yuan Xiong (@yuanxiongbear)
Other Authors: Huanhuan Li (@huan-ds), Yuanzhe(Marco) Ma (@mmyz88), Jared Splinter (@JaredSplinter)
Repository: https://github.com/UBC-MDS/tweetr.git
Version submitted: v0.2.0
Editor: Tiffany Timbers (@ttimbers)
Reviewers: TBD

Archive: TBD
Version accepted: TBD

Paste the full DESCRIPTION file inside a code block below:

Package: tweetr
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R: 
    c(person("Huanhuan", "Li", email = "lihuanhuan1003@hotmail.com", role = c("aut", "cre", "ctb", "cph")),
    person("Yuanzhe(Marco)", "Ma", role = c("aut", "ctb", "cph")),
    person("Jared", "Splinter", role = c("aut", "ctb", "cph")),
    person("Yuan", "Xiong", role = c("aut", "ctb", "cph"))) 
Description: This is a R package for text analysis and sentiment analysis on tweets. The package will allow you to extract tweets from Twitter, visualize user habit on tweet posting, and apply sentiment analysis to the data.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
URL: https://github.com/UBC-MDS/tweetr
BugReports: https://github.com/UBC-MDS/tweetr/issues
Config/testthat/edition: 3
Depends: 
    tidyverse,
    R (>= 2.10)
Imports: 
    lubridate,
    testthat,
    ggplot2,
    twitteR,
    dplyr,
    tidytext,
    magrittr,
    cowplot,
    rlang

Scope

Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- [ ] data retrieval
- [ ] data extraction
- [ ] data munging
- [ ] data deposition
- [ ] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [x] text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
- tweetr uses the twitteR package to extract a given user's tweets from Twitter. The extracted data, stored as a data frame, is then used for several analyses that produce summary plots.
Who is the target audience and what are scientific applications of this package?
- The target audience is Twitter users who are interested in their own tweeting behavior, as well as anyone interested in studying a particular user's tweets (a celebrity, for example).
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
- There are more sophisticated tweet analysis applications, but we are not aware of any other simple package like tweetr.
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Technical checks

Confirm each of the following by checking the box.

[x] I have read the guide for authors and rOpenSci packaging guide.

This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has a CRAN and OSI accepted license.
[x] contains a README with instructions for installing the development version.
[x] includes documentation with examples for all functions, created with roxygen2.
[x] contains a vignette with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.

Publication options

[ ] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

[x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Package Review

Please check off boxes as applicable, and elaborate in the comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of the package and any non-standard dependencies in README
[x] Vignette(s) demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples (that run successfully locally) for all exported functions
[x] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

For packages co-submitting to JOSS

[ ] The package has an obvious research application according to JOSS's definition

The package contains a paper.md matching JOSS's requirements with:

[ ] A short summary describing the high-level functionality of the software

[ ] Authors: A list of authors with their affiliations

[ ] A statement of need clearly stating problems the software is designed to solve and its target audience.

[ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software have been confirmed.
[x] Performance: Any performance claims of the software have been confirmed.
[ ] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine. Partially. See comments below
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Estimated hours spent reviewing: ~1 hour

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Great work overall! I appreciate the effort all of you have put into the development of this package. Here are some places that I think can be improved:

The installation instructions in the README say the package can be installed from CRAN, but it is not on CRAN. Consider removing the CRAN part and keeping only the GitHub installation instructions.
Your get_tweets() function does not let users pass Twitter API credentials directly as function parameters. This can be inconvenient because package users might not always have access to the underlying operating system (think of the Kaggle notebooks we used in our neural network course). Consider adding optional parameters that allow users to supply the credentials directly in code.
The last few lines of tweetr.R contain some testing code left over from development. Although they have been commented out, you should still consider removing them from your source code. If you believe they are useful for demonstration, you can put them in your documentation instead.
There are two license files in your root directory: LICENSE and LICENSE.md. The LICENSE file contains only the names of the authors, without any license text. This has confused GitHub and made it unable to infer your license and display it on the repository's main page (I believe this is an issue in the original cookiecutter template). Consider merging the content of these two files into LICENSE.
Your test cases are somewhat simple, as they do not actually check the data returned from the functions. I understand that checking the data can be hard because it comes from Twitter, which is not under your control. What you can do in this case is use mocking: mock all the twitteR functions used in your package, and make the mock objects return simple, deterministic data to check against. That way, you can test your functions without actually fetching dynamic data from Twitter. One mocking package you can consider is mockery: https://cran.r-project.org/web/packages/mockery/vignettes/mocks-and-testthat.html
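
To illustrate the mocking idea in the last comment, here is a minimal base R sketch that uses a hand-rolled test double instead of the mockery package (mockery's `stub()` serves the same purpose inside testthat). Everything named here is hypothetical: `get_tweets_sketch()`, its `fetch` argument, and the `time`/`tweet` output columns are assumptions made for illustration, not tweetr's actual interface. The only real calls are `twitteR::userTimeline()` and `twitteR::twListToDF()`, and they are never executed in the test.

```r
# Hypothetical stand-in for a tweetr function. `fetch` is an assumed argument
# that lets a test replace the call that would normally hit the Twitter API.
get_tweets_sketch <- function(handle, n_tweets = 10, fetch = fetch_from_twitter) {
  raw <- fetch(handle, n_tweets)
  data.frame(
    time = raw$created,
    tweet = raw$text,
    stringsAsFactors = FALSE
  )
}

# Real fetcher, not called below. It needs the twitteR package and prior
# authentication, which is exactly what the test wants to avoid.
fetch_from_twitter <- function(handle, n_tweets) {
  twitteR::twListToDF(twitteR::userTimeline(handle, n = n_tweets))
}

# Test double: returns simple, deterministic data and records its arguments.
calls <- list()
fake_fetch <- function(handle, n_tweets) {
  calls[[length(calls) + 1]] <<- list(handle = handle, n_tweets = n_tweets)
  data.frame(
    created = as.POSIXct(c("2021-03-01 09:00:00", "2021-03-01 17:30:00"), tz = "UTC"),
    text = c("hello #rstats", "good night"),
    stringsAsFactors = FALSE
  )
}

result <- get_tweets_sketch("@some_user", n_tweets = 2, fetch = fake_fetch)

# The returned data can now be checked exactly, with no network access.
stopifnot(
  identical(names(result), c("time", "tweet")),
  identical(result$tweet, c("hello #rstats", "good night")),
  length(calls) == 1,
  identical(calls[[1]]$handle, "@some_user")
)
```

The same checks could be written as `expect_equal()` calls inside a `test_that()` block; the point is only that the function under test receives known input.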

Again, good work everyone. Thanks to @mmyz88 for showing me how to run the package.
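
As a postscript to the get_tweets() credentials comment, the following is a rough sketch of one way optional credential parameters could work: explicit arguments take priority, and environment variables are the fallback. The helper name `resolve_credentials()`, the wrapper `get_tweets_with_credentials()`, the argument names, and the `TWITTER_*` environment variable names are all assumptions for illustration; only `twitteR::setup_twitter_oauth()`, `twitteR::userTimeline()`, and `twitteR::twListToDF()` are real twitteR functions, and the wrapper that calls them is defined but not run here.

```r
# Hypothetical helper: take each credential from the explicit argument if
# given, otherwise from an (assumed) environment variable.
resolve_credentials <- function(consumer_key = NULL, consumer_secret = NULL,
                                access_token = NULL, access_secret = NULL) {
  supplied <- list(
    consumer_key = consumer_key,
    consumer_secret = consumer_secret,
    access_token = access_token,
    access_secret = access_secret
  )
  # Assumed environment variable names; the package may use different ones.
  env_names <- c(
    consumer_key = "TWITTER_CONSUMER_KEY",
    consumer_secret = "TWITTER_CONSUMER_SECRET",
    access_token = "TWITTER_ACCESS_TOKEN",
    access_secret = "TWITTER_ACCESS_SECRET"
  )
  creds <- lapply(names(supplied), function(nm) {
    value <- supplied[[nm]]
    if (is.null(value)) value <- Sys.getenv(env_names[[nm]], unset = "")
    value
  })
  names(creds) <- names(supplied)

  # Fail early with a clear message if anything is still missing.
  missing_names <- names(creds)[!nzchar(unlist(creds))]
  if (length(missing_names) > 0) {
    stop("Missing Twitter credentials: ", paste(missing_names, collapse = ", "))
  }
  creds
}

# Hypothetical wrapper showing where the credentials would be used.
# Defined only; calling it requires twitteR and valid API keys.
get_tweets_with_credentials <- function(handle, n_tweets = 10, ...) {
  creds <- resolve_credentials(...)
  twitteR::setup_twitter_oauth(
    creds$consumer_key, creds$consumer_secret,
    creds$access_token, creds$access_secret
  )
  twitteR::twListToDF(twitteR::userTimeline(handle, n = n_tweets))
}

# Credentials passed directly in code, as one would in a hosted notebook.
creds <- resolve_credentials("key", "secret", "token", "token_secret")
```

With this shape, users who already rely on environment variables would not need to change anything, while notebook users could pass the four values as arguments.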

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s) demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples (that run successfully locally) for all exported functions
[x] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

For packages co-submitting to JOSS

[ ] The package has an obvious research application according to JOSS's definition

The package contains a paper.md matching JOSS's requirements with:

[ ] A short summary describing the high-level functionality of the software

[ ] Authors: A list of authors with their affiliations

[ ] A statement of need clearly stating problems the software is designed to solve and its target audience.

[ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software have been confirmed.
[x] Performance: Any performance claims of the software have been confirmed.
[ ] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Estimated hours spent reviewing: 1 hour

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Hi Team,

Thanks for building the tweetr package! I enjoyed playing around with it, and it certainly performs meaningful text analysis and sentiment analysis on tweets.

While using it, I noted down a few things, listed below, that I hope will help improve the overall user experience.

The Features section of the README would be better if it showed some function results, with pictures and tables where available, rather than sentence descriptions only. This would give users a clearer and more direct overview of the package.
Currently, the function plot_hashtags counts hashtag words and returns the 15 most frequent ones. It would be more flexible without the hard-coded 15. Instead, you could make the number of words an argument of the function. For instance, users might be interested in the top 30 most frequent words.
I am wondering whether plot_timeline could be expanded, since the count of tweets alone does not give complete information for a timeline analysis. Some tweets contain several sentences while others do not, so it might be better to count the words of all tweets posted in a given time range instead of counting only the tweets.
As for the function visualize_sentiments, I would again suggest adding an argument N with a default value of 10 to retrieve the top N positive and negative results, making the function more flexible and free of hard-coded values. The current function can only return the top 10 results from the sentiment analysis data frame, which is quite limiting.
The last point is about documentation. It would be better to list your team members' names with their GitHub accounts in the README and code of conduct, for contact and further development purposes.
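
To make the hard-coded limit comments concrete, here is a small base R sketch of a counting step with the cutoff exposed as an argument. The function `top_hashtags()` and its `n` argument are hypothetical and only stand in for the counting part of plot_hashtags; the actual function may be built differently (for example with dplyr and ggplot2). The default of 15 preserves the current behavior described above.

```r
# Hypothetical counting helper: returns the `n` most frequent hashtags.
# `n` defaults to 15 so existing behavior stays the same.
top_hashtags <- function(tweets, n = 15) {
  stopifnot(is.character(tweets), is.numeric(n), length(n) == 1, n >= 1)

  # Extract every hashtag (a '#' followed by letters, digits, or underscores).
  tags <- unlist(regmatches(tweets, gregexpr("#[[:alnum:]_]+", tweets)))
  if (length(tags) == 0) {
    return(data.frame(hashtag = character(0), count = integer(0)))
  }

  # Count case-insensitively, then rank by frequency.
  counts <- table(tolower(tags))
  ranked <- data.frame(
    hashtag = names(counts),
    count = as.integer(counts),
    stringsAsFactors = FALSE
  )
  ranked <- ranked[order(ranked$count, decreasing = TRUE), , drop = FALSE]
  rownames(ranked) <- NULL
  head(ranked, n)
}

tweets <- c(
  "Learning #rstats with #tidyverse",
  "#rstats is fun",
  "More #RStats and #ggplot2 and #tidyverse",
  "#rstats"
)

top_two <- top_hashtags(tweets, n = 2)
```

The same pattern would apply to visualize_sentiments, with the argument defaulting to 10 instead of 15.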

UBC-MDS / software-review-2021

Submission: tweetr (R) #32
