calsvein commented 3 years ago

Submitting Author:

Repository: https://github.com/UBC-MDS/rtweetclean
Version submitted: 0.3
Editor: Tiffany Timbers (@ttimbers)
Reviewers: TBD

Package: rtweetclean
Title: Processor of data generated by the existing rtweet package
Authors@R: c(
    person("Syad", "Khan", email = "syad@icogo.com", role = c("aut", "cre")),
    person(given = "Nash", family = "Makhija", email = "naurattan.makhija@gmail.com", role = "aut"),
    person(given = "Cal", family = "Schafer", role = "aut"),
    person(given = "Matthew", family = "Pin", role = "aut"))       
Description: rtweetclean creates functionality which enables users to process the raw data from rtweet into a more understandable format by extracting and organizing the contents of tweets for a user.
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
VignetteBuilder: knitr
License: MIT + file LICENSE


The rtweet package extracts tweet data, but not in a format that is ready for analysis. rtweetclean converts rtweet-extracted data into a machine-readable dataframe, performs feature engineering, and creates summary statistics and basic visualizations.

The package is intended strictly for those who are already using the rtweet package and have a Twitter API key.

nphaterp commented 3 years ago

Package Review

The package includes all the following forms of documentation:

Estimated hours spent reviewing: 1.5

Review Comments

After installing and running your package, I can say that everything on my end is running smoothly. Good Job!

My main feedback falls into 2 categories: repository quality and consistency between functions. Keep in mind that I am being picky here and that overall the project is very good.

Repository Quality

  1. The link in your CONTRIBUTING.md that directs users to your repository's CODE_OF_CONDUCT.md is broken. (I believe you might have shifted the file structure and forgotten to update the links.)

  2. I found the Fixing Typos section of the CONTRIBUTING.md a little misleading at first. To me it suggests that I should be able to directly edit your repository on GitHub if I want to fix a typo. In reality, users must still fork the repo. Consider either making this more explicit in the section or moving the Fixing Typos section to the bottom of the file, after forking the repo has been explained.

  3. My suggestion would be to include all authors in the LICENSE.md. Currently Syad is the only one listed.

  4. Consider changing the file structure. Specifically, I struggled to find the CONTRIBUTING.md as it was in the .github directory. Please keep in mind that if you make this change, you will need to update all links that reference it at its current location.

Function Consistency

  1. I notice that some functions explicitly call return(), while others rely on the value of the last evaluated expression. Consider keeping this consistent between functions for a more professional feel to your code.

  2. I notice that although the roxygen2 style is followed in general, the content of the docstrings is slightly inconsistent. Specifically, some @param entries list the object and its type, while others list the object along with a brief explanation of the object. Consider keeping this consistent between functions for a more professional feel to your code.

  3. Whilst reading through your docstrings, I made small edits that you may or may not want to implement. The 3 things that I did were:

    • Added a whitespace after every roxygen2 comment (i.e. #')
    • Removed accidental double comments (i.e. #' #')
    • Referenced dataframe objects in a more consistent manner (i.e. changed data.frame -> dataframe)

You can find my suggested changes here in a PR.
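
To illustrate the first two points above, here is a minimal sketch of one consistent style: each @param gives the type followed by a short description, and the function ends with an explicit return(). The function count_words and its argument are hypothetical and not taken from rtweetclean; this is only one possible convention, not necessarily the one you should adopt.

```r
#' Count the words in each tweet (hypothetical helper, for illustration only)
#'
#' @param text character vector. The tweet texts to process.
#'
#' @return integer vector. The number of whitespace-separated words per tweet.
#' @examples
#' count_words(c("hello world", "one two three"))
count_words <- function(text) {
  if (!is.character(text)) {
    stop("text must be a character vector")
  }
  # Split each tweet on runs of whitespace, then count the pieces
  words <- strsplit(trimws(text), "\\s+")
  counts <- lengths(words)
  # Explicit return(), used the same way in every function
  return(counts)
}
```

Dropping return() everywhere and relying on the last expression would be equally consistent; the point is only to pick one and apply it throughout.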

Overall, great work. I look forward to hearing from you.

ttimbers commented 3 years ago

So nice of you to open a PR to send your suggested changes @nphaterp!

MarcSunUBC commented 3 years ago

Package Review

The package includes all the following forms of documentation:

Estimated hours spent reviewing: 1

Review Comments

  1. The vignette document is very clear and easy to follow.
  2. I really liked your project idea. This package would be very helpful for people who aim to train machine learning models for language processing problems!
  3. Would it be possible to scrape the data as a Twitter guest (without using a developer account), or to automate the account creation process?
  4. For the clean_df function, exceptions are handled for raw_tweets_df. Could you consider exception handling for the other parameters as well?
  5. In the docstrings, I struggled a little to understand the parameters. It would probably be helpful to describe them in more detail.
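
To make point 4 more concrete, here is a rough sketch of what checking every argument could look like. The function clean_df_sketch, its word_count flag, and the assumption that the input has a text column are all hypothetical stand-ins for illustration; they are not the actual clean_df signature.

```r
# Hypothetical sketch: clean_df_sketch and its word_count argument are
# illustrative stand-ins, not the actual rtweetclean clean_df signature.
clean_df_sketch <- function(raw_tweets_df, word_count = TRUE) {
  # Validate the data argument
  if (!is.data.frame(raw_tweets_df)) {
    stop("raw_tweets_df must be a data frame")
  }
  # Assumption: the tweet text lives in a column named "text"
  if (!"text" %in% names(raw_tweets_df)) {
    stop("raw_tweets_df must contain a 'text' column")
  }
  # Validate the flag argument in the same way as the data argument
  if (!is.logical(word_count) || length(word_count) != 1 || is.na(word_count)) {
    stop("word_count must be TRUE or FALSE")
  }

  result <- raw_tweets_df
  if (word_count) {
    # Count whitespace-separated words per tweet
    tweet_text <- trimws(as.character(result$text))
    result$word_count <- lengths(strsplit(tweet_text, "\\s+"))
  }
  return(result)
}
```

With checks like these, a call such as clean_df_sketch(df, word_count = "yes") fails immediately with a clear message instead of erroring somewhere deeper in the function.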