calsvein opened this issue 3 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.

The package includes the required forms of documentation, with package metadata in a setup.py file or elsewhere.

Readme requirements
The package meets the readme requirements below; the README should include the required items, from top to bottom.

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted. The package contains a paper.md matching JOSS's requirements, with a high-level description in the package root or in inst/.
Estimated hours spent reviewing: 2
One suggestion: in the docstring examples, change #>>> extra_cols(tweets_df) to >>> extra_cols(tweets_df) so the examples render (and run) as doctests.
When installing I got:

ERROR: Could not find a version that satisfies the requirement python-semantic-release<8.0.0,>=7.15.0 (from tweepyclean) (from versions: none)
ERROR: No matching distribution found for python-semantic-release<8.0.0,>=7.15.0 (from tweepyclean)

so instead I ran

pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple tweepyclean
and that seemed to work.

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.
The package includes the required forms of documentation, with package metadata in a setup.py file or elsewhere.

Readme requirements
The package meets the readme requirements below; the README should include the required items, from top to bottom.

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted. The package contains a paper.md matching JOSS's requirements, with a high-level description in the package root or in inst/.
Estimated hours spent reviewing: 2.5
Hi team,
I have enjoyed reviewing your amazing package. tweepyclean is very creative and interesting! Please find my comments below:
Installation
By running the installation command in the README, I got the following error:
ERROR: Could not find a version that satisfies the requirement textstat<0.8.0,>=0.7.0 (from tweepyclean) (from versions: 0.4.1, 0.5.0, 0.5.1, 0.5.2)
ERROR: No matching distribution found for textstat<0.8.0,>=0.7.0 (from tweepyclean)
This is because some of your package dependencies are not on TestPyPI, so I suggest adding the --extra-index-url argument to pull the dependencies from PyPI as follows:
pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple tweepyclean
Features
Users of your package might, like me, not be entirely clear on how some functions in tweepy work. It would help to add a link to the documentation for tweepy.Cursor() to explain this sentence from the Features section of the README: "The ability to generate a dataframe from the a tweepy.cursor.ItemIterator object returned by calling tweepy.Cursor(api.user_timeline, id=username, tweet_mode='extended').items() with the tweepy package".
More importantly, I found that the README description of sentiment_total() is inconsistent with the source code and docstring: the README says it returns a line chart, but the function actually returns a dataframe.
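To make the mismatch concrete, here is a rough sketch of how a caller could draw the line chart themselves from the returned dataframe; the column names and values are only taken from the docstring table quoted later in this review, so treat them as placeholders:

```python
import pandas as pd

# Placeholder for the dataframe returned by sentiment_total(); the columns
# mirror the docstring table quoted in the Docstring & Documentation section.
sentiments = pd.DataFrame({
    "sentiment": ["anger", "disgust", "fear", "negative", "sadness"],
    "word_count": [1, 2, 1, 2, 1],
    "total_words": [4, 4, 4, 4, 4],
})

# If a line chart is what the README intends, the caller can produce it
# from the returned dataframe, e.g.:
ax = sentiments.set_index("sentiment")["word_count"].plot(kind="line")
```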
Usage
I found the Usage section of the README a little general and hard to follow. It would be better to include actual examples for each function along with their outputs, so that users can try them themselves and get a complete picture of the package. First, I recommend including import tweepy to make the section more complete. Second, I recommend showing how the objects tweets, data, and clean_dataframe are defined (see the sketch below). Finally, as mentioned before, I would highly recommend showing some example output plots and output dataframes.
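To make this concrete, here is a rough sketch of what a fuller Usage section could look like. The function names come from the README and tests; the import path, the exact signatures, and the lexicon argument are assumptions on my part, so please adapt as needed:

```python
import tweepy
import tweepyclean  # assumed import path

# Authenticate with the Twitter API (credentials are placeholders)
auth = tweepy.OAuthHandler("API_KEY", "API_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

# tweets: the tweepy.cursor.ItemIterator mentioned in the Features section
tweets = tweepy.Cursor(api.user_timeline, id="some_user",
                       tweet_mode="extended").items()

# data: the raw tweet dataframe built from the iterator
data = tweepyclean.raw_df(tweets)

# clean_dataframe: data with the engineered feature columns added
clean_dataframe = tweepyclean.extra_cols(data)

# sentiment summary (a dataframe, per the source code and docstring)
sentiments = tweepyclean.sentiment_total(clean_dataframe, lexicon="nrc")  # lexicon value is a guess
```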
Similarly, the description of sentiment_total(data, lexicon) is inconsistent with the source code and docstrings.
Docstring & Documentation
Most of the docstrings are not rendered properly on Read the Docs. For example, I saw something like this:
3 x 5 sentiment word_count total_words <chr> <int> <dbl> anger 1 4 disgust 2 4 fear 1 4 negative 2 4 sadness 1 4
This is because the table in the docstring for sentiment_total() cannot be rendered properly. Also, for many of the functions, the examples end up rendered as sub-bullet points of the Returns section.
Moreover, the examples in your docstrings seem incomplete. For example, instead of just #>>> raw_df(tweets), I suggest defining the tweets object in the example as well (see the sketch below).
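As an illustration only, here is one way such a docstring could be written so that both the table and the example render properly. The signature comes from this review, the table values from the fragment quoted above, and everything else is assumed:

```python
def sentiment_total(data, lexicon):
    """Summarise sentiment word counts for a tweet dataframe.

    Returns
    -------
    pandas.DataFrame
        One row per sentiment, for example::

            sentiment  word_count  total_words
            anger               1            4
            disgust             2            4
            fear                1            4
            negative            2            4
            sadness             1            4

    Examples
    --------
    >>> import pandas as pd
    >>> data = pd.DataFrame({"full_text": ["placeholder tweet text"]})  # hypothetical input
    >>> sentiment_total(data, lexicon="nrc")  # doctest: +SKIP
    """
```

The two changes that matter here are the `::` literal block, which makes Sphinx render the table verbatim, and an example that defines its own input instead of starting from an undefined object.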
Tests
I suggest writing more tests for raw_df(). For example, you could test whether the output is a dataframe object. You could also call tweepy.Cursor() to create a tweepy.cursor.ItemIterator object to use as input in your tests.
Similarly, I suggest adding one more output test for sentiment_total(). The expected output is a dataframe, so it is entirely feasible to test whether the output is exactly what we expect; a rough sketch of both tests follows.
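This is only a sketch with pytest: the function names follow the review above, while the import path, the iterator construction, and the expected column names are assumptions on my part.

```python
import pandas as pd
import tweepy

from tweepyclean import raw_df, sentiment_total  # assumed import path


def test_raw_df_returns_dataframe():
    # Build a real ItemIterator as suggested above; the credentials here are
    # placeholders, so this only sketches the setup.
    auth = tweepy.OAuthHandler("API_KEY", "API_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth)
    tweets = tweepy.Cursor(api.user_timeline, id="some_user",
                           tweet_mode="extended").items()
    result = raw_df(tweets)
    assert isinstance(result, pd.DataFrame)


def test_sentiment_total_returns_expected_dataframe():
    data = pd.DataFrame({"full_text": ["placeholder tweet text"]})  # hypothetical input
    result = sentiment_total(data, lexicon="nrc")  # lexicon value is a guess
    assert isinstance(result, pd.DataFrame)
    # Column names taken from the docstring table quoted earlier in this review
    assert list(result.columns) == ["sentiment", "word_count", "total_words"]
```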
Functionality
I tried to run each of the functions based on the test file. Except for raw_df() (I am not sure how to obtain the input for this function), everything works as expected. This is great!
Overall, you did a great job putting all of this together. Thanks for all the hard work. I hope my suggestions help improve your package in the future.
Submitting Author: Nash Makhija (@nashmakh), Matt (@MattTPin), Syad Khan (@syadk), Cal Schafer (@calsvein)
Package Name: tweepyclean
One-Line Description of Package: add-on functions to the tweepy package for Twitter data processing, word counts, and sentiment analysis
Repository Link: https://github.com/UBC-MDS/tweepyclean/tree/0.3
Version submitted: 0.3
Editor: Tiffany Timbers (@ttimbers)
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD
Description
tweepyclean is a Python package that processes data generated by the existing Tweepy package, producing clean data frames, data summaries, and new features.

Tweepy is a package built around Twitter's API and is used to scrape tweet information from their servers.

Our package provides functions that process the raw data from Tweepy into a more understandable format by extracting and organizing the contents of a user's tweets. tweepyclean is specifically built for analysis of a specific user's timeline (generated using tweepy's api.user_timeline function). Users can visualize average engagement by time of day posted, see basic summary statistics of word contents and sentiment analysis of tweets, and obtain a processed dataset for use in machine learning models.
Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.
The tweepy package extracts tweet data, but not in a format that is ready for analysis. tweepyclean provides functions that convert tweepy-extracted data into a machine-readable dataframe, perform feature engineering, and create summary statistics and basic visualizations.

The intended audience is strictly those who are already using the tweepy package and have a Twitter API key.
Not that we are aware of.
@tag the editor you contacted:

Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following for this package by checking the box.
Publication options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:

*Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser, text-based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. Have feedback/comments about our review process? Leave a comment here.
Editor and Review Templates
Editor and review templates can be found here