Group 8 - pysentimentanalyzer

Submitting Author: Eric Tsai (@erictsai1208), Ranjitprakash Sundaramurthi (@ranjitprakash1986), Ziyi Chen (@zchen156), Tanmay Agarwal (@tanmayag97) Package Name: pysentimentanalyzer One-Line Description of Package: Perform sentiment analysis on the given texts and summarizes information from the text. Repository Link: https://github.com/UBC-MDS/py-sentimentanalyzer Version submitted: v0.2.3 Editor: TBD
Reviewer 1: Roan Raina Reviewer 2: Jenit Jain
Reviewer 3: Manvir Kohli Reviewer 4: Crystal Geng Archive: TBD
Version accepted: TBD Date accepted (month/day/year): TBD

Description

Include a brief paragraph describing what your package does: When a survey asks for written comments, it is often tedious to read through every response to extract useful information or just to get a quick summary. By using this package, responses can be quickly summarized to get a general idea of the sentiments of the comments, which can be useful such as when a PR team wants to know the overall sentiment on a company or when instructors want to know the overall sentiment on a course. The goal is to provide a quick summary that is easily interpretable by combining results from a pre-trained Python natural language processing package with the use of visualizations.

Scope

Please indicate which category or categories this package falls under:
- [ ] Data retrieval
- [x] Data extraction
- [ ] Data munging
- [ ] Data deposition
- [ ] Reproducibility
- [ ] Geospatial
- [ ] Education
- [x] Data visualization*

Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

For all submissions, explain how the and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
- Who is the target audience and what are scientific applications of this package?
  People analyzing feedback and survey results, such as PR teams or management teams.
- Are there other Python packages that accomplish the same thing? If so, how does yours differ? There exists other packages that perform sentiment analysis, but there are none like ours that summarize the results and present it in a simple manner through visualizations.
- If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has an OSI approved license.
[x] contains a README with instructions for installing the development version.
[x] includes documentation with examples for all functions.
[x] contains a vignette with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

[x] Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Code of conduct

[x] I agree to abide by pyOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README.
[x] Installation instructions: for the development version of the package and any non-standard dependencies in README.
[x] Vignette(s) demonstrating major functionality that runs successfully locally.
[x] Function Documentation: for all user-facing functions.
[x] Examples for all user-facing functions.
[x] Community guidelines including contribution guidelines in the README or CONTRIBUTING.
[x] Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements The package meets the readme requirements below:

[x] Package has a README.md file in the root directory.

The README should include, from top to bottom:

[x] The package name
[x] Badges for:
- [x] Continuous integration and test coverage,
- [x] Docs building (if you have a documentation website),
- [ ] A repostatus.org badge,
- [x] Python versions supported,
- [ ] Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

[x] Short description of package goals.
[x] Package installation instructions
[x] Any additional setup required to use the package (authentication tokens, etc.)
[x] Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
- [x] Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
[ ] Link to your documentation website.
[x] If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
[ ] Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

[x] Package documentation is clear and easy to find and use.
[x] The need for the package is clear
[x] All functions have documentation and associated examples for use
[x] The package is easy to install

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
[x] Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines. A few notable highlights to look at:
- [x] Package supports modern versions of Python and not End of life versions.
- [x] Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

[ ] The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

[ ] A short summary describing the high-level functionality of the software
[ ] Authors: A list of authors with their affiliations
[ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
[ ] References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

[ ] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:

Review Comments

An amazing package guys! Great job. Truly simplifies the functionality of assessing text sentiments and is something we can actually use in practice! I do have some comments that I hope will improve the package even further and make it more robust.

Since nltk is a very active repository, they are constantly updating their machine learning models and their codebase, hence you might want to replace the function test_aggregated_sentiment_score_output_value. They may update the model used by SentimentIntensityAnalyzer in the future and this might return a different sentimental score for the test string used by the function, which will break your test cases.
It might be more user-friendly if you add an argument to the generate_wordcloud function that specifies for which sentiment to generate a wordcloud (positive, negative or neutral). This will increase the speed of execution of the function and return only the necessary wordclouds.
It is unclear what the sentiment score actually means in the aggregate_sentiment_score function. You can specify the range of the values that one can expect as a response from the function and how to interpret those values qualitatively.
Instead of returning a pyplot object in the sentiment_score_plot function, you can return a PIL object. This is a better practice as it can't be overwritten by the pyplot object that might already exist in the environment the user is working in.
You can update the __init__.py file in the package that imports all the important functions. This will ensure more intuitive access to the package's functionalities while hiding the helper functions from direct usage.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README.
[x] Installation instructions: for the development version of the package and any non-standard dependencies in README.
[x] Vignette(s) demonstrating major functionality that runs successfully locally.
[x] Function Documentation: for all user-facing functions.
[x] Examples for all user-facing functions.
[x] Community guidelines including contribution guidelines in the README or CONTRIBUTING.
[x] Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements The package meets the readme requirements below:

[x] Package has a README.md file in the root directory.

The README should include, from top to bottom:

[x] The package name
[ ] Badges for:
- [ ] Continuous integration and test coverage,
- [x] Docs building (if you have a documentation website),
- [ ] A repostatus.org badge,
- [ ] Python versions supported,
- [ ] Current package version (on PyPI / Conda).

[x] Short description of package goals.
[x] Package installation instructions
[ ] Any additional setup required to use the package (authentication tokens, etc.)
[ ] Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
- [ ] Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
[ ] Link to your documentation website.
[x] If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
[ ] Citation information

Usability

[x] Package documentation is clear and easy to find and use.
[x] The need for the package is clear
[x] All functions have documentation and associated examples for use
[x] The package is easy to install

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
[x] Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines. A few notable highlights to look at:
- [x] Package supports modern versions of Python and not End of life versions.
- [x] Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

[ ] The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

[ ] A short summary describing the high-level functionality of the software
[ ] Authors: A list of authors with their affiliations
[ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
[ ] References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

[ ] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 1.5

Review Comments

Overall, the package is very well-thought and put together nicely. Very good job! I like the idea of incorporating visualization into sentiment analysis, which gives people a more straightforward way of understanding their analysis output.
In the documentation, it might be worth explaining what the "sentiment score" actually means in the aggregate_sentiment_score function, i.e., what their scale/range is, and what positive/negative results mean. For people who do not have much knowledge in sentiment analysis, these numerical results given with no detailed description might potentially be a little bit confusing. Likewise, it might also be a good idea to describe what the likert scale of the sentiment score means in the convert_to_likert function, i.e., "1 means strongly disagree and 5 means strongly agree".
Since the input for all the functions includes a dataframe and a column, it might also be worth explaining in the README/usage how to preprocess the dataframe (for example, the input column used in the function can only be strings) to the users.
With 3 said, a following suggestion is that more tests can be added to improve the 78% code coverage. For example, in the aggregate_sentiment_score function, a test can be added to check the datatype of the input columns.
It might be a good idea to clarify if the sentiment analyzer is able to analyze any languages or just English. I saw you used SentimentIntensityAnalyzer() from nltk in the helper function, so it might be a good idea to clarify what languages their function can accept. I mentioned this only because I saw some Korean characters in your raw input text, so I was wondering if those characters were analyzed too or if they were just ignored.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README.
[x] Installation instructions: for the development version of the package and any non-standard dependencies in README.
[x] Vignette(s) demonstrating major functionality that runs successfully locally.
[x] Function Documentation: for all user-facing functions.
[x] Examples for all user-facing functions.
[x] Community guidelines including contribution guidelines in the README or CONTRIBUTING.
[x] Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements The package meets the readme requirements below:

[x] Package has a README.md file in the root directory.

The README should include, from top to bottom:

[x] The package name
[ ] Badges for:
- [ ] Continuous integration and test coverage,
- [x] Docs building (if you have a documentation website),
- [ ] A repostatus.org badge,
- [ ] Python versions supported,
- [ ] Current package version (on PyPI / Conda).

[x] Short description of package goals.
[x] Package installation instructions
[x] Any additional setup required to use the package (authentication tokens, etc.)
[ ] Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
- [ ] Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
[x] Link to your documentation website.
[x] If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
[ ] Citation information

Usability

[x] Package documentation is clear and easy to find and use.
[x] The need for the package is clear
[x] All functions have documentation and associated examples for use
[x] The package is easy to install

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
[x] Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines. A few notable highlights to look at:
- [x] Package supports modern versions of Python and not End of life versions.
- [x] Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

[ ] The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

[ ] A short summary describing the high-level functionality of the software
[ ] Authors: A list of authors with their affiliations
[ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
[ ] References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

[ ] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:

1 Hour

Review Comments

Great job guys! The function is well documented and can be extremely useful if someone wants to do some quick text analysis.
I believe you could explain a bit more on how your function compares to sPacy and VADER. Does it combine functionalites of those two packages or are you buildng on exisiting functionalites of the packages. This may help users decide which of the 3 packages between pysentimentanalyzer, sPacy and VADER will best serve their purpose.
The function sentiment_score_plot returns 2 plots but the example shows that only 1 plot should be returned. Looks like they both are the same plot though. Here is an image of out output that I get
The generate_wordcloud function is amazing! However right now, when I run the function, it itself does not give any output. To actually get the word cloud, I have to write another piece of code to index the list generated by the function. It may be useful to generate an output using a print() statetment and tell them that to actully get the word cloud they need to run another piece of code.
Adding to my previous comment, it can also be benefecial to add a title to each of the wordclouds so that after indexing the list generated by the function generate_wordcloud , the user knows for which sentiment the word cloud has been generated (positive, negative or neutral).
The names and GH usernames/emails of the contributors/authors can be added to the Readme file itself so that it is easier for users to find contact information of the authors in case they need help or wish to reach out for any other reason.
A final suggestion would be to try to achieve a code coverage of 100% by adding tests that check for all the lines in the code.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README.
[x] Installation instructions: for the development version of the package and any non-standard dependencies in README.
[x] Vignette(s) demonstrating major functionality that runs successfully locally.
[x] Function Documentation: for all user-facing functions.
[x] Examples for all user-facing functions.
[x] Community guidelines including contribution guidelines in the README or CONTRIBUTING.
[ ] Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements The package meets the readme requirements below:

[x] Package has a README.md file in the root directory.

The README should include, from top to bottom:

[x] The package name
[ ] Badges for:
- [x] Continuous integration and test coverage,
- [x] Docs building (if you have a documentation website),
- [ ] A repostatus.org badge,
- [ ] Python versions supported,
- [ ] Current package version (on PyPI / Conda).

[x] Short description of package goals.
[x] Package installation instructions
[x] (NA)Any additional setup required to use the package (authentication tokens, etc.)
[ ] Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
- [x] Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
[ ] Link to your documentation website.
[x] If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
[ ] Citation information

Usability

[x] Package documentation is clear and easy to find and use.
[x] The need for the package is clear
[x] All functions have documentation and associated examples for use
[x] The package is easy to install

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
[x] Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines. A few notable highlights to look at:
- [x] Package supports modern versions of Python and not End of life versions.
- [ ] Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

[ ] The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

[ ] A short summary describing the high-level functionality of the software
[ ] Authors: A list of authors with their affiliations
[ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
[ ] References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

[ ] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:

Review Comments

Can't find a link to the documentation website for the life of me. PR 48 mentions that online documentation exists. Would be nice to add the link to the README or the github description.
As such, documentation is not readily available despite functions having complete docstrings in the code.
Author/Mainatiner email not included in README or project.toml
milestone issues should be closed / resolved
may be a good idea to include function docstrings for test functions
helper function files are typically not best practices, seems like these functions could be included in the file that deals with getting the sentiment

UBC-MDS / software-review-2023