Open Jacq4nn opened 2 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
`setup.py` file or elsewhere.
Readme requirements
The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a `paper.md` matching JOSS's requirements with:
Estimated hours spent reviewing: 1.5 hours
The README file looks good, but I think it would be better to be a bit more explicit about where to find the documentation online rather than just including the docs badge.
I think it would be nice to see the test coverage of your package by including the codecov badge.
I found the function `perc_cap_words` interesting, so I played around with it a bit and noticed some unexpected output (at least for me). Running `perc_cap_words("W-O-R-L-D")` results in `20.0` and running `perc_cap_words("WOR-LD")` results in `50.0`. I would have expected the output to be `100` for both cases. This could be because you split the text by whitespace when counting `count_cap_words` but divided it by the total number of words counted by `tokenizer.tokenize(text)`. It might be a better approach to tokenize the words using the `tokenizer` and then count the number of words that are in all caps.
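The tokenize-first approach suggested here could look like the following minimal sketch. It assumes a word-character regular expression as a stand-in for the package's tokenizer (a guess based on the outputs reported in this comment), and `perc_cap_words_sketch` is a hypothetical name, not the package's function.

```python
import re


def perc_cap_words_sketch(text: str) -> float:
    """Percentage of tokens that are fully capitalised.

    Assumption: a token is a run of word characters, standing in for
    whatever tokenizer the package actually uses.
    """
    tokens = re.findall(r"\w+", text)
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    # Count capitalised words over the same token list used as the denominator.
    cap_tokens = [tok for tok in tokens if tok.isupper()]
    return 100 * len(cap_tokens) / len(tokens)


print(perc_cap_words_sketch("W-O-R-L-D"))  # 100.0
print(perc_cap_words_sketch("WOR-LD"))     # 100.0
```

Because the numerator and denominator come from the same token list, hyphenated all-caps input no longer lowers the percentage.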
This could be up to design choice, but when I run `avg_word_len("it's me")`, the function returns `1.666` because punctuation characters are replaced by a space. In this case, however, I would expect the output to be `2.5`.
I think it might be nice to explicitly tell users which characters are considered punctuation, perhaps in the function documentation.
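One way to get `2.5` for `"it's me"` is to delete punctuation instead of replacing it with a space, and to name the punctuation set in the docstring. The sketch below is only an illustration: it assumes punctuation means Python's `string.punctuation`, which may differ from what the package uses, and `avg_word_len_sketch` is a hypothetical name.

```python
import string

# Assumption: "punctuation" means the ASCII set in string.punctuation.
PUNCTUATION = string.punctuation


def avg_word_len_sketch(text: str) -> float:
    """Average word length after deleting punctuation.

    Characters treated as punctuation: those in ``string.punctuation``.
    """
    # Delete punctuation outright so "it's" stays one word ("its").
    cleaned = text.translate(str.maketrans("", "", PUNCTUATION))
    words = cleaned.split()
    if not words:
        return 0.0  # no words left to average
    return sum(len(word) for word in words) / len(words)


print(avg_word_len_sketch("it's me"))  # 2.5
```

Whether contractions should count as one word is still a design choice; the point is that the docstring states which characters are removed.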
Great work team! I like the package idea and the motivation you provided for it in the README. Here is what I liked about your project:
Here are a few suggestions I would like to add:
Good work, and keep going!
Reviewer note: Section not applicable for this package
Estimated hours spent reviewing: 1 hour
Overall well done, this package definitely seems useful for NLP tasks and could assist with feature engineering.
Some comments:
You can use the functions as below:
The "installation" code box in the README starts with `$` (which is fine), but I noticed that the Usage section is different. You could update these sections to be consistent (either style is fine).
None of the above are significant comments, and you have done a great job so far!
Estimated hours spent reviewing: 1 hr
I find this package very easy to understand and use. It automates many tedious NLP tasks, especially when I want to do certain kinds of feature engineering. Here are my comments after testing your package:
`remove_stop_words` function does not contain punctuations. It might be better to add an explanation of the output in the docstring or remove punctuations in the output.
`count_punc`. It returns EOL or decoding error when I input "\" combined with other things like "C:\Users" or "C:\".
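These errors are most likely raised by Python's string-literal parsing before `count_punc` is ever called: in a regular literal, `\"` escapes the closing quote and `\U` begins a Unicode escape. The sketch below shows two ways to write such inputs; `count_punc_sketch` is a hypothetical stand-in that assumes punctuation means `string.punctuation`, not the package's implementation.

```python
import string


def count_punc_sketch(text: str) -> int:
    """Count characters found in string.punctuation (an assumed definition)."""
    return sum(ch in string.punctuation for ch in text)


# A raw string keeps backslashes literal, so "\U" is not read as an escape.
users_path = r"C:\Users"
# A raw string cannot end in a single backslash, so escape it instead.
drive_root = "C:\\"

print(count_punc_sketch(users_path))  # 2  (":" and "\")
print(count_punc_sketch(drive_root))  # 2  (":" and "\")
```

If that is the cause, a short note in the docstring or README about passing raw strings for Windows-style paths may be enough.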
Submitting Authors:
Package Name: textfeatureinfo
One-Line Description of Package: Extract information from text features which can be useful for feature engineering, or in other data science projects
Repository Link: textfeatureinfo
Version submitted: 2.0.0
Editor: Florencia D'Andrea (@flor14)
Reviewers:
Description
Our package, textfeatureinfo, helps gather summary information from plain text, such as the number of punctuation marks, the average word length, and the percentage of fully capitalised words, all of which can be useful for feature engineering. Additionally, our package can manipulate text data by removing stop words to ease future processing steps.
- `count_punc`: This function will count and return the number of punctuations within a given text.
- `avg_word_len`: This function will calculate and return the average length of words within a given text.
- `perc_cap_words`: This function will calculate the percentage of fully capitalised words in the text.
- `remove_stop_words`: This function will find and remove the stop words in a text and will return the list of clean words.

Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.
Are there other Python packages that accomplish the same thing? If so, how does yours differ?
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements): "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:

*Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Editor and review templates can be found here