UBC-MDS / software-review-2024

Group 10 - text_processing_util_mds24 #14

Open jy1909 opened 9 months ago

jy1909 commented 9 months ago

Submitting Author: Jerry Yu (@jy1909)
All current maintainers: (@MoNorouzi23, @allan8392, @nassimgha)
Package Name: text_processing_util_mds24
One-Line Description of Package: This package is designed for streamlined text processing tasks, including one function for noise removal and text refinement and three functions for text representation.
Repository Link: https://github.com/UBC-MDS/text_processing_util_mds24
Version submitted: 2.0.0
Editor: @ttimbers
Reviewer 1: Chris Gao
Reviewer 2: Celeste Zhao
Reviewer 3: Kiersten Gilberg
Reviewer 4: Katherine Chen
Archive: TBD
JOSS DOI: TBD
Version accepted: TBD
Date accepted (month/day/year): TBD


Code of Conduct & Commitment to Maintain Package

Description

Empower your text analysis workflows with this text processing package, a Python library designed for streamlined text processing tasks. The package offers four key functions: text_clean for noise removal and text refinement, frequency_vectorizer to generate frequency-based vectors, tfidf_vectorizer for TF-IDF vectorization, and tokenizer_padding to tokenize and pad text sequences for use in recurrent neural networks. By simplifying these essential preprocessing steps, the package provides an easy-to-use toolkit for natural language processing and text modeling.
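To make the cleaning and TF-IDF steps concrete, here is a minimal standard-library sketch of the general technique. It is not the package's implementation or API: the names `clean_text` and `tfidf`, the cleaning rules, and the unsmoothed weighting `tf * (ln(N / df) + 1)` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def clean_text(text):
    """Hypothetical cleaner: lowercase, drop URLs and non-letters, collapse spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)       # keep only ASCII letters and whitespace
    return " ".join(text.split())


def tfidf(docs):
    """Hypothetical TF-IDF: returns (vocabulary, rows) with weight tf * (ln(N / df) + 1)."""
    tokenized = [clean_text(d).split() for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[t] * (math.log(n_docs / df[t]) + 1.0) for t in vocab])
    return vocab, rows


docs = ["Check out https://example.com NOW!!!", "Text cleaning, then text vectors."]
vocab, rows = tfidf(docs)
print(clean_text(docs[0]))  # check out now
print(vocab)
```

Each row has one weight per vocabulary entry, so every document maps to a vector of the same length regardless of how long the original text was.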

Scope

Domain Specific & Community Partnerships

- [ ] Geospatial
- [ ] Education
- [ ] Pangeo

Community Partnerships

If your package is associated with an existing community please check below:

[^1]: Please fill out a pre-submission inquiry before submitting a data visualization package.

The target audience is those who are looking to preprocess text before conducting machine learning on text data.

There are other Python packages that accomplish similar things. However, this package offers text representations for both traditional machine learning models and recurrent neural networks in the same library.
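The difference between the two kinds of representation can be sketched as follows. This is an illustration of the general idea only, with hypothetical helper names (`build_vocab`, `count_vector`, `padded_sequence`) and whitespace tokenization assumed; it does not reproduce the package's functions or signatures.

```python
from collections import Counter


def build_vocab(docs):
    """Map each distinct token to an integer id; id 0 is reserved for padding."""
    tokens = sorted({tok for d in docs for tok in d.lower().split()})
    return {tok: i for i, tok in enumerate(tokens, start=1)}


def count_vector(doc, vocab):
    """Bag-of-words for traditional models: one count per vocabulary entry, order discarded."""
    counts = Counter(doc.lower().split())
    return [counts[tok] for tok in vocab]


def padded_sequence(doc, vocab, max_len):
    """Ordered token ids for sequence models, truncated or right-padded with 0 to max_len."""
    ids = [vocab[tok] for tok in doc.lower().split() if tok in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))


docs = ["the cat sat", "the dog sat down"]
vocab = build_vocab(docs)  # cat=1, dog=2, down=3, sat=4, the=5

print(count_vector("the cat sat", vocab))        # [1, 0, 0, 1, 1]
print(padded_sequence("the cat sat", vocab, 5))  # [5, 1, 4, 0, 0]
```

The count vector has one slot per vocabulary word and loses word order, which suits models such as logistic regression. The padded sequence keeps word order and has a fixed length, which is what a recurrent network expects as input.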

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication Options

JOSS Checks

- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:

*Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Confirm each of the following by checking the box.

celestezhao commented 9 months ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than it is tall. (Note that a badge for pyOpenSci peer review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Final approval (post-review)

Estimated hours spent reviewing: 1h


Review Comments

This is a very useful package for text processing in text analysis workflows. I found it straightforward to install and integrate into my projects. Here are my thoughts for potential improvements:

Overall I find this package useful and easy to adopt. Great job folks!

chrisgqy commented 9 months ago

Package Review

Documentation

Usability

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 1.5

Review Comments

Kierst01 commented 9 months ago

Package Review

Documentation

Usability

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 2

Review Comments

Here are a few things I believe could be improved:

Overall, I think this project was well done and had a very interesting topic! Great job :)

czwcandy commented 9 months ago

Package Review

Documentation

Usability

Functionality

For packages also submitting to JOSS

Final approval (post-review)

Estimated hours spent reviewing:


Review Comments

  1. Overall, this is a very useful package that makes text preprocessing work easier.
  2. In the README, it would be great to explain more specifically what each function is useful for, and how it differs from or resembles similar Python built-in functions.
  3. Adding a coverage badge would also help show the package's test coverage more clearly in the README.
  4. One test function does not work properly. There may be some corner cases that are not handled.
  5. The setup instructions in the README were not very clear to me, and I found them hard to follow.
  6. In short, the package would be great with an updated README. It is still a very good package. Great job guys! 👍