UBC-MDS / software-review-2023

DSCI 524
0 stars 0 forks source link

Group-10-Sanityze #6

Open xXJohamXx opened 1 year ago

xXJohamXx commented 1 year ago

Submitting Author: Name @tzoght All current maintainers: ( @tzoght, @xXJohamXx, @caesarw0) Package Name: Sanityze One-Line Description of Package: This package provides utilities to spot and redact PII from Pandas data frames. Repository Link: https://github.com/UBC-MDS/sanityze Version submitted: 0.1.3 Editor: @fdandrea Reviewer 1: Chenyang Wang Reviewer 2: Markus Nam
Reviewer 3: Marian Agyby Reviewer 4: Chen Li


Description

Scope

Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication options

JOSS Checks - [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Code of conduct

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

markusnam commented 1 year ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 2.5 hours


Review Comments

The design of the package is quite robust as it can support different types of spotter. i.e. future enhancement can be easily achieved. Nonetheless, there are a few observations from me:

  1. There is a simple quick start example in the README which is good. It would be better by providing a concrete example to show a dataframe's content before clean and after clean. I could only find similar information in the docstring of the functions.
  2. For the user-facing functions, it would be clearer if the input parameter(s), the returned value(s) and the corresponding type(s) can be stated in documentations.
  3. It would be better if flake8 can be included in ci so as to have an automatic style check.
  4. A typo in the simple quick start example cleaner.add_spotter(...) - should be cleanser (not cleaner)
  5. I think the function getUID() has been renamed to getSpotterUID(). So the one in README is not up-to-date.
wakesyracuse7 commented 1 year ago
          ## Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 2.5 hours


Review Comments

tzoght commented 1 year ago
  1. There is a simple quick start example in the README which is good. It would be better by providing a concrete example to show a dataframe's content before clean and after clean. I could only find similar information in the docstring of the functions.

Good suggestion, we will look into it.

tzoght commented 1 year ago

Regarding the following two badges, I believe we have at the top of the README.md

A repostatus.org badge, Python versions supported

markusnam commented 1 year ago

Agreed. My apologies. It's overlooked by me. Updated the review.

Regarding the following two badges, I believe we have at the top of the README.md

marianagyby commented 1 year ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 2


Review Comments

Overall this is a very useful and interesting package, great job! Here are a few pointers:

  1. The structure of the sanityze repository is very well organized and well documented. The ReadMe is especially informative and does a great job of introducing what the package does, why it is useful, and how to use it.

  2. In your quickstart example in the ReadMe.md, you have the following line:

from sanityze import Cleanser, EmailSpotter

However this leads to an import error, as other reviewers have mentioned. Based on your script file names in src/sanityze, I believe the import instructions should be as follows:

from sanityze.cleanser import Cleanser
from sanityze.spotters import EmailSpotter

This way I have been able to successfully import the classes and functions.

  1. It could also be useful for your quickstart example on the readme to have an example dataframe populated with data for the user to easily test out how the function works. However, the complete examples in the linked documentation are very clear and helpful.

  2. The code is organized into scripts in a logical structure, well-documented, and easy to read.

  3. You could add a codecov badge to show the percentage of code that is covered by your automated tests.