UBC-MDS / software-review-2022

0 stars 0 forks source link

Submission Group 24: numeric_edahelper(Python) #31

Open hjw0703 opened 2 years ago

hjw0703 commented 2 years ago

Submitting Author: Michelle Wang(@michelle-wms), Jiwei Hu (@hjw0703), Anupriya Srivastava (@Anupriya-Sri), Sam Quist(@squisty) Package Name: numeric_edahelper(Python) One-Line Description of Package: The numeric_edahelper package is a package that aims to streamline Exploratory Data Analysis, specifically for numeric data in datasets. Repository Link: https://github.com/UBC-MDS/numeric_edahelper Version submitted: v1.0.7 Editor: Michelle Wang(@michelle-wms), Jiwei Hu (@hjw0703), Anupriya Srivastava (@Anupriya-Sri), Sam Quist(@squisty) Reviewer 1: Ruben De la Garza Macias Reviewer 2: Jennifer Hoang Reviewer 3: Harry Chan Reviewer 4: Junrong Zhu


Description

Scope

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

The package includes functions which can complete the following tasks:

Display some useful statistics
Detect outliers
Deal with missing values
Check for correlation between features

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication options

JOSS Checks - [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Code of conduct

P.S. *Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

ruben1dlg commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing:

1 hours

Review Comments

I like your package a lot guys! Here are some comments:

  1. I would try to display the badges for both the docs and the codecov, just so it is easier for users to know the coverage and access the docs directly.
  2. I see the example code for the usage in the readme file, but you only show the code. I think it would be a good idea to show how each of the functions modifies the data.
  3. In your ReadTheDocs page, I see that you do not have an example usage, which kind of relates to the last point. It would be nice to see an example and how each one of the functions work.
  4. I see that you have some open issues that are over a week old. I think it would be a good idea to close them of they are finished, just to have a better visibility of what is pending and who is assigned to it, as well as the deadline for each one.
  5. I noticed that the name of some branches contain names of the collaborators. I would try to use more generic names like feat-something or fix-something, so that it is clearer for the rest of the collaborators.
harryyikhchan commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 1.5 hrs


Review Comments

Great jobs, everyone! Automating EDA is always a strong need for every data scientist. Below is my feedback for future improvement.

  1. In your ReadtheDocs website, I think it would be more informative if you can include a good intro dataset such as penguins, iris datasets to demonstrate the functionality of all the package functions. So the users can relate to their own use cases before diving into any technical details.
  2. I think it would be better to structure your test case in one particular datatype / error for each unit function, e.g. test_simple_df, test_multiple_df_raise_error. So the contributors can easily include more edge cases in the test suite.
  3. It would be great if you can include more edge cases, such as different datatype, empty dataframe, a long list to see how the defensive programming mechanism reacts.
  4. As a developer, I would also like to see the code coverage report. It's a good idea to include a codecov badge in README.md
  5. I found that there is a missing example in the function overview(). It is suggested to include a simple example to guide the users.
jennifer-hoang commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching [JOSS's requirements](http://joss.theoj.org/about#paper_structure) with:

Final approval (post-review)

Estimated hours spent reviewing: 1 hr


Review Comments

Great work everyone! Please find a few suggestions for improvement below:

  1. As mentioned by other reviewers, it would be useful for new users to have some examples in the Read the Docs files showcasing how to use the function and the expected outputs.
  2. A code coverage badge or link to the test coverage report in the README would also be helpful.
  3. There are a few very minor grammatical errors in the README that could be fixed to improve readability (”alot”, “light-weighted”).
  4. In the documentation of the overview function, it would be beneficial to be more specific about what statistical values the function is able to calculate (mean, standard deviation, etc.), since a general user might not be able to immediately guess what values can be calculated.
  5. The Read The Docs copyright information (in the conf.py file) only mentions one author, but it would be nice to include or mention all authors since you’ve all worked hard on this package together!

Best, Jennifer

zjr-mds commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 1 hour

Congratulations on creating this amazing package! Great work! All functions worked properly as expected from the repository description and I only had a few minor suggestions as below! Cheers!

  1. Edit the ‘About' description to make the repository look nicer. Maybe you can also add the link to read_the_doc under the description in ‘About’.
  2. Minor typo on readme: missing space in “alot ”
  3. Removing some old branches to avoid possible confusion; I noticed that you have over 10 branches but some of them are more than 40+ commits behind the main branch. In this case, it’d be okay to remove the ones that you don't use anymore.
  4. More tests can be added for overview function to check the sum of other columns from the output dataframe.
  5. Also, the example.ipynb is not updated to the most recent version like what’s shown from readthedoc, maybe you could take a look into that.

Review Comments