UBC-MDS / software-review-2022

0 stars 0 forks source link

Submission Group 15 - snapedautility(Python) #48

Open AraiYuno opened 2 years ago

AraiYuno commented 2 years ago

name: snapedautility about: Python package for peer review title: 'snapedautility' labels: 1/editor-checks, New Submission! assignees: ''


Submitting Author: Name (@AraiYuno)
Package Name: snapedautility One-Line Description of Package: snapedautility is an open-source library that generate useful function to kickstart EDA (Exploratory Data Analysis) with just a few lines of code. Repository Link: https://github.com/UBC-MDS/snapedautility Version submitted: v2.0.0 Editors

Reviewers

Archive: TBD
Version accepted: v2.0.0


Description

Scope

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication options

JOSS Checks - [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Code of conduct

P.S. *Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

artanzand commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 2 hours


Review Comments

Well done team! Snapedautility is a valuable addition to python graphics stack. It's main functions are easy to understand, useful, and the whole work is very natural and straightforward. I had almost no issues using it with my data, and everything works as promised.

Here are my general comments for improvement:

1 - I am positive that a lot of thought has gone into selecting this name for the package, but it took me a while to figure out how to read it. I understand that it is a combination of snap + eda + utility, but seeing from a user's lense typing snapedautility has a high potential for error. I am sure you want your users to quickly remember and type in the package name while importing the package and suggest using something that is shorter, uses hyphen or even you could go bold if all names were already taken and pick a name that is not directly related to your functions. For example, Pandas has nothing to do with pandas but everyone knows what they do!

2 - Although names and Github usernames were provided. I was not able to find your email addresses in order to enable contact. It would be nice to add it.

3 - In the "Features" section in README you mention that detect_outliers generates a violin plot that indicates the outliers that deviate from other observations on data. The example in ReadTheDocs and also your script suggests that this should be a boxplot. This should be an easy fix in the text. A general suggestion is to create a table where you would have one column with function names, one with the arguments each take and the last one a short description of what each function does. This could be a mini-map for your package.

4 - What I noticed in your ci-cd.yml is that you are not testing whether your package can be installed on Windows or mac machines. It will be necessary to add this to your Continuous Improvement section to make sure your software is compatible with every machine. Note: you don't need to run on different machines for the CD section.

5 - I receive many warnings when installing the package. I am not sure if these could be silenced. I had over 10 pairs of the below warnings.

WARNING: Ignoring invalid distribution -ywin32 (c:\users\artan\miniconda3\envs\pycounts\lib\site-packages)
WARNING: Ignoring invalid distribution -yrsistent (c:\users\artan\miniconda3\envs\pycounts\lib\site-packages)

6 - In your "Usage" section you are creating a dataframe of penguins_data, but the user is not able to replicate this. This assumes users familiarity with palmerpenguins. I recommend adding the script on how to install (from palmerpenguins import load_penguins) and then import the dataframe (something like the one you have done in your test scripts).

7 - In CONTRIBUTING.md under "Get Started" you are cloning from a repo that does not exist. You will need to replace it with git clone https://github.com/UBC-MDS/snapedautility.git.

gfairbro commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 1.5


Review Comments

Well done team! This package is lightweight, functional and focused. the source is easy to understand and the functionality is straightforward but of great utility. Nice Work! between your excellent work and Artan's meticulous review above, it started to get hard to find flaws, but i do have a few improvements to suggest:

  1. There is no detailed citation of other packages you have used in your utility such as altair and pandas. It might be considerate to credit those packages.
  2. It you could consider including the inputted feature names in the title of your plots so that the titles are less generic.
  3. Same idea for axis labels, you could think about applying title case and string replacement on the inputs (it wouldnt be perfect of course but could be pretty flexible).
  4. the documentation of the detect outliers indicates a violin plot, but a box plot has been generated.
  5. The examples from the docstrings don't really work as written. plot corr is not quoting the right names from penguins, plot histograms makes a slightly vague reference to the palmer penguins data, and the detect outliers could better demonstrate how to capture it's output. The example.ipynb is excellent however so it is a minor complaint.

Overall I consider these issues minor and would like to congratulate you again on a well executed development project.

rezam747 commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 1.5


Review Comments

Great job team! The main functions of snapedautility are easy e to understand and it was easy to install. The project has performed very well overall, but I do have a few suggestions for improvement :

1- In the README.md the doc badge link is not working and it just shows the picture of the badge. However, it needs to direct the user to the files deployed by readthedocs.

2- As your package is an EDA package, that would be better if you have added a few images from the output of your function in the usage section of README.md file so users would be able to see your output.

3- The name detect_outlier function does exactly represent what this function doing. In the documentation README.md you are mentioning It generates a violin plot; however, it generates a boxplot. And also this function is not detecting anything it is plotting so the name of the function is kind of misleading.

4- In the detect_outlier documentation, the lower bound based on the plot is more than 2000, but on top of the chart 1750 has been calculated as the lower band. Therefore, the plot and the numbers do not match that need to be fixed.

5-In the documentation the badges like ci-cd are not shown on the main page.

6- In the Usage section of README.md file, you are creating a dataframe of penguins_data, but this is not reproducible because you have not mentioned the following line of code to call the needed function for creating that dataframe "from palmerpenguins import load_penguins".

iamMoid commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 1.5


Review Comments

Amazing job team. The package works as intended and described. I like how the name conveys the benefit of being able to generate plots swiftly. My fellow reviewers above have provided detailed feedback which I agree with. Additionally, I noticed a few minor edits as listed below:

  1. In the README file, the badge for docs passing appears in square brackets [ ]. I believe this was no intended, should be an easy fix.

  2. I noticed that the scripts import the numpy library, however, I do not see it being referenced anywhere in the code which makes sense as the plotting functions rely primarily on altair. I see this as unnecessary and suggest its removal.

  3. The tests are very comprehensive and do a good job of testing required areas. I did notice in the test_plot_histograms.py script that a couple of functions do not have comments/docstrings while others do. I would suggest including brief docstrings to help understand the functions.

  4. The USAGE section on the README file directly dives into the code blocks. I believe it will be useful to consolidate the Features and Usage sections such that the feature description is followed by the corresponding usage code block for better readability.

  5. The examples in the function docstrings do not work as intended as we are not loading the sample penguins dataset.

Overall great effort team and kudos for 100% code coverage!