Open AraiYuno opened 2 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 2 hours
Well done team! Snapedautility is a valuable addition to python graphics stack. It's main functions are easy to understand, useful, and the whole work is very natural and straightforward. I had almost no issues using it with my data, and everything works as promised.
Here are my general comments for improvement:
1 - I am positive that a lot of thought has gone into selecting this name for the package, but it took me a while to figure out how to read it. I understand that it is a combination of snap + eda + utility, but seeing from a user's lense typing snapedautility
has a high potential for error. I am sure you want your users to quickly remember and type in the package name while importing the package and suggest using something that is shorter, uses hyphen or even you could go bold if all names were already taken and pick a name that is not directly related to your functions. For example, Pandas has nothing to do with pandas but everyone knows what they do!
2 - Although names and Github usernames were provided. I was not able to find your email addresses in order to enable contact. It would be nice to add it.
3 - In the "Features" section in README you mention that detect_outliers
generates a violin plot that indicates the outliers that deviate from other observations on data. The example in ReadTheDocs and also your script suggests that this should be a boxplot. This should be an easy fix in the text. A general suggestion is to create a table where you would have one column with function names, one with the arguments each take and the last one a short description of what each function does. This could be a mini-map for your package.
4 - What I noticed in your ci-cd.yml
is that you are not testing whether your package can be installed on Windows or mac machines. It will be necessary to add this to your Continuous Improvement section to make sure your software is compatible with every machine. Note: you don't need to run on different machines for the CD section.
5 - I receive many warnings when installing the package. I am not sure if these could be silenced. I had over 10 pairs of the below warnings.
WARNING: Ignoring invalid distribution -ywin32 (c:\users\artan\miniconda3\envs\pycounts\lib\site-packages)
WARNING: Ignoring invalid distribution -yrsistent (c:\users\artan\miniconda3\envs\pycounts\lib\site-packages)
6 - In your "Usage" section you are creating a dataframe of penguins_data, but the user is not able to replicate this. This assumes users familiarity with palmerpenguins. I recommend adding the script on how to install (from palmerpenguins import load_penguins
) and then import the dataframe (something like the one you have done in your test scripts).
7 - In CONTRIBUTING.md
under "Get Started" you are cloning from a repo that does not exist. You will need to replace it with git clone https://github.com/UBC-MDS/snapedautility.git
.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 1.5
Well done team! This package is lightweight, functional and focused. the source is easy to understand and the functionality is straightforward but of great utility. Nice Work! between your excellent work and Artan's meticulous review above, it started to get hard to find flaws, but i do have a few improvements to suggest:
Overall I consider these issues minor and would like to congratulate you again on a well executed development project.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 1.5
Great job team! The main functions of snapedautility
are easy e to understand and it was easy to install. The project has performed very well overall, but I do have a few suggestions for improvement :
1- In the README.md
the doc badge link is not working and it just shows the picture of the badge. However, it needs to direct the user to the files deployed by readthedocs.
2- As your package is an EDA package, that would be better if you have added a few images from the output of your function in the usage
section of README.md
file so users would be able to see your output.
3- The name detect_outlier
function does exactly represent what this function doing. In the documentation README.md
you are mentioning It generates a violin plot; however, it generates a boxplot. And also this function is not detecting anything it is plotting so the name of the function is kind of misleading.
4- In the detect_outlier
documentation, the lower bound based on the plot is more than 2000, but on top of the chart 1750 has been calculated as the lower band. Therefore, the plot and the numbers do not match that need to be fixed.
5-In the documentation
the badges like ci-cd are not shown on the main page.
6- In the Usage
section of README.md
file, you are creating a dataframe of penguins_data, but this is not reproducible because you have not mentioned the following line of code to call the needed function for creating that dataframe "from palmerpenguins import load_penguins".
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 1.5
Amazing job team. The package works as intended and described. I like how the name conveys the benefit of being able to generate plots swiftly. My fellow reviewers above have provided detailed feedback which I agree with. Additionally, I noticed a few minor edits as listed below:
In the README file, the badge for docs passing
appears in square brackets [ ]. I believe this was no intended, should be an easy fix.
I noticed that the scripts import the numpy
library, however, I do not see it being referenced anywhere in the code which makes sense as the plotting functions rely primarily on altair. I see this as unnecessary and suggest its removal.
The tests are very comprehensive and do a good job of testing required areas. I did notice in the test_plot_histograms.py
script that a couple of functions do not have comments/docstrings while others do. I would suggest including brief docstrings to help understand the functions.
The USAGE
section on the README file directly dives into the code blocks. I believe it will be useful to consolidate the Features and Usage sections such that the feature description is followed by the corresponding usage code block for better readability.
The examples in the function docstrings do not work as intended as we are not loading the sample penguins dataset.
Overall great effort team and kudos for 100% code coverage!
name: snapedautility about: Python package for peer review title: 'snapedautility' labels: 1/editor-checks, New Submission! assignees: ''
Submitting Author: Name (@AraiYuno)
Package Name: snapedautility One-Line Description of Package: snapedautility is an open-source library that generate useful function to kickstart EDA (Exploratory Data Analysis) with just a few lines of code. Repository Link: https://github.com/UBC-MDS/snapedautility Version submitted: v2.0.0 Editors
Reviewers
Archive: TBD
Version accepted: v2.0.0
Description
Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.
Explain how the and why the package falls under these categories (briefly, 1-2 sentences): This package offers utility functions that provide the basic EDA visualizations including histograms, outlier detection, and correlation plot.
Who is the target audience and what are the scientific applications of this package?
Any data professionals at the entry-level who would like to conduct a quick exploratory data analysis.
Are there other Python packages that accomplish the same thing? If so, how does yours differ? Python packages such as
pandasprofiling
,sweetviz
, andExploriPy
provide a more broad range of utility functions to conduct data extraction, transformation, and modeling in addition to exploratory data analysis. snapedautility aims to only concentrate on the EDA by keeping the package lightweight and less risky dependency conflicts.If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or
@tag
the editor you contacted:Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. *Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Editor and review templates can be found here