Open scao1 opened 4 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 4
ValueError
for improper values or TypeError
for improper types. Raising exceptions with the proper error type helps a user identify their error.asserts
in each of the functions were checking the same thing. To clean up your code for readability and efficiency put some of these checks into helper functions. The helper functions could then be located in a helper sub-directory in the pyedahelper directory.seaborn
, matplotlib
and altair
. seaborn
and matplotlib
work in tandem meaning there were really 2 libraries used (altair
and matplotlib
) where one could have been sufficient. Consider using either matplotlib
or altair
. This will cut down on the dependencies, and will unify your functions.fast_corr
and fast_outlier_id
functions have some miss-formatting on the "parameters" sections.plot
or whatever function is needed).altiar
plotting function could include a wrapper function to call each of the three plots. Additionally, instead of initializing many variable, consider using a data structure like a dictionary to hold the variables (i.e. in fast_outlier_id
).The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 3
The concept was really useful. I especially like the finding and handling of outliers and nans. I can see this being really helpful and saving time.
Installation was smooth, you use a lot of new packages that I had to update, but it was outlined clearly in the dependencies.
You clearly laid out what was the use and how to use your functions. The example on your README was really helpful. Your tests were mostly clearly documented and covered edge cases. I used a lot of random inputs and most were successfully defended against. See "Feedback Suggestions" below for improvements.
All your buttons passed including the build, codecov with 96%, release and docs.
The relation to other packages on your README was really well done.
Feedback Suggestions:
On your README you state: "Data understanding and cleaning represents 60% of data scientist's time in any given project.". It would be great if you can cite this as it is a powerful statement why your package is needed.
Spell check the README because there are some minor grammar mistakes:
There is random spaces around a comma: The goal with this package is to simplify this process , and make a more efficient
. There were a few commas used instead of periods at the end of sentences.
Dataframe is referred as data frame
and dataframe
which is inconsistent.
analyzed is spelled "anlyzed" in the Functions section.
Most error messages I got were helpful like throwing an error when not putting in a dataframe. An error message that could use improvement is the on for the fast_missing_impute
function. I put in a random method (pyedahelper.fast_missing_impute(df=sample_data, method="none", cols=["col_a, col_b"]
) and got AssertionError: Not a valid method!
. The error does tell me what is wrong, but I wasn't sure what to put in. I think having an error message like Not a valid method! Change method to "mean", "median" or "mode"
or the other methods you accept. You do this for the fast_plot function for plot_type when it is not one of your accepted types.
The test function test_response_fast_outliers_id
for in the test_fast_outlier_id.py is missing a doc string or any documentation.
Your fast_corr
function with one column name produced an appropriate error message, however, testing it with one numeric and one non-numeric column produced a blank plot. You should include an additional test if only a numeric and non-numeric column name is inputted.
The fast_plot function should have a title in the output plot. You should be able to build one using f-string usage (f"{plot type}"). You can read more on Real Python
The Read the Docs documentation for the fast_corr
and fast_outlier_id
function are not being rendered correctly from your doc strings. The fast_corr
may be due to using "Arguments" instead of "Parameters". It renders fine using ?pyedahelper.fast_corr
in Python.
Hi @jnederlo , thank you very much for your review. We have addressed the fourth point in your suggestion. Please free free to track the changes at https://github.com/UBC-MDS/pyedahelper/pull/73. You can find the new release of our package here. Thank you for helping us improve our package!
Hi @sweber15 , thank you very much for your review. We have addressed the first three points in your suggestion. Please free free to track the changes at https://github.com/UBC-MDS/pyedahelper/pull/72. You can find the new release of our package here. Thank you for helping us improve our package!
Submitting Author: Ofer Mansour(@ofer-m), Suvarna Moharir(@suvarna-m), Subing Cao(@scao1 ), Manuel Maldonado (@manu2856 ) Package Name: pyedahelper One-Line Description of Package: A Python package that simplifies up the main EDA procedures such as: outlier identification, data visualization, correlation, missing data imputation. Repository Link: https://github.com/UBC-MDS/pyedahelper Version submitted: 0.1.13 Editor: Varada Kolhatkar (@kvarada ) Reviewer 1: Sarah Weber(@sweber15) Reviewer 2: Jarvis Nederlof (@jnederlo) Archive: TBD
Version accepted: TBD
Description
Data understanding and cleaning represents 60% of data scientist's time in any given project. The goal with this package is to simplify this process , and make a more efficient use of time while working on some of the main procedures done in EDA (outlier identification, data visualization, correlation, missing data imputation).
Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.
The goal with this package is to simplify the process of data understanding and cleaning , and make a more efficient use of time while working on some of the main procedures done in EDA (outlier identification, data visualization, correlation, missing data imputation).
Our target audiences are those who want to have a quick understanding of their data by doing data cleaning and visualization. Our software provides an efficient and user-friendly solution for EDA analysis.
At this time, there are several packages that are used during EDA with a similar functionality in both Python. Nevertheless most of these existing packages require multiple steps or provide results that could be simplified. In the redahelper package, the focus is to minimize the code a user uses to generate significant conclusions in relation to: outliers, missing data treatment, data visualization, correlation computing and visualization.
@tag
the editor you contacted:Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements): "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Editor and review templates can be found here