Open wangjc640 opened 3 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py file or elsewhere.
Readme requirements
The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md matching JOSS's requirements with:
Estimated hours spent reviewing: 1.5hrs
Hey Team!
Great job on this package. It looks like you put a lot of work in, and I like how the package as a whole has a very consistent theme. Here are a few comments from me below, divided into a functionality section and a documentation section.
Functionality:
Installation: I couldn't install the package using your installation instructions. I think you may have to update them to this: $ pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple eda_utils_py. I tested that command and it works.
Testing: It's really great to see that you have 100% test coverage. I tested some edge cases, such as entering the wrong data type or unsuitable arguments, and everything I tested works fine and returns a helpful error message.
Plotting: I like the color scheme you used. It suits a correlation plot well and ensures that a zero correlation shows up neutrally.
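To illustrate the plotting point above, here is a minimal sketch of a diverging color scale in which a zero correlation maps to a neutral color. It is not the package's actual color scheme: the function name diverging_color and the blue/white/red endpoints are assumptions chosen only for illustration.

```python
def diverging_color(r):
    """Map a correlation r in [-1, 1] to an RGB hex string.

    Hypothetical helper, not part of eda_utils_py:
    -1 maps to blue, 0 to white (neutral), +1 to red.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must be between -1 and 1")
    # fade is 255 at r = 0 (white) and 0 at |r| = 1 (fully saturated)
    fade = round(255 * (1 - abs(r)))
    if r >= 0:
        red, green, blue = 255, fade, fade
    else:
        red, green, blue = fade, fade, 255
    return f"#{red:02x}{green:02x}{blue:02x}"


neutral = diverging_color(0.0)
strong_positive = diverging_color(1.0)
strong_negative = diverging_color(-1.0)
```

Because both halves of the scale fade toward the same midpoint, uncorrelated pairs stand out as neither positive nor negative.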
Documentation:
It would have been nice to see all the authors of this package in the pyproject.toml file. It currently only has Chuang Wang listed as the author.
In the usage section of the README, it's probably more user-friendly to explicitly list all the arguments of each function (i.e., def imputer(df, strategy="mean", fill_value=None): instead of imputer(data_with_NA)). This helps users understand that the imputation function uses the mean strategy by default. It also may have been nice to list all the potential arguments that a user can use (i.e., mean, median, etc.). I didn't realize the function was so comprehensive until I looked into the function parameters.
In your function documentation in eda_utils_py.py, it looks like you switch between naming the function parameters df and dataframe (as one example). More consistent naming conventions across the functions would help your package feel more uniform. I've noticed the same in your code comments: some areas are more closely documented than others.
A more in-depth discussion of the ecosystem may help users better understand how this package differs from the existing ones and why they should use yours. I think you actually did this in the submission template and can probably copy and paste from there.
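As a sketch of the README usage suggestion above, the example below spells out the default arguments in the call so that the default strategy is visible to the reader. The function impute_column is a hypothetical, pure-Python stand-in that works on a single list; it is not the package's imputer(), and the "constant" strategy is an assumption inferred from the fill_value parameter in the signature quoted above.

```python
from statistics import mean, median


def impute_column(values, strategy="mean", fill_value=None):
    """Hypothetical stand-in for imputing one column; None marks a missing value."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        replacement = mean(observed)
    elif strategy == "median":
        replacement = median(observed)
    elif strategy == "constant":  # assumed strategy name, for illustration only
        if fill_value is None:
            raise ValueError("fill_value is required when strategy='constant'")
        replacement = fill_value
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [replacement if v is None else v for v in values]


column = [1.0, None, 3.0, 8.0]

# Implicit defaults: the reader cannot tell which strategy is used.
default_result = impute_column(column)

# Explicit arguments: the default strategy is visible in the example itself.
explicit_result = impute_column(column, strategy="mean", fill_value=None)

# Showing a second strategy hints at the range of options available.
median_result = impute_column(column, strategy="median")
```

Both the implicit and explicit calls fill the gap with the mean, but only the explicit one documents that choice for someone skimming the README.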
Let me know your thoughts and if anything needs clarification!
Heidi
Estimated hours spent reviewing: 1.5 hours
Hi Team,
Great job on building the package. The hard work shows in putting up such a clean and consistent package in such a short amount of time. Below are some minor points of feedback that I hope help improve the package further.
The current installation link in the README gives an error. I tried the following link and that seemed to work: pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple eda_utils_py
Under the Usage section of the README, since most of the outputs were dataframe transformations, I found myself scrolling back and forth to the example dataframes to see what exactly had changed. It might be worth highlighting the change either through a visual or in a short explanation. The exact transformations became clear to me once I read the function documentation, which was written very well.
I was unable to find author contact details. You could add these in the Contributing section of the README by hyperlinking your names to your GitHub accounts and/or email addresses.
The links in CONTRIBUTING.rst are broken. I think you want to replace them with this: https://github.com/UBC-MDS/eda_utils_py/issues
It would be good to add all authors in the pyproject.toml file, like you have in the License file.
I looked through the code and ran all the functions, which all worked as advertised. I thought the code was well written, with good docstrings and inline code comments. Testing was thorough, as shown by the 100% code coverage. I couldn't find anything code-related to give feedback on. Great job :)
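One way to act on the Usage-section point above is to print only the cells that a transformation changed, next to the before and after examples. The sketch below is a minimal, standard-library illustration of that idea; the helper changed_cells and the example rows are hypothetical and are not taken from the package or its README.

```python
def changed_cells(before, after):
    """Return (row_index, column, old_value, new_value) for every cell that differs.

    Hypothetical helper: rows are plain dicts and None marks a missing value.
    """
    changes = []
    for index, (old_row, new_row) in enumerate(zip(before, after)):
        for column in old_row:
            if old_row[column] != new_row[column]:
                changes.append((index, column, old_row[column], new_row[column]))
    return changes


# Made-up example data: one missing value is filled in by an imputation step.
before = [
    {"col1": 1.0, "col2": None},
    {"col1": 2.0, "col2": 4.0},
    {"col1": 3.0, "col2": 6.0},
]
after = [
    {"col1": 1.0, "col2": 5.0},
    {"col1": 2.0, "col2": 4.0},
    {"col1": 3.0, "col2": 6.0},
]

differences = changed_cells(before, after)
for row, column, old, new in differences:
    print(f"row {row}, {column}: {old} -> {new}")
```

A short list like this in the README would save readers from comparing the two example dataframes by eye.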
Let me know if anything needs clarification.
Cheers, Nash
Package Info
Submitting Author:
Package Name: eda_utils_py
One-Line Description of Package: Fast way of dealing with outlier and missing values, scaling, and correlation visualization.
Repository Link: eda_utils_py
Version submitted: 0.1.29
Editor: TBD
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD
DESCRIPTION
As data rarely comes ready to be used and analyzed for machine learning right away, this package aims to help speed up the process of cleaning and doing initial exploratory data analysis (EDA). The package focuses on the tasks of dealing with outlier and missing values, scaling, and correlation visualization.
The four functions contained in this package are as follows:
- imputer(): A function to impute missing values
- outlier_identifier(): A function to identify and deal with outliers
- cor_map(): A function to plot a correlation matrix of numeric columns in the dataframe
- scale(): A function to scale numerical values in the dataset

Scope
*Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.
@tag the editor you contacted:
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
- [ ] The package has an obvious research application according to JOSS's definition in their [submission requirements][submit]. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][submit]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a paper.md matching [JOSS's requirements][jr] with a high-level description in the package root or in inst/.
- [ ] The package is deposited in a long-term repository with the DOI:

*Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. *Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Editor and review templates can be found here