Open bbaillie opened 4 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Estimated hours spent reviewing: 1.5
Awesome implementation of state-of-the-art techniques in package form (and attribution to the relevant research on the README)
python-semantic-release
should be added to the dependencies listed on the README
python-semantic-release
after an initial error, but after doing so installation completed with no issuesVignette related observations on the README: It was not super clear whether the subsections below the 'Vignette' header (ex. 'Target Encoding') were meant to be function descriptions or vignette examples or both. Assuming those subsections are meant to be vignette content:
Function code was well formatted and clear, but could use some inline comments to enhance readability
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Estimated hours spent reviewing: 1.5 hours
flake8
(maybe add to your dependencies list in the README?)flake8
, I received this error: ERROR: No matching distribution found for colorama==0.4.1 (from ndebug~=0.1->python-semantic-release<5.0.0,>=4.10.0->encoderpy)
. I have colorama==0.4.1 installed, so I'm not sure where this issue is coming from. I noticed that colorama
is not in your pyproject.toml, so it's possible this is causing the issue. Please let me know if you've run into this error and have recommendations for how to address it.X_test
argument has a different description in every function.
onehot_encoder
is the only function where X_test
is not an optional argument. Unless there's a good reason for this, I'd make this consistent with the other functions.test_output()
). I could be wrong, but I don't believe you need to do this as poetry and pytest will run all the functions in the test file when doing it's checks. For example, here are some other test file examples:
Let me know if you have thoughts on how to fix the installation, and then I can check the rest of the boxes and approve the package!
@alistair-clark
Hey, thanks for the review.
I am not too sure on these issues, particularly with colorama (I don't recall ever using this package directly too be honest).
We forgot to mention in our README, as @brendoncampbell pointed out, that one needs to have semantic-release installed. Have you run this?
pip install python-semantic-release
Otherwise, I am unsure what is causing these issues. I am able to install the package on my end after installing semantic-release.
@alistair-clark
Hey, thanks for the review.
I am not too sure on these issues, particularly with colorama (I don't recall ever using this package directly too be honest).
We forgot to mention in our README, as @brendoncampbell pointed out, that one needs to have semantic-release installed. Have you run this?
pip install python-semantic-release
Otherwise, I am unsure what is causing these issues. I am able to install the package on my end after installing semantic-release.
That fixed it! I was able to install the package after running the above. It's strange because I already had python-semantic-release
installed, so it must have been something to do with dependencies. I noticed that running the above uninstalled colorama 0.4.3
and then reinstalled colorama 0.4.1
I'll update the checkboxes above now that it works.
@brendoncampbell
Thanks for the feedback.
Addressed for the latest release:
Did not address in this release:
@alistair-clark
As previously noted, thanks for the feedback as well.
Addressed for the latest release:
Did not address in this release:
Thanks to both of you for the great feedback.
All of the addressed feedback, as discussed above, can be viewed in our latest release, v1.2.1.
Thanks for following up. Great job with the package. I hope people find it and use it!
Submitting Author: Team Maryam Mirzakhani (L02 Group 17: @bbaillie @braydentang1 @robilizando @fsywang )
Package Name: encoderPy One-Line Description of Package: Creates numeric encodings for categorical variables Repository Link: https://github.com/UBC-MDS/encoderPy Version submitted: v1.2.0 Editor: Varada Kolhatkar (@kvarada)
Reviewer 1: Alistair Clark (@alistair-clark ) Reviewer 2: Brendon Campbell (@brendoncampbell )
Archive: TBD
Version accepted: TBD
Description
This package seeks to provide a convenient set of functions that allow for the encoding of categorical features in potentially more informative ways when compared to other, more standard methods. The user will feed as input a training and testing dataset with categorical features, and the resulting data frames returned will be preprocessed with a specific encoding of the categorical features. At a high level, this package automates the preprocessing of categorical features in ways that exploit particular correlations between the different categories and the data without increasing the dimension of the dataset, like in one hot encoding. Thus, through the more deliberate handling of these categorical features, higher model performance can possibly be achieved.
Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.
This package preprocesses categorical features by encoding them in novel ways for better modelling performance.
The target audience of this package are people who practice predictive modeling. Any data scientist or researcher who is focused on supervised or unsupervised learning tasks will find this package useful, especially if their dataset contains a large amount of categorical features.
There is one notable package in Python that has a variety of different methods for more informative encodings of categorical features, aptly named Category Encoders. However,
Category Encoders
does not include a frequency encoder or a conjugate-prior encoder. These two encoders are inherently useful since frequency encoding has become relatively popular in the past couple of years, especially in Kaggle competitions and conjugate encoding is a new, state of the art methodology that has been shown to work well on many datasets. Furthermore, this package fully supports Pandas dataframes and will not drop column names, which eliminates any ambiguity in what each column represents with respect to the original columns/features.Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
No.
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements): "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
Editor and Review Templates
Editor and review templates can be found here