Closed tlienart closed 4 years ago
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
Thanks a lot for the review. I'll fix everything I can ASAP.
Right, I think I addressed all your comments and extended the spirit of some (e.g. removed all attributes unused in transform). Pandas is removed.
The only remaining point, as mentioned in the comments, is that you seem to want check_estimator to be applied to WoE. I may be missing something, but check_estimator uses Boston, which has three classes, and will therefore always fail with WoE. My suggestion is to ignore it and, if there is concern that we may be missing something, to add explicit extra tests for WoE that cover dimensions etc.
gentle bump, thanks!
I think the build logs still show some flake8 and black linting tests failing. Those should be straightforward to fix, please let me know if you need help.
For the check_estimator test, I think you can add a tag for your new estimator to mark it as only compatible with binary classification datasets. It involves overriding the _get_tags() or _more_tags() method. We do something similar for RobustImputer. See https://scikit-learn.org/stable/developers/develop.html#estimator-tags. check_estimator should know which specific tests to run based on those tags (it might even skip the entire test). Also, if you haven't seen that page, I'd recommend reading through it. It's a good guideline for writing scikit-learn compatible estimators.
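For illustration, the tag override suggested above might look roughly like the sketch below. The class name WOEEncoderSketch is hypothetical and deliberately does not import scikit-learn; a real encoder would subclass sklearn.base.BaseEstimator and TransformerMixin. The tag names binary_only and X_types are taken from the estimator-tags page linked above, and which of them check_estimator honours depends on the installed scikit-learn version (newer releases have reworked the tag mechanism, so check the docs for your version).

```python
class WOEEncoderSketch:
    """Hypothetical stand-in for the encoder in this PR.

    Assumption: the real class would inherit from
    sklearn.base.BaseEstimator and sklearn.base.TransformerMixin.
    """

    def _more_tags(self):
        # Declare that the estimator only supports binary targets and
        # expects categorical inputs, so that the common checks can
        # pick suitable test data or skip checks that do not apply.
        return {
            "binary_only": True,
            "X_types": ["categorical"],
        }


tags = WOEEncoderSketch()._more_tags()
print(tags)
```

Whether these two tags are enough for every check to pass is not something this sketch can confirm; it only shows where the declaration would live.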
'X_types': ['categorical'] because it tests by default with continuous values and this causes issues in the later checks at the transform stage (in testing whether what gets encoded has new categories); I did fix a few minor things based on the earlier tests though.
WOEEncoder to the encoders described here https://www.listendata.com/2015/03/weight-of-evidence-woe-and-information.html
X
ory
).
Notes:
titanic.csv dataset for testing, the testing is done against sklearn-contrib (https://contrib.scikit-learn.org/category_encoders/woe.html)
check_estimator which tries to fit the estimator with Boston (3 classes). Please advise.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
cc @jeanfad
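For readers unfamiliar with weight of evidence, the idea behind the encoder can be sketched in plain Python as follows. This is not the code from this PR: the function name woe_table, the smoothing parameter and the sign convention (log of the share of positives over the share of negatives) are assumptions made for the example, and the pages linked above may use a different convention or regularization.

```python
import math
from collections import Counter


def woe_table(categories, targets, smoothing=0.5):
    """Return {category: weight of evidence} for a binary 0/1 target.

    Assumed convention: WoE = ln(share of positives / share of negatives),
    with additive smoothing so that empty cells do not produce log(0).
    """
    if set(targets) - {0, 1}:
        raise ValueError("weight of evidence needs a binary 0/1 target")

    # Count positives and negatives per category.
    pos = Counter(c for c, t in zip(categories, targets) if t == 1)
    neg = Counter(c for c, t in zip(categories, targets) if t == 0)
    total_pos = sum(pos.values())
    total_neg = sum(neg.values())

    table = {}
    for cat in set(categories):
        share_pos = (pos[cat] + smoothing) / (total_pos + 2 * smoothing)
        share_neg = (neg[cat] + smoothing) / (total_neg + 2 * smoothing)
        table[cat] = math.log(share_pos / share_neg)
    return table


table = woe_table(["a", "a", "b", "b"], [1, 1, 0, 0])
print(table)
```

The explicit check on the target values mirrors the point discussed in this thread: with three or more classes the positive/negative split is undefined, so the sketch raises instead of guessing.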
Edits
iter, pylint doesn't like that.