aws / sagemaker-scikit-learn-extension

A library of additional estimators and SageMaker tools based on scikit-learn
Apache License 2.0
39 stars 33 forks source link

Add weight of evidence encoder #25

Closed tlienart closed 4 years ago

tlienart commented 4 years ago

Notes:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

cc @jeanfad

Edits

wiltonwu commented 4 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

wiltonwu commented 4 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

wiltonwu commented 4 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

tlienart commented 4 years ago

Thanks a lot for the review, I'll fix everything I can asap

wiltonwu commented 4 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

wiltonwu commented 4 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

wiltonwu commented 4 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

tlienart commented 4 years ago

Right I think I addressed all your comments & extended the spirit of some (e.g. removed all attributes unused in transform). Pandas is removed.

The only remaining point as mentioned in the comments is that you seem to want the check_estimator to be applied to WoE, I may be missing something but the check_estimator uses boston which has three classes and therefore will always fail with WoE. My suggestion is to ignore it and, if there is concern that we may be missing something, suggest the explicit addition of extra tests for WoE that are related to dimensions etc.

wiltonwu commented 4 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

wiltonwu commented 4 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

wiltonwu commented 4 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

tlienart commented 4 years ago

gentle bump, thanks!

wiltonwu commented 4 years ago

Right I think I addressed all your comments & extended the spirit of some (e.g. removed all attributes unused in transform). Pandas is removed.

The only remaining point as mentioned in the comments is that you seem to want the check_estimator to be applied to WoE, I may be missing something but the check_estimator uses boston which has three classes and therefore will always fail with WoE. My suggestion is to ignore it and, if there is concern that we may be missing something, suggest the explicit addition of extra tests for WoE that are related to dimensions etc.

I think the build logs still show some flake8 and black linting tests failing. Those should be straightforward to fix, please let me know if you need help.

For the check_estimator test, I think you can add a tag for your new estimator to mark it as only compatible with binary classification datasets. It involves overriding the _get_tags() or _more_tags() method. We do something similar for RobustImputer. See https://scikit-learn.org/stable/developers/develop.html#estimator-tags. check_estimator should know which specific tests to run based on those tags (it might even skip the entire test). Also if you haven't seen that page, I'd recommend reading through it--it's a good guideline for writing scikit-learn compatible estimators.

wiltonwu commented 4 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

tlienart commented 4 years ago
wiltonwu commented 4 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

wiltonwu commented 4 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

wiltonwu commented 4 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository