aws / sagemaker-scikit-learn-extension

A library of additional estimators and SageMaker tools based on scikit-learn
Apache License 2.0
39 stars 33 forks

feature: adds ThresholdOrdinalEncoder to preprocessors #26

Closed ipanepen closed 4 years ago

ipanepen commented 4 years ago

ThresholdOrdinalEncoder is based on sklearn.preprocessing.OrdinalEncoder. It encodes only those categories whose frequencies are above the given threshold, and it encodes at most max_categories categories per feature.
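
To make the description above concrete, here is a minimal pure-Python sketch of frequency-thresholded ordinal encoding for a single column. It is not the code from this PR. The function names `select_categories` and `encode_column`, the `-1` sentinel for values that are not encoded, the reading of the threshold as "at least `threshold` occurrences", and the rule of keeping the most frequent categories when `max_categories` is exceeded are all assumptions made for illustration.

```python
from collections import Counter


def select_categories(column, threshold=2, max_categories=None):
    """Pick the categories of one column that should receive a code.

    Hypothetical helper: keeps values seen at least `threshold` times
    (assumed reading of the threshold), then keeps at most
    `max_categories` of them, preferring the most frequent (assumed).
    """
    counts = Counter(column)
    frequent = [cat for cat, n in counts.items() if n >= threshold]
    # Most frequent first; ties broken by category value for determinism.
    frequent.sort(key=lambda cat: (-counts[cat], cat))
    if max_categories is not None:
        frequent = frequent[:max_categories]
    # Codes are assigned in sorted category order.
    return sorted(frequent)


def encode_column(column, categories, unknown_value=-1):
    """Map each value to its code; anything not selected gets the sentinel."""
    lookup = {cat: code for code, cat in enumerate(categories)}
    return [lookup.get(value, unknown_value) for value in column]


column = ["a", "a", "a", "b", "b", "c"]
categories = select_categories(column, threshold=2)
codes = encode_column(column, categories)
```

With this input, `"c"` appears only once, so it falls below the threshold and is mapped to the sentinel, while `"a"` and `"b"` receive codes `0` and `1`.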

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

wiltonwu commented 4 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

wiltonwu commented 4 years ago

Instead of creating a new estimator, let's add the threshold and max_categories features by extending the existing RobustOrdinalEncoder transformer. There's a lot of overlap: repeated code, logic, and documentation. We can also set the default threshold to 1 and max_categories to None (infinite categories) to keep the same default behavior as now.
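
The point about defaults can be illustrated with a small sketch. This is not RobustOrdinalEncoder itself; `encode` and `plain_ordinal` are hypothetical functions, and the sketch assumes that the threshold means "at least `threshold` occurrences" and that unencoded values map to `-1`. Under those assumptions, `threshold=1` with `max_categories=None` filters nothing, so the output matches a plain ordinal encoding.

```python
from collections import Counter


def encode(column, threshold=1, max_categories=None, unknown_value=-1):
    """Hypothetical thresholded encoder for one column (illustration only)."""
    counts = Counter(column)
    kept = [cat for cat, n in counts.items() if n >= threshold]
    # Prefer the most frequent categories when a cap applies (assumed rule).
    kept.sort(key=lambda cat: (-counts[cat], cat))
    if max_categories is not None:
        kept = kept[:max_categories]
    lookup = {cat: code for code, cat in enumerate(sorted(kept))}
    return [lookup.get(value, unknown_value) for value in column]


def plain_ordinal(column):
    """Baseline: every distinct value gets a code, in sorted order."""
    lookup = {cat: code for code, cat in enumerate(sorted(set(column)))}
    return [lookup[value] for value in column]


colors = ["red", "blue", "red", "green", "red", "blue"]

# The defaults keep every category, so existing behavior is unchanged.
default_codes = encode(colors)
baseline_codes = plain_ordinal(colors)

# Opting in changes the result only when the new parameters are set.
thresholded_codes = encode(colors, threshold=2)
capped_codes = encode(colors, max_categories=1)
```

Keeping the new behavior behind parameters whose defaults are no-ops is what lets one transformer serve both use cases without breaking existing callers.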
