aws / sagemaker-scikit-learn-extension

A library of additional estimators and SageMaker tools based on scikit-learn
Apache License 2.0

Adding RobustOrdinalEncoder #8

Closed zkarnin closed 4 years ago

zkarnin commented 4 years ago

RobustOrdinalEncoder acts like sklearn's OrdinalEncoder, but does not throw an exception when it encounters a value that was not observed during fit. Instead, it encodes every such value as the integer num_values, the number of values observed during fit. For example, if we observe the two values 'Cat' and 'Dog' during fit, but 'Elephant' and 'Horse' during transform, both unseen values are encoded as 2. In the inverse transform, the code for unknown values is mapped to None. Note that this causes the resulting array to have dtype 'object'. This is consistent with the behaviour of sklearn's OneHotEncoder.
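The behaviour described above can be sketched in plain Python. This is only an illustration for a single column, not the code added in this PR: the class name `SketchRobustOrdinalEncoder`, its attributes, and the choice to sort the categories are all assumptions made for the example.

```python
class SketchRobustOrdinalEncoder:
    """Hypothetical single-column illustration; not the library's implementation."""

    def fit(self, values):
        # Categories observed during fit, sorted (an assumption), e.g. ['Cat', 'Dog'].
        self.categories_ = sorted(set(values))
        self._index = {c: i for i, c in enumerate(self.categories_)}
        return self

    def transform(self, values):
        # Every unobserved value maps to len(categories_), i.e. num_values.
        unknown_code = len(self.categories_)
        return [self._index.get(v, unknown_code) for v in values]

    def inverse_transform(self, codes):
        # The unknown code has no original category, so it becomes None.
        n = len(self.categories_)
        return [self.categories_[c] if 0 <= c < n else None for c in codes]


enc = SketchRobustOrdinalEncoder().fit(["Cat", "Dog", "Cat"])
codes = enc.transform(["Dog", "Elephant", "Horse"])
print(codes)                         # [1, 2, 2]
print(enc.inverse_transform(codes))  # ['Dog', None, None]
```

Because the inverse-transformed output mixes strings with None, storing it in a NumPy array yields dtype 'object', which is the effect noted in the description.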

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

wiltonwu commented 4 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
