alteryx / categorical_encoding

Repository for the research and implementation of categorical encoding into a Featuretools-compatible Python library
BSD 3-Clause "New" or "Revised" License

M-Estimate #3

Status: Open. alexjwang opened this issue 5 years ago.


Describe the encoding method below. Attach any relevant links that reference the encoding method. Very similar to Target Encoding; the only difference is that it has a single tunable parameter (m), versus the Target Encoder's two tunable parameters (min_samples_leaf and smoothing). https://contrib.scikit-learn.org/categorical-encoding/mestimate.html
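As a sketch of the formula (our reading of the category-encoders page linked above), each category c is replaced by (sum of the target over rows in c + m · prior) / (number of rows in c + m), where prior is the target mean over all training rows. The helper name `m_estimate` is hypothetical; it only shows how m trades the category's own mean against the prior.

```python
def m_estimate(category_sum: float, category_count: int, prior: float, m: float) -> float:
    """Shrink a category's target mean toward the global prior.

    category_sum   -- sum of the target over rows in this category
    category_count -- number of rows in this category
    prior          -- mean of the target over all training rows
    m              -- smoothing strength (m = 0 gives the raw category mean)
    """
    if category_count + m <= 0:
        raise ValueError("category_count + m must be positive")
    return (category_sum + m * prior) / (category_count + m)


# A category seen 3 times with 2 positive targets, global prior 0.5:
print(m_estimate(2, 3, prior=0.5, m=0.0))    # raw category mean, 2/3
print(m_estimate(2, 3, prior=0.5, m=1.0))    # (2 + 0.5) / (3 + 1)
print(m_estimate(2, 3, prior=0.5, m=100.0))  # pulled close to the prior
```

A larger m pulls rarely seen categories more strongly toward the prior; this single knob plays the role that min_samples_leaf and smoothing play together in Target Encoding.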

Describe the encoder class methods. Are there any additional functions aside from the essential fit(), transform(), and get_features()? For example, the Hashing Encoder has get_hash_method(). Similar to Target Encoding.
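A minimal sketch of the class shape, assuming a column is modelled as a plain Python list rather than a dataframe column. The class name `MEstimateEnc`, the `feature_name` argument, the output name returned by get_features(), and the fallback to the prior for unseen categories are all hypothetical choices for illustration, not the library's API.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


class MEstimateEnc:
    """Hypothetical M-estimate encoder with fit(), transform() and get_features()."""

    def __init__(self, m: float = 1.0, feature_name: str = "category"):
        self.m = m
        self.feature_name = feature_name
        self.mapping: Dict[Hashable, float] = {}
        self.prior: float = 0.0

    def fit(self, X: Sequence[Hashable], y: Sequence[float]) -> "MEstimateEnc":
        if len(X) != len(y) or len(X) == 0:
            raise ValueError("X and y must be non-empty and the same length")
        sums: Dict[Hashable, float] = defaultdict(float)
        counts: Dict[Hashable, int] = defaultdict(int)
        for value, target in zip(X, y):
            sums[value] += target
            counts[value] += 1
        # Global target mean, used both for shrinkage and as the unseen-value default.
        self.prior = sum(y) / len(y)
        self.mapping = {
            value: (sums[value] + self.m * self.prior) / (counts[value] + self.m)
            for value in counts
        }
        return self

    def transform(self, X: Sequence[Hashable]) -> List[float]:
        # Categories not seen during fit fall back to the prior (an assumption).
        return [self.mapping.get(value, self.prior) for value in X]

    def get_features(self) -> List[str]:
        # Placeholder: names the single numeric output feature.
        return [f"MEstimate_{self.feature_name}"]


enc = MEstimateEnc(m=1.0, feature_name="color").fit(["red", "red", "blue"], [1, 0, 1])
# red -> 5/9, blue -> 5/6, green (unseen) -> prior 2/3
print(enc.transform(["red", "blue", "green"]), enc.get_features())
```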

Describe the encoder primitive for use with Featuretools. It should hold a mapping that encodes each value in the dataframe column as its corresponding weighted average.
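A Featuretools transform primitive supplies, through get_function(), a callable that is applied to a whole column. The sketch below builds such a callable from an already fitted mapping. `make_encoding_function`, the example mapping values, and the `default` fallback are hypothetical, and Featuretools itself is not imported so the snippet stays self-contained.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def make_encoding_function(
    mapping: Dict[Hashable, float], default: float
) -> Callable[[Sequence[Hashable]], List[float]]:
    """Return a function that replaces each value with its fitted weighted average.

    mapping -- per-category M-estimates learned during fit
    default -- value used for categories missing from the mapping
    """
    frozen = dict(mapping)  # copy so later edits to `mapping` do not leak in

    def encode(column: Sequence[Hashable]) -> List[float]:
        return [frozen.get(value, default) for value in column]

    return encode


encode = make_encoding_function({"red": 5 / 9, "blue": 5 / 6}, default=2 / 3)
print(encode(["blue", "red", "green"]))
```

Closing over a copy of the mapping keeps the primitive stateless at call time: it only looks values up and never refits.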

Describe the use cases in which this encoder would be useful (what kinds of data, high-cardinality, etc.). Useful for high-cardinality data, where one-hot encoding and other encoders that produce many output columns are impractical. It works in the same situations as Target Encoding, but may be preferable when Target Encoding's two parameters (min_samples_leaf and smoothing) do not suit the problem.

Input type? [Categorical]

Output type? Numeric

List third party libraries required: category-encoders

Describe the encoding method's behavior with train, test, and new data. Use the training data to learn the per-category averages and the test data to validate the encoding and the ML models; new data is encoded with the encoder fitted on the training data.
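The train/test/new-data split can be sketched as follows, assuming m = 1 and that categories never seen in training (here "d") are encoded with the training prior. The function names `fit_m_estimate` and `encode` and the toy data are hypothetical; the point is that only the training step reads targets, and every later split reuses the same fitted mapping.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def fit_m_estimate(
    X: Sequence[Hashable], y: Sequence[float], m: float = 1.0
) -> Tuple[Dict[Hashable, float], float]:
    """Learn per-category M-estimates and the prior from training data only."""
    prior = sum(y) / len(y)
    sums: Dict[Hashable, float] = {}
    counts: Dict[Hashable, int] = {}
    for value, target in zip(X, y):
        sums[value] = sums.get(value, 0.0) + target
        counts[value] = counts.get(value, 0) + 1
    mapping = {v: (sums[v] + m * prior) / (counts[v] + m) for v in counts}
    return mapping, prior


def encode(X: Sequence[Hashable], mapping: Dict[Hashable, float], prior: float) -> List[float]:
    """Encode any split with the fitted mapping; unseen categories get the prior (assumed)."""
    return [mapping.get(value, prior) for value in X]


# 1. Train: the only step that looks at targets.
train_X, train_y = ["a", "a", "b", "c"], [1, 0, 1, 0]
mapping, prior = fit_m_estimate(train_X, train_y, m=1.0)

# 2. Test: encoded with the train mapping; test targets only score the model.
test_encoded = encode(["b", "c", "d"], mapping, prior)

# 3. New data: same fitted mapping, no refitting.
new_encoded = encode(["a", "d"], mapping, prior)

print(test_encoded, new_encoded)
```

With a training prior of 0.5, this prints [0.75, 0.25, 0.5] [0.5, 0.5]: "b" and "c" keep their shrunken training estimates, while "a" lands exactly on the prior because its training mean equals it.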

Test cases. Missing values (np.nan).