alteryx / categorical_encoding

Repository for the research and implementation of categorical encoding into a Featuretools-compatible Python library

Weights of Evidence #4

Open alexjwang opened 5 years ago

alexjwang commented 5 years ago

Describe the encoding method below. Attach any relevant links that reference the encoding method. Weights of Evidence (WoE) measures the predictive power of an independent variable in relation to the dependent variable. For each category it is computed as $$\text{WoE} = \ln{\frac{\text{Distribution of non-events}}{\text{Distribution of events}}},$$ where each distribution is the share of all non-events (or all events) that fall in that category.
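To make the formula concrete, here is a minimal sketch in plain Python. It assumes a binary target where 1 marks an event and 0 a non-event; the function name `weight_of_evidence` and the toy data are made up for illustration and are not part of this repository.

```python
import math
from collections import Counter


def weight_of_evidence(categories, targets):
    """Return {category: WoE} with WoE = ln(dist. of non-events / dist. of events).

    Assumption: `targets` holds 1 for an event and 0 for a non-event.
    """
    events = Counter()
    non_events = Counter()
    for cat, y in zip(categories, targets):
        if y == 1:
            events[cat] += 1
        else:
            non_events[cat] += 1

    total_events = sum(events.values())
    total_non_events = sum(non_events.values())

    woe = {}
    for cat in set(categories):
        # Share of all non-events / all events that fall in this category
        dist_non_events = non_events[cat] / total_non_events
        dist_events = events[cat] / total_events
        # No smoothing here: this raises if a category has zero events
        # or zero non-events.
        woe[cat] = math.log(dist_non_events / dist_events)
    return woe


# Toy data: "a" is mostly non-events, "b" is mostly events
categories = ["a", "a", "a", "a", "b", "b", "b", "b"]
targets = [1, 0, 0, 0, 1, 1, 1, 0]
woe = weight_of_evidence(categories, targets)
```

With this sign convention, a category dominated by non-events gets a positive WoE and one dominated by events gets a negative WoE.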

WoE is especially useful because categories with similar WoE values have a similar relationship to the target, which could help with the accuracy of a machine learning algorithm.

Read more about WoE here.

Describe the encoder class method. Any additional functions aside from the essential fit(), transform(), and get_features()? None for now. May need additional functions in order to integrate with feature calculation.

Describe the encoder primitive for use with Featuretools. Passes mapping to encoder primitive, which then encodes the column of categoricals.

Describe the use cases in which this encoder would be useful (what kinds of data, high-cardinality, etc.). WoE was originally created for use in credit fraud detection. It is particularly well suited to binary targets (for example, "good" and "bad" statuses).

Input type? Categorical, possibly with sigma and regularization parameters
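As a rough illustration of what a regularization parameter could do, the sketch below applies additive smoothing to the counts so that a category with zero events (or zero non-events) still gets a finite value. The function `smoothed_woe` and its signature are hypothetical, and whether this encoder would smooth in exactly this way is an assumption. `sigma` is not modeled here.

```python
import math


def smoothed_woe(n_events, n_non_events, total_events, total_non_events,
                 regularization=1.0):
    """WoE for one category with additive smoothing (an assumed scheme).

    Adding `regularization` to each count keeps both ratios strictly
    positive, so the logarithm is always defined.
    """
    dist_non_events = (n_non_events + regularization) / (total_non_events + 2 * regularization)
    dist_events = (n_events + regularization) / (total_events + 2 * regularization)
    return math.log(dist_non_events / dist_events)


# A category with no events at all: unsmoothed WoE would be undefined
value = smoothed_woe(n_events=0, n_non_events=5,
                     total_events=10, total_non_events=10)
```

Larger values of `regularization` pull every category's WoE toward 0, which trades some signal for stability on rare categories.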

Output type? Numeric

List third party libraries required: category_encoders

Describe encoding method's behavior with train, test, and new data. Similar to other Bayesian encoders. Fit on train, transform with learned mappings on test and new data.
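The fit-on-train, transform-elsewhere behavior can be sketched as follows. `WoESketch` is a hypothetical stand-in written for this comment, not the planned encoder class. Mapping unseen categories to 0.0 is an assumption made for illustration, and no smoothing is applied, so every training category needs at least one event and one non-event.

```python
import math
from collections import Counter


class WoESketch:
    """Hypothetical minimal encoder: fit() learns a category -> WoE mapping."""

    def __init__(self):
        self.mapping = {}

    def fit(self, categories, targets):
        # Assumption: targets are 1 (event) or 0 (non-event)
        events = Counter(c for c, y in zip(categories, targets) if y == 1)
        non_events = Counter(c for c, y in zip(categories, targets) if y == 0)
        total_events = sum(events.values())
        total_non_events = sum(non_events.values())
        self.mapping = {
            c: math.log(
                (non_events[c] / total_non_events)
                / (events[c] / total_events)
            )
            for c in set(categories)
        }
        return self

    def transform(self, categories):
        # Only the learned mapping is used; targets are never needed here.
        # Assumption: categories unseen during fit get a neutral 0.0.
        return [self.mapping.get(c, 0.0) for c in categories]


train_x = ["a", "a", "a", "b", "b", "b"]
train_y = [1, 0, 0, 1, 1, 0]
encoder = WoESketch().fit(train_x, train_y)

# "c" never appeared in the training data
test_encoded = encoder.transform(["a", "b", "c"])
```

Because `transform` only looks up the stored mapping, test and new data never influence the encoded values, which avoids leaking target information from those sets.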

Test cases. np.nan