alteryx / evalml

EvalML is an AutoML library written in Python.
https://evalml.alteryx.com
BSD 3-Clause "New" or "Revised" License

Use our own implementation for one-hot encoding #1995

Open angela97lin opened 3 years ago

angela97lin commented 3 years ago

Following discussion from #1936 and #830, we've had to work around the scikit-learn implementation of one-hot encoding to add our own functionality. #1936 in particular adds the ability to drop one encoded feature for a column that has only two categories, but it has to work around limitations in scikit-learn's implementation to do so.

Rolling our own implementation could improve performance and let us remove the extra convoluted logic we've added to work around scikit-learn's implementation :)
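Here is a minimal sketch of what an in-house encoder with per-column behavior might look like. It assumes string-valued categorical columns passed as a dict of lists. `SimpleOneHotEncoder`, `drop_binary`, and `get_feature_names` are hypothetical names, not evalml's or scikit-learn's API. Dropping the first category in sorted order for two-category columns is also an assumption made for illustration.

```python
from typing import Dict, List, Sequence


class SimpleOneHotEncoder:
    """Hypothetical minimal one-hot encoder (not evalml's actual API).

    When drop_binary is True, a column with exactly two categories is encoded
    as a single 0/1 column: the first category in sorted order is dropped.
    Columns with any other number of categories get one column per category.
    """

    def __init__(self, drop_binary: bool = True) -> None:
        self.drop_binary = drop_binary
        self.categories_: Dict[str, List[str]] = {}

    def fit(self, data: Dict[str, Sequence[str]]) -> "SimpleOneHotEncoder":
        # Learn the sorted set of categories seen in each column.
        self.categories_ = {col: sorted(set(values)) for col, values in data.items()}
        return self

    def _kept_categories(self, col: str) -> List[str]:
        cats = self.categories_[col]
        # Per-column decision: drop a category only for binary columns.
        if self.drop_binary and len(cats) == 2:
            return cats[1:]
        return cats

    def get_feature_names(self) -> List[str]:
        return [f"{col}_{cat}" for col in self.categories_ for cat in self._kept_categories(col)]

    def transform(self, data: Dict[str, Sequence[str]]) -> Dict[str, List[int]]:
        out: Dict[str, List[int]] = {}
        for col in self.categories_:
            values = data[col]
            for cat in self._kept_categories(col):
                # Categories unseen during fit encode as all zeros.
                out[f"{col}_{cat}"] = [int(v == cat) for v in values]
        return out


data = {"color": ["red", "green", "blue", "red"], "smoker": ["yes", "no", "no", "yes"]}
encoded = SimpleOneHotEncoder().fit(data).transform(data)
print(encoded)
# {'color_blue': [0, 0, 1, 0], 'color_green': [0, 1, 0, 0], 'color_red': [1, 0, 0, 1], 'smoker_yes': [1, 0, 0, 1]}
```

Because the drop decision lives in one small per-column helper, adding other per-column policies (for example, a user-chosen category to drop) would not require working around a third-party encoder's internals.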

dsherry commented 3 years ago

Pros of rolling our own:

Cons:

An alternative @chukarsten pointed out: we could propose an enhancement to the sklearn OHE to make per-column behavior easier.