Use our own implementation for one-hot encoding

alteryx / evalml

EvalML is an AutoML library written in Python.

BSD 3-Clause "New" or "Revised" License


Pros of rolling our own:

#1936: rather than dropping features after sklearn has generated them, we can avoid computing them in the first place, which saves time and reduces complexity.
Avoid the possibility of future cryptic bugs due to a mismatch in ordering or index between sklearn's return and what our code thinks it returns
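
To make the first point concrete, here is a minimal, library-free sketch of what "avoid computing them in the first place" could look like. It is not EvalML's actual implementation: the function name `one_hot_encode`, the `categories_to_keep` argument, and the `column_category` naming scheme are all hypothetical, chosen only for illustration. Because each output column is built and named in the same loop, there is no separate ordering to keep in sync.

```python
def one_hot_encode(columns, categories_to_keep):
    """Encode only the requested categories for each column.

    columns: dict mapping column name -> list of values.
    categories_to_keep: dict mapping column name -> list of categories
        to encode (hypothetical argument, for illustration only).
    Columns not listed in categories_to_keep pass through unchanged.
    """
    encoded = {}
    for name, values in columns.items():
        if name not in categories_to_keep:
            # Not a column we encode: copy it through as-is.
            encoded[name] = list(values)
            continue
        for category in categories_to_keep[name]:
            # Build one indicator column per kept category; categories
            # that are not listed are never computed at all.
            encoded[f"{name}_{category}"] = [
                1 if value == category else 0 for value in values
            ]
    return encoded


data = {"color": ["red", "blue", "green", "red"], "size": [1, 2, 3, 4]}
keep = {"color": ["red", "blue"]}
result = one_hot_encode(data, keep)
print(result)
```

In this sketch, no `color_green` column is ever created, so there is nothing to drop afterwards.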

Cons:

The sklearn OHE API and behavior are good, heh, and have been vetted by the community.

An alternative @chukarsten pointed out: we could propose an enhancement to the sklearn OHE to make per-column behavior easier.
