Weight of evidence (WOE) encoding is one of these too; see #288.
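For a binary target, WOE maps each category to the log ratio of its share of positives to its share of negatives; since it's computed from the target, it leaks in exactly the same way as other target statistics. A minimal sketch (function and variable names are mine, not from any library):

```python
import numpy as np
import pandas as pd

def woe_encode(cats: pd.Series, y: pd.Series) -> pd.Series:
    """Weight of evidence for a binary target:
    WOE(c) = ln( P(x = c | y = 1) / P(x = c | y = 0) ).
    """
    dist_pos = cats[y == 1].value_counts(normalize=True)
    dist_neg = cats[y == 0].value_counts(normalize=True)
    # Categories seen only among positives or only among negatives
    # come out NaN/inf here; real implementations smooth the counts.
    woe = np.log(dist_pos / dist_neg)
    return cats.map(woe)
```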
From the CatBoost paper (https://arxiv.org/pdf/1706.09516.pdf):
> Further, there is a similar issue in standard algorithms of preprocessing categorical features. One of the most effective ways [6, 25] to use them in gradient boosting is converting categories to their target statistics. A target statistic is a simple statistical model itself, and it can also cause target leakage and a prediction shift.
[6] B. Cestnik et al. Estimating probabilities: a crucial task in machine learning. In ECAI, volume 90, pages 147–149, 1990.
[25] D. Micci-Barreca. A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems. ACM SIGKDD Explorations Newsletter, 3(1):27–32, 2001. http://helios.mm.di.uoa.gr/~rouvas/ssi/sigkdd/sigkdd.vol3.1/barreca.pdf
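To make the leakage concrete: the simplest target statistic replaces each category with the mean of the target over the training rows in that category, so each row's own label feeds into its own feature value. A common mitigation is out-of-fold encoding. A rough sketch of both (this is the generic holdout idea, not CatBoost's ordered variant, which uses a random permutation and only "past" rows):

```python
import pandas as pd
from sklearn.model_selection import KFold

def naive_target_stat(cats: pd.Series, y: pd.Series) -> pd.Series:
    # Greedy version: each row's own label contributes to its own
    # encoding -- exactly the target leakage the paper describes.
    return cats.map(y.groupby(cats).mean())

def oof_target_stat(cats: pd.Series, y: pd.Series, n_splits: int = 5) -> pd.Series:
    # Out-of-fold version: each row is encoded with category means
    # computed on the other folds only.
    enc = pd.Series(index=cats.index, dtype=float)
    for fit_idx, enc_idx in KFold(n_splits, shuffle=True, random_state=0).split(cats):
        means = y.iloc[fit_idx].groupby(cats.iloc[fit_idx]).mean()
        enc.iloc[enc_idx] = cats.iloc[enc_idx].map(means).to_numpy()
    return enc.fillna(y.mean())  # categories unseen in the fitting folds
```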
From the LightGBM docs (https://lightgbm.readthedocs.io/en/latest/Advanced-Topics.html):

> For a categorical feature with high cardinality (#category is large), it often works best to treat the feature as numeric, either by simply ignoring the categorical interpretation of the integers or by embedding the categories in a low-dimensional numeric space.
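A sketch of the first option, "simply ignoring the categorical interpretation of the integers" (the data and column name are invented for illustration):

```python
import lightgbm as lgb
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
city = rng.choice([f"city_{i}" for i in range(10_000)], size=50_000)  # high cardinality
y = rng.normal(size=len(city))

# Map each category to an arbitrary integer code, then treat the
# column as an ordinary numeric feature.
X = pd.DataFrame({"city_code": pd.Series(city).astype("category").cat.codes})

booster = lgb.train({"objective": "regression", "verbosity": -1}, lgb.Dataset(X, label=y))
```

The alternative would be to pass the codes with `categorical_feature=["city_code"]` and let LightGBM use its native categorical handling, or to learn an embedding elsewhere and feed in the embedding coordinates instead.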
There seems to be no reason to use one-hot encoding over numeric encoding (one-hot encoding is generally a poor fit for trees).
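One intuition for this: one-hot encoding turns a single k-valued column into k binary columns, and a tree can only peel off one category per split, while a numeric or target encoding keeps everything in one column that a single threshold split can partition. A quick shape check on synthetic data:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

rng = np.random.default_rng(0)
cats = rng.integers(0, 1000, size=10_000).astype(str).reshape(-1, 1)

print(OneHotEncoder().fit_transform(cats).shape)   # (10000, ~1000): one column per category
print(OrdinalEncoder().fit_transform(cats).shape)  # (10000, 1): a single numeric column
```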
Here's a method they use in the XGBoost paper (https://arxiv.org/abs/1603.02754):