automl / TabPFN

Official implementation of the TabPFN paper (https://arxiv.org/abs/2207.01848) and the tabpfn package.
http://priorlabs.ai
Apache License 2.0

remove_outliers sets imbalanced categorical features to constants #80

Closed: amueller closed this issue 7 months ago

amueller commented 9 months ago

I noticed that there are a lot of constant-zero features in the data, and I was wondering why. It turns out that if a categorical feature is imbalanced enough, every sample in a minority category is flagged as an outlier, so the outlier removal turns the feature into a constant 0. That's probably not ideal.
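
A minimal sketch of the effect, assuming a generic two-pass sigma rule: flag values outside mean ± n_sigma * std, recompute the bounds on the remaining inliers, then clip to those bounds. The function names `clip_outliers` and `_bounds` and the default `n_sigma=4.0` are hypothetical choices for illustration, not the package's actual `remove_outliers` code.

```python
import numpy as np


def _bounds(values, n_sigma):
    # Lower/upper cut-offs at mean +/- n_sigma * (population) std.
    mean, std = values.mean(), values.std()
    return mean - n_sigma * std, mean + n_sigma * std


def clip_outliers(x, n_sigma=4.0):
    """Hypothetical two-pass sigma clipping, for illustration only."""
    x = np.asarray(x, dtype=float)
    # Pass 1: find the values that lie inside the initial bounds.
    lower, upper = _bounds(x, n_sigma)
    inliers = x[(x >= lower) & (x <= upper)]
    # Pass 2: recompute the bounds on the inliers and clip everything to them.
    lower, upper = _bounds(inliers, n_sigma)
    return np.clip(x, lower, upper)


# Binary categorical feature where only 1% of the samples are in category 1.
imbalanced = np.zeros(1000)
imbalanced[:10] = 1.0

# Balanced binary feature for comparison.
balanced = np.tile([0.0, 1.0], 500)

print(np.unique(clip_outliers(imbalanced)))  # the minority category is gone
print(np.unique(clip_outliers(balanced)))    # both categories survive
```

Under these assumptions, a 0/1 feature with minority fraction p loses its minority category whenever p < 1 / (1 + n_sigma^2), which is roughly 5.9% for n_sigma = 4. Once the minority samples are dropped, the inliers have zero variance, so the second-pass bounds collapse onto the majority value.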

SamuelGabriel commented 9 months ago

This is a very interesting finding! Thanks!