dreamquark-ai / tabnet

PyTorch implementation of the TabNet paper: https://arxiv.org/pdf/1908.07442.pdf
https://dreamquark-ai.github.io/tabnet/
MIT License
2.56k stars, 473 forks

Handle/Imputing Missing Entries #415

Status: closed (TianyiPeng closed this issue 1 year ago)

TianyiPeng commented 2 years ago

Feature request

In a lot of scenarios, the input table has missing values. It seems that the current algorithm cannot handle those missing values directly.

The TabNet pre-training process should be able to handle missing data (with a mask as input) or be used to impute missing data after training. Any suggestions on this issue?
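One way to read the "mask as input" idea: during self-supervised pre-training, TabNet hides a random subset of features from the encoder and trains the decoder to reconstruct them. A minimal sketch, assuming NumPy arrays and a hypothetical helper `masked_reconstruction_loss` (not part of pytorch-tabnet), that excludes missing entries from that reconstruction loss, so the model is never scored on predicting a value that does not exist. The per-feature normalization is in the spirit of the paper's loss, not a copy of the library's implementation.

```python
import numpy as np

def masked_reconstruction_loss(x, x_hat, obf_mask, missing_mask, eps=1e-8):
    """Hypothetical TabNet-style reconstruction loss that skips missing entries.

    x            : (batch, n_features) inputs, NaN where a value is missing
    x_hat        : (batch, n_features) decoder reconstruction
    obf_mask     : (batch, n_features) 1 where a feature was hidden from the encoder
    missing_mask : (batch, n_features) 1 where the original value is missing
    """
    # Only score entries that were hidden from the encoder AND actually observed.
    target_mask = obf_mask * (1.0 - missing_mask)
    # Replace NaNs so they cannot propagate through the arithmetic below.
    x_filled = np.where(missing_mask.astype(bool), 0.0, x)
    # Per-feature standard deviation over observed values only.
    std = np.nanstd(x, axis=0)
    # Guard against constant or fully missing columns.
    std = np.where(~np.isfinite(std) | (std == 0), 1.0, std)
    errors = ((x_hat - x_filled) * target_mask / std) ** 2
    # Average over the entries that actually contribute to the loss.
    return errors.sum() / max(target_mask.sum(), eps)

x = np.array([[1.0, np.nan], [3.0, 4.0]])
missing = np.isnan(x).astype(float)
obf = np.array([[1.0, 1.0], [0.0, 0.0]])
x_hat = np.array([[2.0, 100.0], [3.0, 4.0]])
# The huge error at the missing position (row 0, feature 1) is ignored.
print(masked_reconstruction_loss(x, x_hat, obf, missing))  # 1.0
```

After pre-training, the decoder's output at the missing positions could in principle be read off as an imputation, but that is an untested assumption here, not a feature of the library.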

Optimox commented 2 years ago

Imputing missing values is a real challenge.

Masking an entry with attention is equivalent to filling it with 0 (fillna(0)), so I guess this already gives you a simple fillna method, even though there is probably something smarter to do.
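To see why masking behaves like fillna(0): a TabNet feature mask multiplies the features element-wise, so a masked entry contributes 0. The sketch below is a simplified illustration with made-up attention weights and a hypothetical `select_features` helper. It assumes the 0 is what the model sees at the point where the mask is applied; in practice inputs may be normalized first, so a raw fillna(0) is only approximately the same thing.

```python
import numpy as np

def select_features(x, attention_mask):
    # Simplified TabNet-style feature selection: element-wise product of
    # the attention mask and the features seen at the masking step.
    return attention_mask * x

# One row whose second feature is missing, filled with 0 (fillna(0)).
x = np.array([[0.7, np.nan, -1.2]])
x_filled = np.nan_to_num(x, nan=0.0)

attn = np.array([[0.2, 0.5, 0.3]])  # illustrative weights; feature 1 gets 0.5
attn_masked = attn.copy()
attn_masked[0, 1] = 0.0             # attention masks feature 1 entirely

out_attended = select_features(x_filled, attn)
out_masked = select_features(x_filled, attn_masked)

# Whatever weight the attention gives the filled entry, it contributes 0,
# exactly as if that entry had been masked.
print(np.allclose(out_attended, out_masked))  # True

# Limitation: a genuinely observed 0 gives the same output, so the model
# cannot tell "missing" apart from "observed zero".
x_true_zero = np.array([[0.7, 0.0, -1.2]])
print(np.allclose(select_features(x_true_zero, attn), out_attended))  # True
```

The second check is one concrete reason a plain fillna(0) is a crude default: missing values and real zeros become indistinguishable.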

In the end, I think it would be nice for TabNet to accept missing values natively, but the proposed method would probably be suboptimal, so it is always better to think manually about what to do with missing values.
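As an example of handling missing values manually before training, one common approach is median imputation plus one binary "was missing" indicator column per feature. This is a minimal NumPy sketch with hypothetical helper names (`fit_medians`, `impute_with_indicators`), not something pytorch-tabnet provides. The medians are fit on the training set only and then reused for validation and test data.

```python
import numpy as np

def fit_medians(X_train):
    # Per-feature median over observed (non-NaN) training values.
    medians = np.nanmedian(X_train, axis=0)
    # Fall back to 0 for columns that are entirely missing.
    return np.where(np.isnan(medians), 0.0, medians)

def impute_with_indicators(X, medians):
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Replace each NaN with its column's training median (broadcast per column).
    X_filled = np.where(missing, medians, X)
    # Append one 0/1 indicator per feature so the model can tell an
    # imputed value apart from a genuinely observed one.
    return np.hstack([X_filled, missing.astype(float)])

X_train = np.array([[1.0, np.nan], [3.0, 10.0], [np.nan, 20.0]])
medians = fit_medians(X_train)  # [2.0, 15.0]
print(impute_with_indicators(X_train, medians))
# [[ 1. 15.  0.  1.]
#  [ 3. 10.  0.  0.]
#  [ 2. 20.  1.  0.]]
```

The result is an ordinary dense float matrix, so it can be fed to a TabNet model like any other numeric features; the indicator columns keep the "missing vs. observed" information that a plain fill would throw away.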