dreamquark-ai / tabnet

PyTorch implementation of TabNet paper : https://arxiv.org/pdf/1908.07442.pdf
https://dreamquark-ai.github.io/tabnet/
MIT License

Unsupervised training result for visualisation in UMAP or tSNE #414

Closed ensonario closed 2 years ago

ensonario commented 2 years ago

Feature request

What is the expected behavior?

After reading the paper and trying the library, it feels like the model might be a good approach for metric learning and subsequent data visualisation using UMAP or t-SNE. It also feels like the masks used in TabNet can be treated as disentangled features.

What is motivation or use case for adding/changing the behavior?

Better understanding of complex data, and representing the data as high-level (ideally disentangled) features. I'm trying to understand whether it can be a drop-in alternative to beta-VAE or InfoGAN (which are designed mostly for images).

How should this be implemented in your opinion?

I suspect this is available already; the question is how to extract the necessary weights from the model.

Are you willing to work on this yourself? yes

Optimox commented 2 years ago

You can always access the masks from attention (with the `explain` method) and try to cluster these with UMAP or t-SNE, but I'm not sure that's what you want?
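A minimal sketch of this idea: stack the per-step attention masks into one vector per sample and embed them with t-SNE (scikit-learn's `TSNE` stands in for UMAP here). Random data stands in for the second return value of `explain` (a dict mapping step index to an array of shape `(n_samples, n_features)`), so the exact shapes are assumptions for illustration.

```python
# Sketch: embed TabNet attention masks with t-SNE for visualisation.
# `masks` mimics the per-step masks dict returned by explain();
# random data is used here so the sketch is self-contained.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
n_samples, n_features, n_steps = 200, 14, 3
masks = {step: rng.random((n_samples, n_features)) for step in range(n_steps)}

# Concatenate the per-step masks into one feature vector per sample.
mask_matrix = np.concatenate([masks[s] for s in range(n_steps)], axis=1)

# 2-D embedding of the mask space; samples attended to similarly
# by the model should land close together.
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(mask_matrix)
print(emb.shape)  # (200, 2)
```

With real masks you would replace the random dict by the output of `explain` and colour the scatter plot by label to inspect the separation.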

ensonario commented 2 years ago

Thanks @Optimox for your prompt response. Yes, I can access the masks, but I'm not sure it makes sense to analyse them directly. Is there some kind of internal embedding in TabNet I can use for the visualisation? In the case of a VAE, for instance, we have bottleneck features, which can be treated as a high-level representation of the raw data, and visualising those features helps to understand the data. So I'm wondering whether TabNet can be used in the same way? I hope that makes sense :)

Optimox commented 2 years ago

You would need to access the results before the final mapping: https://github.com/dreamquark-ai/tabnet/blob/4fa545da50796f0d16f49d0cb476d5a30c2a27c1/pytorch_tabnet/tab_network.py#L480

But to be honest, I think a VAE would be the better choice if visualisation is the goal. That said, it might still be interesting to visualise the separation power of attention alone (through the masks): attention can be seen as the model's way of reasoning, so it would create clusters of samples that the model handles similarly when making a prediction.
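One way to grab the representation just before the final mapping, without modifying the library, is a forward pre-hook on that layer. The sketch below uses a toy `nn.Sequential` as a stand-in; for pytorch-tabnet you would register the hook on the final mapping module of the trained network (the exact attribute path is an assumption, check the source line linked above).

```python
# Sketch: capture the input of a model's final linear layer with a
# forward pre-hook. A toy model stands in for TabNet's network here.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(10, 8),   # stand-in for the TabNet encoder
    nn.ReLU(),
    nn.Linear(8, 2),    # stand-in for the final mapping
)

captured = {}

def grab_input(module, inputs):
    # inputs is a tuple; inputs[0] is the pre-final-mapping representation
    captured["embedding"] = inputs[0].detach()

handle = model[2].register_forward_pre_hook(grab_input)
with torch.no_grad():
    model(torch.randn(5, 10))
handle.remove()

print(captured["embedding"].shape)  # torch.Size([5, 8])
```

The captured tensor can then be fed to UMAP or t-SNE the same way as the masks above.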

Optimox commented 2 years ago

@ensonario do you have any plots to share ?

ensonario commented 2 years ago

Hi @Optimox, I was a bit distracted but have returned to this task. I haven't done the visualisation yet, but I'll share anything interesting that comes out of it.

ensonario commented 2 years ago

Hi @Optimox, looking at these unsupervised mask visualisations, it feels like the masks don't just pick out the most predictive parameters at each step; the parameters inside a mask seem connected. It's as if each mask represents a high-level feature: for instance, the (occupation, race, sex) parameters in Mask 1 suggest the mask captures complex dependencies between these parameters and could be called an "occupation / race dependency".

Mask 2 (race, country_of_origin) seems quite representative as well.

Mask 3, with an education-sex dependency, is quite interesting too.

[image: unsupervised mask visualisations]

Is this assumption correct, or is it just a coincidence?

And another question: which key parameters should I tune for unsupervised training? cat_emb_dim, n_steps?

What is the logic behind setting the pretraining_ratio parameter? What is the motivation for values of 0.3, 0.5, or 0.7 (as in the unsupervised example)?
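(For context on the question above: `pretraining_ratio` is the fraction of input features randomly masked during self-supervised pretraining, which the model must then reconstruct. A numpy illustration of that masking, with arbitrary example values, not the library's actual implementation:)

```python
# Sketch: what pretraining_ratio controls conceptually. Each entry
# of the input is hidden with that probability, and the pretraining
# task is to reconstruct the hidden values; higher ratios make the
# reconstruction task harder.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))  # toy dataset

pretraining_ratio = 0.5  # ~50% of entries hidden
obfuscation_mask = rng.random(X.shape) < pretraining_ratio
X_corrupted = np.where(obfuscation_mask, 0.0, X)  # hidden entries zeroed out

# The realised masking rate is close to pretraining_ratio.
print(round(obfuscation_mask.mean(), 2))
```

Intuitively, 0.3 hides little and makes reconstruction easy, while 0.7 forces the model to infer most features from the few that remain visible; 0.5 is a middle ground.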

Optimox commented 2 years ago

Hello @ensonario,

Hope this helps!