Carmoondedraak / FACT-2021

Progress Attention Disentanglement #6

Closed · alfjesus3 closed this 3 years ago

alfjesus3 commented 3 years ago

The following preliminary results show a faster decrease in the reconstruction loss when using the attention disentanglement loss.
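For context, one common way to formulate such an attention-disentanglement penalty is to penalise spatial overlap between the attention maps associated with different latent dimensions. The sketch below is a generic illustration of that idea only; the function name, tensor shapes, overlap measure, and weighting are assumptions and may differ from what this repository actually implements.

```python
import torch

def attention_disentanglement_loss(attn_maps):
    """Generic sketch: penalise overlap between per-latent attention maps.

    attn_maps: (B, D, H, W) tensor with one spatial attention map per latent
    dimension (shapes and names are illustrative, not this repo's API).
    The penalty sums the element-wise minimum (overlap) over all pairs of
    dimensions, so maps are rewarded for covering disjoint image regions.
    """
    num_dims = attn_maps.shape[1]
    loss = attn_maps.new_zeros(())
    for i in range(num_dims):
        for j in range(i + 1, num_dims):
            overlap = torch.min(attn_maps[:, i], attn_maps[:, j])
            loss = loss + overlap.sum(dim=(1, 2)).mean()
    return loss / (num_dims * (num_dims - 1) / 2)

# In a FactorVAE-style objective this term would typically be added with its
# own weight, e.g. total = recon + kld + gamma * tc + lambda_ad * l_ad.
```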

alfjesus3 commented 3 years ago

Current preliminary results using the disentanglement metric proposed by Kim et al. (2018). The accuracy is around 0.61 after 66,000 iterations.

=> loaded checkpoint 'checkpoints/tmp/33000 (iter 33000)'
=> loaded checkpoint 'checkpoints/tmp/33000 (iter 33000)'
66000it [00:20, 1647.08it/s]The factors are  <class 'torch.Tensor'> torch.Size([737280, 5]) with classes 5
66000it [00:40, 1647.08it/s]The empirical mean for kl dimensions-wise:
[[ 0.16251403]
 [ 0.04704665]
 [-0.13326351]
 [-0.49411842]
 [-0.49725255]
 [ 0.11181379]
 [ 0.09036198]
 [-0.4975004 ]
 [-0.49553087]
 [-0.4970772 ]]
Useful dimensions: [0 1 5 6]  - Total: 4
Empirical Scales: [[[1.1460223]]

 [[1.0301305]]

 [[1.1061238]]

 [[1.0858247]]]
Votes:
 [[ 20.  20.   0.   0. 160.]
 [  1.  40.  65.   0.   0.]
 [100.  40.  95.   0.   0.]
 [ 39.  60.   0. 160.   0.]]

The accuracy is 0.60625
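For reference, the final accuracy follows from the votes matrix exactly as in the Kim et al. (2018) majority-vote metric: each row corresponds to one of the four "useful" latent dimensions, each column to a ground-truth factor, and the score is the sum of the per-row majority votes divided by the total number of votes. A minimal check in NumPy, with the matrix copied from the log above:

```python
import numpy as np

# Votes matrix from the log: rows = useful latent dimensions, columns = factors.
votes = np.array([
    [ 20.,  20.,   0.,   0., 160.],
    [  1.,  40.,  65.,   0.,   0.],
    [100.,  40.,  95.,   0.,   0.],
    [ 39.,  60.,   0., 160.,   0.],
])

# Majority-vote classifier: each dimension predicts its most frequent factor.
accuracy = votes.max(axis=1).sum() / votes.sum()
print(accuracy)  # 0.60625, matching the logged value
```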
alfjesus3 commented 3 years ago

[Update] The training curves for 'total correlation' and 'reconstruction loss' over the first 50,000 iterations. It may be better to average over chunks of 100 iterations to get less spiky curves (see the averaging sketch below the figures).

[figure: disent_res_abl]

[figure: disent_train_metrics_abl]

[figure: figure8]
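A minimal sketch of the chunk-averaging mentioned above, assuming the per-iteration values are collected in a 1-D array (the names are illustrative):

```python
import numpy as np

def chunk_average(values, chunk_size=100):
    """Average a 1-D series over non-overlapping chunks to smooth spiky curves."""
    values = np.asarray(values, dtype=float)
    n = (len(values) // chunk_size) * chunk_size  # drop the incomplete tail chunk
    return values[:n].reshape(-1, chunk_size).mean(axis=1)

# e.g. smoothed = chunk_average(recon_loss_history, chunk_size=100)
```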

The current experimental setup is:

alfjesus3 commented 3 years ago

Update on the disentanglement metric's spiking behaviour: the vanilla FactorVAE is more stable when plotting the disentanglement metric, so the issue likely lies in how the attention disentanglement loss (L_AD) is computed. There was also an error in the computation of the disentanglement metric, which has been fixed.
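For context on where such an error can creep in, here is a hypothetical sketch of the vote-collection step of the Kim et al. (2018) metric; `encode`, `sample_batch_fixed_factor`, and `emp_std` are illustrative names, not this repository's API:

```python
import torch

@torch.no_grad()
def collect_votes(encode, sample_batch_fixed_factor, emp_std, useful_dims,
                  num_factors, num_votes=800, batch_size=64):
    """Sketch of vote collection for the Kim et al. (2018) metric.

    encode(x) -> latent means of shape (B, D).
    sample_batch_fixed_factor(k, B) -> a batch of images that all share the
        same value of ground-truth factor k (helper assumed to exist).
    emp_std: per-dimension empirical std of the latents, shape (D,).
    useful_dims: indices of non-collapsed dimensions (KL above a threshold).
    """
    votes = torch.zeros(len(useful_dims), num_factors)
    for _ in range(num_votes):
        k = torch.randint(num_factors, (1,)).item()
        x = sample_batch_fixed_factor(k, batch_size)
        z = encode(x) / emp_std                 # normalise by empirical scale
        var = z[:, useful_dims].var(dim=0)      # variance per useful dimension
        d_star = var.argmin().item()            # least-varying dimension
        votes[d_star, k] += 1                   # that dimension votes for factor k
    return votes
```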

[figure: metric_diff]

alfjesus3 commented 3 years ago

The current experimental setup is: