Closed: alfjesus3 closed this issue 3 years ago
Current preliminary results using the disentanglement metric proposed by Kim et al. (2018). The accuracy is around 0.61 after 66000 iterations.
=> loaded checkpoint 'checkpoints/tmp/33000 (iter 33000)'
66000it [00:20, 1647.08it/s]
The factors are <class 'torch.Tensor'> torch.Size([737280, 5]) with classes 5
66000it [00:40, 1647.08it/s]
The empirical mean for kl dimensions-wise:
[[ 0.16251403]
[ 0.04704665]
[-0.13326351]
[-0.49411842]
[-0.49725255]
[ 0.11181379]
[ 0.09036198]
[-0.4975004 ]
[-0.49553087]
[-0.4970772 ]]
Useful dimensions: [0 1 5 6] - Total: 4
Empirical Scales: [[[1.1460223]]
[[1.0301305]]
[[1.1061238]]
[[1.0858247]]]
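For reference, the dimension selection above can be sketched as follows. This is a hedged reconstruction, not the repository's actual code: it assumes that dimensions whose per-dimension statistic sits near the collapsed value (~-0.497 in the printout) are discarded, and that a threshold of 0 separates the two groups.

```python
import numpy as np

# Per-dimension values printed above; the threshold of 0 is an assumption,
# chosen because collapsed dimensions cluster around -0.497.
kl_dims = np.array([0.16251403, 0.04704665, -0.13326351, -0.49411842,
                    -0.49725255, 0.11181379, 0.09036198, -0.4975004,
                    -0.49553087, -0.4970772])

useful = np.where(kl_dims > 0)[0]  # hypothetical selection rule
print("Useful dimensions:", useful, "- Total:", len(useful))
```

With the values above this reproduces `Useful dimensions: [0 1 5 6] - Total: 4` from the log.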
Votes:
[[ 20. 20. 0. 0. 160.]
[ 1. 40. 65. 0. 0.]
[100. 40. 95. 0. 0.]
[ 39. 60. 0. 160. 0.]]
The accuracy is 0.60625
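The reported accuracy follows directly from the vote matrix: under the majority-vote classifier of Kim et al. (2018), each latent dimension predicts its most-voted factor, so the correctly classified votes are the row-wise maxima. A minimal sketch using the matrix above:

```python
import numpy as np

# Rows: the 4 useful latent dimensions; columns: the 5 ground-truth factors.
votes = np.array([[ 20.,  20.,  0.,   0., 160.],
                  [  1.,  40., 65.,   0.,   0.],
                  [100.,  40., 95.,   0.,   0.],
                  [ 39.,  60.,  0., 160.,   0.]])

# Majority-vote classifier: each dimension is assigned its most-voted factor,
# so accuracy is the sum of row-wise maxima over the total vote count.
accuracy = votes.max(axis=1).sum() / votes.sum()
print(accuracy)  # -> 0.60625
```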
[Update] Training curves of the total correlation and the reconstruction loss over the first 50000 iterations. It may be better to average over 100-iteration chunks to get less spiky curves.
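The chunked averaging suggested above could look like the following sketch; the chunk size and the synthetic loss curve are placeholders, not the repository's actual logging code.

```python
import numpy as np

def chunk_average(values, chunk=100):
    """Average a 1-D curve over non-overlapping chunks to smooth spikes."""
    values = np.asarray(values, dtype=float)
    n = len(values) // chunk * chunk          # drop the trailing partial chunk
    return values[:n].reshape(-1, chunk).mean(axis=1)

# e.g. smoothing a noisy per-iteration reconstruction loss
noisy = np.random.randn(50000) + 10.0
smooth = chunk_average(noisy, chunk=100)      # 500 points instead of 50000
```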
The current experimental setup is:
Update on the disentanglement metric's spiking behaviour: the vanilla FactorVAE is more stable when plotting the disentanglement metric, so the issue likely lies in how the attention disentanglement loss (L_AD) is computed.
There was an error in the computation of the disentanglement metric, which has been fixed.
The current experimental setup is:
The following preliminary results show a faster decrease in the reconstruction loss when using the attention disentanglement loss.