Open CDitzel opened 3 years ago
Hi @CDitzel, sorry about the delay. The 3-dim codebook comes from the VQ-VAE paper where they write:
> In our experiments we define N discrete latents (e.g., we use a field of 32 x 32 latents for ImageNet, or 8 x 8 x 10 for CIFAR10).
On CIFAR10 it definitely improves performance, but judging from the paper it's not necessary for ImageNet (so long as you have a finer spatial resolution).
> Is the idea to achieve some form of multi-head Gumbel sampling?
Yeah, each component of the 3rd dimension can be thought of as a separate codebook with the codes concatenated at the end.
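To make that reading concrete, here is a minimal sketch with hypothetical shapes and names (`heads`, `codes_per_head`, etc. are illustrative, not the repo's actual API): each slice along the 3rd dimension acts as its own codebook, and the selected codes are concatenated along the feature dimension at the end.

```python
import torch

# hypothetical sizes for illustration
heads, codes_per_head, dim_per_head = 4, 256, 32
codebook = torch.randn(heads, codes_per_head, dim_per_head)

# encoder output reshaped to (batch, h*w, heads, dim_per_head)
z = torch.randn(2, 8 * 8, heads, dim_per_head)

# squared L2 distance of each latent slice to every code in its head
# dist: (batch, h*w, heads, codes_per_head)
dist = (z.unsqueeze(3) - codebook).pow(2).sum(-1)

indices = dist.argmin(-1)                           # (batch, h*w, heads)
quantized = codebook[torch.arange(heads), indices]  # (batch, h*w, heads, dim_per_head)

# concatenate the per-head codes back into one feature vector
quantized = quantized.reshape(2, 8 * 8, heads * dim_per_head)
print(quantized.shape)  # torch.Size([2, 64, 128])
```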
Thanks for getting back to me on this one. Mh, interesting. I am familiar with the paper but did not interpret the quoted lines the way you did. Very interesting =)
As for the Gumbel softmax part, I am not entirely sure that it is "correct" to use the distances to the codebook entries to parametrize the posterior in the KL term against the uniform prior. I believe it would be more appropriate to use the latents, i.e. the output of the encoder, in the KL, but personally I did not manage to get that working.
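For reference, a minimal sketch of the KL term in question, with hypothetical names; the only assumption is that the posterior is a softmax over some per-code logits, which can come either from the negative code distances (the behaviour being questioned) or directly from the encoder output (the alternative suggested above).

```python
import math
import torch
import torch.nn.functional as F

def kl_to_uniform(logits):
    """KL(q || uniform) where q = softmax(logits) over K codes."""
    K = logits.shape[-1]
    log_q = F.log_softmax(logits, dim=-1)
    # KL(q || U) = sum_k q_k * (log q_k - log(1/K)) = sum_k q_k log q_k + log K
    return (log_q.exp() * log_q).sum(-1) + math.log(K)

# behaviour being questioned: logits from negative code distances
# kl = kl_to_uniform(-dist)
# alternative proposed above: logits directly from the encoder output
# kl = kl_to_uniform(encoder_logits)
```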
> it's not necessary for ImageNet (so long as you have a finer spatial resolution).
Can you please elaborate on what you mean by spatial resolution?
I am having trouble obtaining convergence on a custom data set with 256x256 images, which the encoder downsamples four times to a 16x16 latent grid. The training error decreases as it should, but the test error plateaus fairly soon and does not improve further. I only used the Gumbel version for this.
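For concreteness, this is the kind of encoder front-end being described, a hypothetical sketch with illustrative channel sizes: four stride-2 convolutions take 256x256 inputs down to a 16x16 latent grid (the "spatial resolution" of the latents).

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1),     # 256 -> 128
    nn.ReLU(),
    nn.Conv2d(64, 128, 4, stride=2, padding=1),   # 128 -> 64
    nn.ReLU(),
    nn.Conv2d(128, 256, 4, stride=2, padding=1),  # 64 -> 32
    nn.ReLU(),
    nn.Conv2d(256, 256, 4, stride=2, padding=1),  # 32 -> 16
)

x = torch.randn(1, 3, 256, 256)
print(encoder(x).shape)  # torch.Size([1, 256, 16, 16])
```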
Thank you very much in advance
Hello and thank you for this repo.
I was wondering if there is a reason to use a 3-dim embedding instead of a 2-dim codebook.
Is the idea to achieve some form of multi-head Gumbel sampling? The dimension N is carried through all calculations only to be combined with the feature dimension again at the end.
The batched matrix multiplication (bmm) is very expensive, but preliminary runs show, at least for me, superior reconstruction quality compared to keeping it 2-dimensional.
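For reference, a sketch of the batched matmul in question, with hypothetical sizes: one `torch.bmm` scores every latent against every code separately per head, so its cost scales with `heads * hw * codes * dim`, i.e. roughly `heads` times the 2-dim case.

```python
import torch

heads, hw, codes, dim = 4, 16 * 16, 256, 32   # hypothetical sizes
z = torch.randn(heads, hw, dim)               # one sample, flattened per head
codebook = torch.randn(heads, codes, dim)

# one batched matmul gives the affinity of every latent to every code,
# separately for each head: (heads, hw, dim) @ (heads, dim, codes)
logits = torch.bmm(z, codebook.transpose(1, 2))   # (heads, hw, codes)
```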
Thank you in advance