Closed tommybotch closed 1 year ago
Hi @tlasmanbotch,
Thank you for your interest in our work.
Regarding your questions, let's outline the structure of a VQ-VAE. A VQ-VAE consists of an Encoder that takes the input image `[B, C, H, W, D]` and projects it into an encoder space of size `[B, c, h, w, d]`. The quantizer then takes the encoder space `[B, c, h, w, d]` and maps each c-dimensional vector at every spatial position to the closest element in its codebook. This produces first an index map of shape `[B, 1, h, w, d]` (one codebook index per spatial position, not per channel) and then, by looking those indices up in the codebook, a quantized output of shape `[B, c, h, w, d]`.
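The mapping above can be sketched in a few lines of NumPy. This is an illustrative sketch with made-up small sizes and variable names, not the repository's actual implementation:

```python
import numpy as np

# Hypothetical sizes, following the shapes in the explanation above.
B, c, h, w, d = 1, 4, 2, 2, 2   # encoder output [B, c, h, w, d]
K = 8                           # codebook size

rng = np.random.default_rng(0)
z_e = rng.normal(size=(B, c, h, w, d))   # encoder output
codebook = rng.normal(size=(K, c))       # K embedding vectors of dimension c

# Flatten spatial dims: each spatial position is one c-dim vector.
flat = z_e.transpose(0, 2, 3, 4, 1).reshape(-1, c)                # [B*h*w*d, c]

# Nearest-codebook index per vector (squared Euclidean distance).
dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # [N, K]
indices = dists.argmin(axis=1)                                    # [N]

# Index map: one integer per spatial position -> [B, 1, h, w, d].
index_map = indices.reshape(B, h, w, d)[:, None]

# Quantized output: look the indices up in the codebook -> [B, c, h, w, d].
z_q = codebook[indices].reshape(B, h, w, d, c).transpose(0, 4, 1, 2, 3)

print(index_map.shape)  # (1, 1, 2, 2, 2)
print(z_q.shape)        # (1, 4, 2, 2, 2)
```

Note that the index map has a single channel regardless of `c`: quantization happens per spatial position over whole c-dimensional vectors, which is why an encoder output of `(1, 256, 6, 8, 6)` yields indices of shape `(1, 6, 8, 6)` rather than `(256, 6, 8, 6)`.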
If you have any other questions, please let me know. If you want, we can set up a meeting and clarify everything. Otherwise, if all is good, please close the issue.
Cheers,
Dan
Hi Dan,
Thank you for your detailed reply - I greatly appreciate the help! This all makes sense and apologies for my misunderstanding. I am closing this issue.
Best, Tommy
Hello! Thank you for sharing your work. I ran the VQVAE model with the following parameters:
My input is shape `(1, 1, 96, 128, 96)` and the output of the encoder is shape `(1, 256, 6, 8, 6)`. Given 256 channels, I would expect to receive 256 embedding indices from the quantizer (expected shape `(256, 6, 8, 6)`). However, the output of the function `index_quantize` yields embedding indices of shape `(1, 6, 8, 6)`.