About the shape of variable 'min_encoding_indices'

MishaLaskin / vqvae

A pytorch implementation of the vector quantized variational autoencoder (https://arxiv.org/abs/1711.00937)


I think you just need to reshape min_encoding_indices into something the PixelCNN can work with. I use this code (bsz is the batch size, s is the spatial dim of z, and c is the channel dim of z):

# one codebook index per latent position, as a column vector
min_encoding_indices = torch.argmin(d, dim=1).unsqueeze(1)
min_encoding_indices_batched = min_encoding_indices.view(
    bsz, s*c//self.e_dim, s*c//self.e_dim, 1)
# move the trailing singleton dim to the channel position
min_encoding_indices_batched = min_encoding_indices_batched.permute(
    0, 3, 1, 2)

You should be able to train a PixelCNN with min_encoding_indices_batched as the input. To sample, reshape the PixelCNN output back to the shape of min_encoding_indices and proceed with the code in the repo.
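For reference, here is a minimal sketch of the round trip with dummy data. It assumes c equals e_dim, so that s*c//e_dim reduces to s. The sizes, the random stand-in for d, and the nn.Embedding codebook are all made up for illustration. The embedding lookup is swapped in here and may differ from how the repo builds z_q.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
bsz, s, e_dim, n_e = 2, 8, 64, 512

# Stand-in for the distance matrix d: one row per latent position,
# one column per codebook entry.
d = torch.rand(bsz * s * s, n_e)

# Flat indices, shape (bsz*s*s, 1).
min_encoding_indices = torch.argmin(d, dim=1).unsqueeze(1)

# Image-like layout for the PixelCNN, shape (bsz, 1, s, s).
batched = min_encoding_indices.view(bsz, s, s, 1).permute(0, 3, 1, 2)

# Inverse: undo the permute, then flatten back to (bsz*s*s, 1).
restored = batched.permute(0, 2, 3, 1).reshape(-1, 1)

# Assumed codebook: look up the embeddings and restore the
# (bsz, e_dim, s, s) layout that a decoder would typically expect.
codebook = torch.nn.Embedding(n_e, e_dim)
z_q = codebook(restored.squeeze(1)).view(bsz, s, s, e_dim).permute(0, 3, 1, 2)
```

When sampling, the indices drawn from the PixelCNN would take the place of batched before the inverse step.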


About the shape of variable 'min_encoding_indices' #4