clementchadebec / benchmark_VAE

Unifying Variational Autoencoder (VAE) implementations in Pytorch (NeurIPS 2022)
Apache License 2.0
1.77k stars, 161 forks

Issue with Changing `embedding_dim` in VQ-VAE Model #138

Open arsh-rl opened 5 months ago

arsh-rl commented 5 months ago

Hi @clementchadebec. Thank you for creating this repository.

I am attempting to train a VQ-VAE model, but I couldn't find an `embedding_dim` argument in either the `VQVAEConfig` or the `VQVAE` class. From what I have found, the only place where `embedding_dim` is assigned is inside the `_set_quantizer` method of the `VQVAE` class, where it is effectively hard-coded to one.

    def _set_quantizer(self, model_config):
        if model_config.input_dim is None:
            raise AttributeError(
                "No input dimension provided !"
                "'input_dim' parameter of VQVAEConfig instance must be set to 'data_shape' where "
                "the shape of the data is (C, H, W ..). Unable to set quantizer."
            )

        x = torch.randn((2,) + self.model_config.input_dim)
        z = self.encoder(x).embedding
        if len(z.shape) == 2:
            z = z.reshape(z.shape[0], 1, 1, -1)

        z = z.permute(0, 2, 3, 1)

        self.model_config.embedding_dim = z.shape[-1]

z.shape[-1] always holds the value 1.
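
To see where the 1 comes from, the shape handling in the snippet above can be traced without running the model. This is a minimal sketch in plain Python, not library code: `trace_embedding_dim` is a hypothetical helper that only mimics the `reshape` and `permute` calls quoted above, and it assumes the encoder output is either 2-D `(batch, latent_dim)` or 4-D `(batch, C, H, W)`.

```python
def trace_embedding_dim(encoder_output_shape):
    """Return the value the quoted snippet would assign to embedding_dim.

    Hypothetical helper: it works on shape tuples only and mirrors the
    reshape/permute steps of the `_set_quantizer` snippet quoted above.
    """
    shape = tuple(encoder_output_shape)

    if len(shape) == 2:
        # z.reshape(z.shape[0], 1, 1, -1): (B, D) -> (B, 1, 1, D)
        shape = (shape[0], 1, 1, shape[1])

    if len(shape) != 4:
        raise ValueError("this sketch only covers 2-D or 4-D encoder outputs")

    # z.permute(0, 2, 3, 1): new axis i takes the size of old axis order[i]
    order = (0, 2, 3, 1)
    permuted = tuple(shape[axis] for axis in order)

    # self.model_config.embedding_dim = z.shape[-1]
    return permuted[-1]


# Flattened encoder output, e.g. latent_dim = 16: (2, 16) -> (2, 1, 16, 1)
print(trace_embedding_dim((2, 16)))        # 1

# Convolutional feature map with 64 channels: (2, 64, 7, 7) -> (2, 7, 7, 64)
print(trace_embedding_dim((2, 64, 7, 7)))  # 64
```

Under these assumptions, a flattened encoder output gives 1 whatever the latent size is, while a 4-D feature map gives its channel count.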

clementchadebec commented 5 months ago

Hi @arsh-rl,

Actually, this value is set automatically to either the number of channels of your encoded sample or, if the encoded input is flattened, the size of your latent space. This is needed to quantize the encoded sample with the codebook. To change this value, either adapt your encoder architecture to output a sample with the required embedding dimension (if you created your own) or change `latent_dim` in your config (if you use the default nets). In the latter case, also remember to provide the input dimension of your data.

I hope this helps.

Best,

Clément