YongfeiYan / Gumbel_Softmax_VAE

PyTorch implementation of a Variational Autoencoder with Gumbel-Softmax Distribution

latent dim #4

Open yzhou359 opened 5 years ago

yzhou359 commented 5 years ago

Hi, what does latent_dim mean in your code? Can it be changed to other values? I understand that categorical_dim means 10 categories for the 10 digits, but I'm confused about latent_dim. Thanks!

yjlolo commented 4 years ago

Same question here; it would be great if someone could shed light on latent_dim, which is N in the author's notebook: https://github.com/ericjang/gumbel-softmax/blob/master/Categorical%20VAE.ipynb.

Why do we need latent_dim (the number of categorical distributions, as in the author's notebook), which makes the fully-connected layer output categorical_dim * latent_dim units instead of just categorical_dim?

gokceneraslan commented 3 years ago

I think latent_dim represents how many categorical variables there are in the model, while categorical_dim denotes the number of categories in each latent categorical variable. This is why the "true" dimensionality of the encoder output and the decoder input is 300 (30 variables × 10 categories each) in this model.
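
A minimal sketch of this shape logic, in case it helps. This is not the repo's exact code: it uses PyTorch's built-in `F.gumbel_softmax` instead of the repo's own sampling function, and a random tensor stands in for the encoder output.

```python
import torch
import torch.nn.functional as F

latent_dim, categorical_dim = 30, 10  # 30 categorical variables, 10 categories each

# Stand-in for the encoder's final FC layer: one vector of all logits per example.
logits = torch.randn(64, latent_dim * categorical_dim)   # (batch, 300)

# Reshape so each of the 30 variables gets its own 10-way distribution.
logits = logits.view(-1, latent_dim, categorical_dim)    # (batch, 30, 10)

# Gumbel-Softmax is applied independently over the last (category) axis,
# producing one relaxed one-hot sample per latent variable.
z = F.gumbel_softmax(logits, tau=1.0, hard=False, dim=-1)  # (batch, 30, 10)

# The decoder then consumes the flattened 300-dim latent code.
z = z.view(-1, latent_dim * categorical_dim)             # (batch, 300)
```

So the FC layer must output categorical_dim * latent_dim units because it is parameterizing latent_dim separate categorical distributions at once, not a single one.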

The misinterpretation stems from the assumption that the 10 categories of the categorical latent space represent the 10 digits, but this is not necessarily the case: there is a lot of variation in the data beyond digit identity (azimuth, width, thickness), which is why the model needs more than just 10 categories in the latent space.
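
To make the capacity argument concrete (a back-of-the-envelope illustration, not code from the repo): a single 10-way categorical can only distinguish 10 inputs, whereas 30 independent 10-way variables can jointly represent far more configurations, leaving room to encode style factors on top of digit identity.

```python
latent_dim, categorical_dim = 30, 10

# Joint configurations of 30 independent 10-way categoricals: 10**30,
# versus only 10 for a single categorical variable.
print(categorical_dim ** latent_dim)
```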