yzhou359 opened 5 years ago
Same question here; it would be great if someone could shed light on latent_dim, which is N in the author's notebook https://github.com/ericjang/gumbel-softmax/blob/master/Categorical%20VAE.ipynb. Why do we need latent_dim (the number of categorical distributions, as in the author's notebook), making the fully-connected layer output categorical_dim * latent_dim units instead of just categorical_dim?
I think latent_dim represents how many categorical variables there are in the model, while categorical_dim denotes the number of categories in each latent categorical variable. This is why the "true" dimensionality of the encoder output and the decoder input is 300 (30 variables x 10 categories per variable) in this model.
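To make the shapes concrete, here is a minimal NumPy sketch (not the author's code; dimensions and names follow the notebook, but the gumbel_softmax helper here is a hypothetical stand-in). The key point is that the softmax is taken over the last axis after reshaping to (latent_dim, categorical_dim), so each of the 30 variables gets its own independent distribution over 10 categories:

```python
import numpy as np

latent_dim, categorical_dim = 30, 10  # 30 categorical variables, 10 categories each
batch = 4

# The encoder's final fully-connected layer emits one logit per
# (variable, category) pair: latent_dim * categorical_dim = 300 units.
rng = np.random.default_rng(0)
logits = rng.standard_normal((batch, latent_dim * categorical_dim))

def gumbel_softmax(logits, tau=1.0, rng=rng):
    """Draw a relaxed (soft) one-hot sample per categorical variable."""
    # Gumbel noise: g = -log(-log(u)), u ~ Uniform(0, 1)
    g = -np.log(-np.log(rng.uniform(1e-20, 1.0, logits.shape)))
    y = (logits + g) / tau
    y = y - y.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)  # softmax over categories

# Reshape so the softmax normalizes over categories, per variable.
z = gumbel_softmax(logits.reshape(batch, latent_dim, categorical_dim))
# Each of the 30 rows per sample sums to 1 (one distribution per variable).
# Flatten back to 300 units for the decoder input.
z_flat = z.reshape(batch, latent_dim * categorical_dim)
```

With latent_dim = 1 the model would have only a single 10-way categorical code, which is far too little capacity to encode everything that varies in MNIST.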
The misinterpretation stems from the assumption that the 10 categories of the categorical latent space represent the 10 digits, but this is not necessarily the case: there is a lot of variation in the data beyond the digit type (azimuth, width, thickness), which is why the model needs more than just 10 categories in the latent space.
Hi, what does latent_dim mean in your code? Could it be changed to other values? I understand that categorical_dim means 10 categories for the 10 digits, but I'm confused about latent_dim. Thanks!