Closed: bkj closed this issue 7 years ago.
In a one-hot encoding of a (truly) categorical sample, the maximum is indeed 1 and all the other values are 0. The Gumbel-Softmax distribution relaxes this: instead, the maximum is < 1 and all the other values are nonzero. Like you said, this vector is dense (and continuous, as opposed to discrete). As the temperature decreases, y becomes more and more sparse. You can see this by plotting the entropy of the Gumbel-Softmax samples as a function of temperature.
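For example (a minimal NumPy sketch of my own, not the notebook's code — it samples from a uniform 10-way categorical and prints the mean entropy of the relaxed samples at a few temperatures):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gumbel_softmax(logits, temperature, n_samples=10000):
    """Draw relaxed one-hot samples: softmax((logits + Gumbel(0,1) noise) / temperature)."""
    u = rng.uniform(1e-20, 1.0, size=(n_samples, logits.shape[-1]))
    z = (logits - np.log(-np.log(u))) / temperature
    z -= z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.zeros(10)  # uniform categorical over 10 classes
for tau in [5.0, 1.0, 0.5, 0.1]:
    y = sample_gumbel_softmax(logits, tau)
    entropy = -(y * np.log(y + 1e-20)).sum(axis=-1).mean()
    print(f"tau={tau:4.1f}  mean sample entropy = {entropy:.3f}")
```

The entropy drops toward 0 as the temperature goes down, i.e. the samples approach one-hot vectors.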
I was playing around with the notebook, trying to look at the intermediate representations of the training data. I was expecting that the output of the y layer would be (pretty) sparse and (nearly) binarized, but it seems like that's not the case:

[plot of the y-layer activations omitted]

So it looks like the intermediate representations are still dense and not very binary. Any thoughts? (I'm new to Tensorflow/VAEs, so I may be making some silly coding/conceptual mistake...)
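(For reference, here's roughly the kind of check I was running — a sketch with a synthetic stand-in for the fetched activations, since the real y_val would come from something like sess.run(y, ...) in the notebook:)

```python
import numpy as np

def binariness(y_val, tol=0.05):
    """Fraction of entries within tol of 0 or 1."""
    return ((y_val < tol) | (y_val > 1 - tol)).mean()

# Synthetic stand-in for the fetched y-layer activations,
# shape (batch, N, K) = (100, 30, 10); entries sum to 1 over the last axis.
y_val = np.random.dirichlet(np.ones(10), size=(100, 30))

print("mean max per categorical:", y_val.max(axis=-1).mean())
print("fraction of entries near 0 or 1:", binariness(y_val))
```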
Edit: Maybe this is a matter of the hard parameter in gumbel_softmax? I understand that it forces the representation to be sparse/binary, but AFAIK it'd then just be a sample from a categorical distribution that doesn't necessarily have most of its mass on a single category.
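For reference, here's my understanding of what hard=True does — a sketch of the straight-through trick, written against TF 2 for brevity and paraphrased from memory rather than copied from the repo, so the actual gumbel_softmax there may differ in details:

```python
import tensorflow as tf

def gumbel_softmax(logits, temperature, hard=False):
    # Sample Gumbel(0, 1) noise and form the relaxed (soft) one-hot sample.
    u = tf.random.uniform(tf.shape(logits), minval=1e-20, maxval=1.0)
    y = tf.nn.softmax((logits - tf.math.log(-tf.math.log(u))) / temperature)
    if hard:
        # Forward pass: exact one-hot via argmax. Backward pass: gradient of
        # the soft sample y (straight-through estimator), since
        # stop_gradient(y_hard - y) contributes nothing to the gradient.
        y_hard = tf.one_hot(tf.argmax(y, axis=-1), tf.shape(logits)[-1], dtype=y.dtype)
        y = tf.stop_gradient(y_hard - y) + y
    return y

# With hard=True every sample is exactly one-hot, but which category comes out
# is still random, following the (tempered) categorical distribution.
print(gumbel_softmax(tf.zeros([3, 10]), temperature=0.5, hard=True))
```

So hard=True makes each individual sample binary, but it doesn't change the underlying distribution — the logits can still spread mass across many categories.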