ericjang / gumbel-softmax

categorical variational autoencoder using the Gumbel-Softmax estimator
MIT License
425 stars 101 forks

Should `y` be sparse/binary? #1

Closed bkj closed 7 years ago

bkj commented 7 years ago

I was playing around with the notebook, trying to look at the intermediate representations of the training data. I was expecting that the output of the y layer would be (pretty) sparse and (nearly) binarized. But it seems like that's not the case:

...
Step 40001, ELBO: -101.598
Step 45001, ELBO: -99.799

>>> np_x, _ = data.next_batch(1)
>>> emb = sess.run(y, {x : np_x})
>>> emb.max(axis=-1) # Value of maximum of embedding -- would expect to be 1
array([ 0.13201179,  0.36978129,  0.41773844,  0.26891398,  0.24909849,
        0.21777716,  0.1552867 ,  0.47244716,  0.16195767,  0.39042374,
        0.17623694,  0.2765696 ,  0.19546057,  0.18048088,  0.12659149,
        0.64287513,  0.14742081,  0.2126791 ,  0.53717244,  0.23660626,
        0.14906606,  0.15466955,  0.1191797 ,  0.20597951,  0.25431085,
        0.1979771 ,  0.16981648,  0.2198326 ,  0.17538837,  0.27005175], dtype=float32)

>>> ((emb < 0.01) | (emb > 0.99)).mean()
0.12

So it looks like the intermediate representations are still dense and not very binary. Any thoughts? (I'm new to Tensorflow/VAEs, so I may be making some silly coding/conceptual mistake...)

Edit: Maybe this is a matter of the `hard` parameter in `gumbel_softmax`? I understand that setting `hard=True` forces the representation to be sparse/binary, but AFAIK it would then just be a sample from a categorical distribution that doesn't necessarily have most of its mass on a single category.
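For reference, here is a minimal NumPy sketch of Gumbel-Softmax sampling with a `hard` (straight-through) option. The function name and signature are illustrative, not the repo's exact API; it just shows why the soft samples have max < 1 while the hard ones are exactly one-hot:

```python
import numpy as np

def sample_gumbel_softmax(logits, temperature, hard=False, rng=np.random):
    """Illustrative Gumbel-Softmax sampler (not the repo's exact API)."""
    # Sample Gumbel(0, 1) noise and perturb the logits.
    u = rng.uniform(low=1e-20, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    # Temperature-scaled softmax over the perturbed logits.
    y = (logits + g) / temperature
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    y = y / y.sum(axis=-1, keepdims=True)
    if hard:
        # Straight-through style: emit a one-hot argmax in the forward pass.
        y_hard = np.zeros_like(y)
        y_hard[np.arange(len(y)), y.argmax(axis=-1)] = 1.0
        return y_hard
    return y

soft = sample_gumbel_softmax(np.zeros((4, 10)), temperature=1.0)
hard = sample_gumbel_softmax(np.zeros((4, 10)), temperature=1.0, hard=True)
print(soft.max(axis=-1))  # typically well below 1 at temperature 1.0
print(hard.max(axis=-1))  # exactly 1: each row is one-hot
```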

ericjang commented 7 years ago

In a one-hot encoding of a (truly) categorical sample, the maximum is indeed 1 and all the other values are 0. The Gumbel-Softmax distribution relaxes this: instead, the maximum is < 1 and all the other values are nonzero. Like you said, this vector is dense (and continuous, as opposed to discrete).

As the temperature decreases, y becomes more and more sparse. You can see this by plotting the entropy of the Gumbel-Softmax samples as a function of temperature.
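A quick NumPy sketch of that experiment (names are illustrative): draw Gumbel-Softmax samples over uniform logits at several temperatures and compare their mean entropy, which should shrink as the temperature does:

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax_sample(logits, temperature):
    # Gumbel(0, 1) noise added to logits, then a temperature-scaled softmax.
    g = -np.log(-np.log(rng.uniform(1e-20, 1.0, size=logits.shape)))
    y = (logits + g) / temperature
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    return y / y.sum(axis=-1, keepdims=True)

def mean_entropy(temperature, n=2000, k=10):
    # Average Shannon entropy of n samples from a k-way uniform distribution.
    y = gumbel_softmax_sample(np.zeros((n, k)), temperature)
    return -(y * np.log(y + 1e-20)).sum(axis=-1).mean()

for tau in [5.0, 1.0, 0.5, 0.1]:
    # Entropy drops toward 0 as samples approach one-hot vectors.
    print(f"tau={tau:<4} mean entropy = {mean_entropy(tau):.3f}")
```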