google-deepmind / sonnet

TensorFlow-based neural network library
https://sonnet.dev/
Apache License 2.0

I have a question about the perplexity term (in the VQ-VAE). #257

Open SeongYeonPark opened 1 year ago

SeongYeonPark commented 1 year ago

As far as I understand, the perplexity used in this repo's VQ-VAE is roughly the effective number of codebook tokens actually in use.

When only one codebook token is ever selected, the perplexity is 1; when all codebook tokens appear uniformly, the perplexity equals the codebook size.

So I was wondering: for good output quality, what is a reasonable minimum threshold for "perplexity divided by codebook size"? (I guess this has to be found experimentally. If you have any results related to this question, it would be great to know.)
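
To make the two extremes above concrete, here is a minimal NumPy sketch of the usual usage-perplexity computation, exp(-sum_i p_i log p_i) over the empirical code distribution. The function name `perplexity`, the codebook size of 512, and the sample counts are illustrative assumptions, not anything from Sonnet itself; the 1e-10 epsilon mirrors the common implementation trick for unused codes.

```python
import numpy as np

def perplexity(code_indices, num_codes):
    """Perplexity of codebook usage: exp of the entropy of the
    empirical distribution over code indices. (Illustrative helper,
    not Sonnet's API.)"""
    counts = np.bincount(code_indices, minlength=num_codes)
    avg_probs = counts / counts.sum()
    # Small epsilon avoids log(0) for codes that were never used.
    entropy = -np.sum(avg_probs * np.log(avg_probs + 1e-10))
    return np.exp(entropy)

num_codes = 512  # assumed codebook size for this example
# One code used everywhere -> perplexity ~= 1.
print(perplexity(np.zeros(10_000, dtype=int), num_codes))
# All codes used uniformly -> perplexity ~= num_codes (512).
print(perplexity(np.repeat(np.arange(num_codes), 20), num_codes))
```

So "perplexity divided by codebook size" ranges from 1/512 (total collapse onto one code) up to 1 (perfectly uniform usage) in this example.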

PipeDream941 commented 1 year ago

I have the same question. Perhaps someone can explain more specifically what "perplexity" means here. In the meantime, I found a notebook that computes and logs perplexity; you may find it helpful: https://github.com/zalandoresearch/pytorch-vq-vae/blob/master/vq-vae.ipynb

PipeDream941 commented 1 year ago

I looked into the notebook I linked. The encodings are one-hot vectors of shape [BHW, num_embeddings], and avg_probs averages them over the first dimension, so each entry of avg_probs lies in [0, 1] and the entries sum to 1. (In practice an entry can't actually get close to 1; in my tests the maximum entry of avg_probs was no more than 0.01.) Since perplexity is the exponential of the entropy of avg_probs, its minimum value is e**0 == 1, attained only when a single code takes all the probability mass.
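
For reference, the notebook's computation boils down to roughly the following. This is a NumPy paraphrase under assumed shapes (BHW flattened spatial positions, 512 embeddings), not the notebook's exact PyTorch code:

```python
import numpy as np

# Assumed shapes: BHW flattened encoder outputs, one-hot over num_embeddings codes.
BHW, num_embeddings = 4096, 512
encoding_indices = np.random.randint(0, num_embeddings, size=BHW)
encodings = np.eye(num_embeddings)[encoding_indices]  # [BHW, num_embeddings], one-hot rows

avg_probs = encodings.mean(axis=0)  # [num_embeddings]; entries in [0, 1], summing to 1
perplexity = np.exp(-np.sum(avg_probs * np.log(avg_probs + 1e-10)))
# perplexity == 1 only if a single entry of avg_probs equals 1;
# uniform usage gives perplexity == num_embeddings.
```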