SeongYeonPark opened 2 years ago
I have the same question. Does anyone know what "perplexity" means here, exactly? (It would help to be more specific.) I did find a notebook that computes and logs perplexity, which may be helpful: https://github.com/zalandoresearch/pytorch-vq-vae/blob/master/vq-vae.ipynb
I looked into that notebook. The encodings are one-hot vectors of shape [B*H*W, num_embeddings], and avg_probs is their mean over the first dimension, so each entry of avg_probs lies in [0, 1]. In practice it never gets close to 1 (in my tests the maximum entry of avg_probs was no more than 0.01), so the usage distribution is spread out and the perplexity ends up strictly greater than e**0 == 1.
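For reference, here is a minimal sketch of how that notebook computes perplexity from the one-hot encodings. The tensor shapes, the batch size, and the codebook size below are assumptions for illustration, not values from the repo:

```python
import torch

num_embeddings = 512   # assumed codebook size
num_positions = 1024   # assumed B * H * W flattened latent positions

# encodings: one-hot matrix of shape [B*H*W, num_embeddings],
# one row per latent position, with a 1 in the column of its assigned code.
encoding_indices = torch.randint(0, num_embeddings, (num_positions, 1))
encodings = torch.zeros(num_positions, num_embeddings)
encodings.scatter_(1, encoding_indices, 1)

# avg_probs: fraction of positions assigned to each code; every entry is in [0, 1].
avg_probs = torch.mean(encodings, dim=0)

# perplexity = exp(entropy of the code-usage distribution); the small epsilon
# keeps log(0) from producing NaNs for unused codes.
perplexity = torch.exp(-torch.sum(avg_probs * torch.log(avg_probs + 1e-10)))
print(perplexity)
```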
As far as I understand, the perplexity used in this repo's VQ-VAE is roughly the effective number of codebook entries actually in use.
When only one codebook token is ever used, the perplexity is 1; when all codebook tokens are used uniformly, the perplexity equals the codebook size.
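A quick sanity check of those two extremes, assuming the epsilon-stabilized formula from the sketch above and a hypothetical codebook size of 512:

```python
import torch

def perplexity(avg_probs: torch.Tensor) -> torch.Tensor:
    # exp of the entropy of the code-usage distribution
    return torch.exp(-torch.sum(avg_probs * torch.log(avg_probs + 1e-10)))

num = 512  # hypothetical codebook size

# Only one code ever used -> perplexity == 1
single = torch.zeros(num)
single[0] = 1.0
print(perplexity(single))    # ~1.0

# All codes used uniformly -> perplexity == codebook size
uniform = torch.full((num,), 1.0 / num)
print(perplexity(uniform))   # ~512.0
```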
So I was wondering: for good output quality, is there a minimum threshold for "perplexity divided by codebook size"? (I guess this has to be found experimentally; if you have any results related to this question, it would be great to know.)