-
Hi,
I recently read [this](https://ml.berkeley.edu/blog/posts/clip-art/) blog and was fascinated by the potential of these generative models. I am hoping to learn the fundamentals, reimplement models…
-
Hi, thank you for the great work.
I have a question about Eq. 1 of the supplementary material.
$\mathcal L_\text{VQ-VAE}=-\log p(X|\mathbf Z) + \|\text{sg}[\hat{\mathbf Z}]-\mathbf Z\|^2_2+\|\hat{\mathbf Z} - \tex…
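For reference, the three loss terms can be sketched numerically. This is a minimal numpy sketch under my own assumptions: MSE stands in for $-\log p(X|\mathbf Z)$ (exact for a Gaussian decoder with fixed variance, up to constants), means replace sums, and the function name and `beta` default are illustrative. Note that `sg[]` (stop-gradient) only matters under autodiff, which plain numpy does not model, so here it only appears in the comments:

```python
import numpy as np

def vq_vae_loss_terms(x, x_recon, z_e, z_q, beta=0.25):
    """Forward-only sketch of the three VQ-VAE loss terms (names assumed)."""
    # Reconstruction term: MSE stands in for -log p(X|Z).
    recon = np.mean((x - x_recon) ** 2)
    # Codebook term: ||sg[z_e] - z_q||^2 moves codebook vectors toward
    # encoder outputs; sg[] only changes gradient flow, not the value.
    codebook = np.mean((z_e - z_q) ** 2)
    # Commitment term: beta * ||z_e - sg[z_q]||^2 keeps the encoder
    # committed to its assigned codebook entry.
    commit = beta * np.mean((z_e - z_q) ** 2)
    return recon + codebook + commit

x = np.zeros((2, 3))
x_recon = np.zeros((2, 3))   # perfect reconstruction -> recon term is 0
z_e = np.ones((2, 4))        # encoder outputs
z_q = np.zeros((2, 4))       # nearest codebook vectors
print(vq_vae_loss_terms(x, x_recon, z_e, z_q))  # 0 + 1 + 0.25 = 1.25
```

In a real implementation the codebook and commitment terms differ only in where the stop-gradient is placed, which is why they share one squared distance here.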
-
Hello, first of all, thanks for this interesting implementation of the VQ-VAE-2 paper.
I can train this network on my own dataset; however, the reconstructed images are a little blurry. Quality is goo…
-
Hi, this project uses a VQ-VAE to compress video into a small latent space, and the latent embedding dim is `512` or `256`. But in LDM they usually use a very small embedding dim of `4` or `3`; SD uses `4`. Will th…
-
As far as I understand, the perplexity used in this repo's VQ-VAE is roughly the number of codebook tokens that are meaningfully used.
When only one codebook token is used, the perplexity is 1.
When all codebook to…
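That reading matches the usual definition: perplexity as the exponential of the entropy of the code-usage distribution, which equals 1 when a single code is used and equals the codebook size when usage is uniform. A minimal numpy sketch (the function name is my own, not this repo's API):

```python
import numpy as np

def codebook_perplexity(code_indices, codebook_size):
    """exp(entropy) of how often each codebook entry is selected."""
    counts = np.bincount(code_indices, minlength=codebook_size)
    probs = counts / counts.sum()
    nonzero = probs[probs > 0]            # 0 * log(0) is taken as 0
    entropy = -np.sum(nonzero * np.log(nonzero))
    return float(np.exp(entropy))

# Only one token ever selected -> perplexity 1
print(codebook_perplexity(np.zeros(100, dtype=int), 8))  # 1.0
# All 8 tokens used uniformly -> perplexity 8
print(codebook_perplexity(np.arange(8).repeat(10), 8))   # 8.0
```

Values between these extremes indicate how many codes are "effectively" in use, which is why low perplexity is a common symptom of codebook collapse.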
-
The paper mentions a codebook size of 4096 for all models, with 128/64/32 tokens for 256x256 and 128/64 tokens for 512x512.
I was wondering why the example configuration in `README.md` and `titok.py` …
-
Interesting work! However, in the DST module, the encoded feature maps with shape [T, C, \hat{H}, \hat{W}] are quantized into feature maps with shape [T, D, \hat{H}, \hat{W}]. It is reall…
-
For utterance encoding, gesture encoding, and facial expression encoding, we first apply a nonlinear solver technique to extract the relevant features, owing to what is expected …
-
-
Hi,
I was training on my own dataset with taming's VQGAN.
There's an error with the dimensions in the VAE model:
![image](https://user-images.githubusercontent.com/85055246/124013852-68a8e680-…