Open pokameng opened 1 month ago
@LTH14 How can I download the KL-8 tokenizer?
You can directly use this one for KL-8: https://huggingface.co/stabilityai/sd-vae-ft-ema
@LTH14 Hi! I have a question: if I use an SD model or another large diffusion model instead of the MLP, I need to change the VAE, right? SD takes an input of [B, 4, H, W].
Yes -- if you want to use a pre-trained SD model, then you should use SD's KL-8 tokenizer.
> Yes -- if you want to use a pre-trained SD model, then you should use SD's KL-8 tokenizer.

Well, the output of MAR is [B, L, D], so I need to convert it to [B, 4, 32, 32] if I want to use a pre-trained SD model (e.g. ControlNet), right? And should I use the KL-8 tokenizer from https://huggingface.co/stabilityai/sd-vae-ft-ema as the VAE?
If you use KL-8, you should set `--vae_embed_dim 4 --vae_stride 8 --patch_size 2`. In this case, the output of MAR is [B, 256, 16]. You should unpatchify it (we have this function in MAR) to [B, 4, 32, 32].
The exact tokenizer depends on which SD version you are using; different SD versions typically use different tokenizers.
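For reference, the unpatchify step can be sketched as below. This is a minimal NumPy re-implementation, not MAR's actual code; it assumes MAR's token layout in which each token of dimension D = patch_size² × embed_dim stores a patch with the channel axis last. Use the `unpatchify` function shipped with MAR in practice.

```python
import numpy as np

def unpatchify(x, patch_size=2, embed_dim=4):
    """Sketch: fold MAR tokens [B, L, p*p*c] back into a latent [B, c, H, W].

    Assumes L is a perfect square (a square patch grid) and that each token
    stores its patch as (p, p, c), i.e. channel-last -- matching how the
    corresponding patchify would flatten a latent.
    """
    B, L, D = x.shape                      # e.g. (B, 256, 16) for KL-8
    h = w = int(np.sqrt(L))                # 16 x 16 patch grid
    p, c = patch_size, embed_dim
    assert D == p * p * c and h * w == L
    x = x.reshape(B, h, w, p, p, c)
    x = np.einsum('bhwpqc->bchpwq', x)     # move channels first, interleave patches
    return x.reshape(B, c, h * p, w * p)   # (B, 4, 32, 32)

tokens = np.zeros((1, 256, 16), dtype=np.float32)
latent = unpatchify(tokens)
print(latent.shape)  # (1, 4, 32, 32)
```

The resulting [B, 4, 32, 32] tensor has the layout SD's KL-8 VAE decoder (and ControlNet-style models operating in that latent space) expects.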
> If you use KL-8, you should set `--vae_embed_dim 4 --vae_stride 8 --patch_size 2`. In this case, the output of MAR is [B, 256, 16]. You should unpatchify it (we have this function in MAR) to [B, 4, 32, 32].
> The exact tokenizer depends on which SD version you are using; different SD versions typically use different tokenizers.
Yes, I know. I want to use SD 1.5, so which tokenizer should I use? I am currently using the KL-8 tokenizer from https://huggingface.co/stabilityai/sd-vae-ft-ema
> If you use KL-8, you should set `--vae_embed_dim 4 --vae_stride 8 --patch_size 2`. In this case, the output of MAR is [B, 256, 16]. You should unpatchify it (we have this function in MAR) to [B, 4, 32, 32].
> The exact tokenizer depends on which SD version you are using; different SD versions typically use different tokenizers.
In other words, if I want to use SD 1.5, which tokenizer should I choose? I want to use a non-quantized (continuous) tokenizer so that it is consistent with yours.
For SD 1.5 you can use https://huggingface.co/stabilityai/sd-vae-ft-ema or https://huggingface.co/stabilityai/sd-vae-ft-mse. They both have the same encoder (so the latent is the same) but different decoders.
> For SD 1.5 you can use https://huggingface.co/stabilityai/sd-vae-ft-ema or https://huggingface.co/stabilityai/sd-vae-ft-mse. They both have the same encoder (so the latent is the same) but different decoders.
Yes! I am using the tokenizer from https://huggingface.co/stabilityai/sd-vae-ft-ema
@LTH14 Hello! I found that the VAE in MAR is KL-16, with latent shape [B, 16, 16, 16], whereas with KL-8 the latent shape is [B, 4, 32, 32]. So if I use an SD model or another large diffusion model instead of the MLP, I need to change the VAE, right? SD takes an input of [B, 4, H, W].
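The shape bookkeeping in this thread can be checked with a few lines. This is a sketch assuming 256×256 images (MAR's ImageNet setting) and assuming KL-16 is used with `--patch_size 1`; the exact defaults may differ in your MAR checkout.

```python
def mar_shapes(img_size, vae_stride, embed_dim, patch_size):
    """Latent shape and MAR token shape (L, D) for a tokenizer config."""
    latent_hw = img_size // vae_stride        # VAE downsamples by its stride
    grid = latent_hw // patch_size            # patches per side of the latent
    num_tokens = grid * grid                  # sequence length L
    token_dim = patch_size ** 2 * embed_dim   # per-token dimension D
    return (embed_dim, latent_hw, latent_hw), (num_tokens, token_dim)

# KL-16 (MAR's tokenizer): latent [16, 16, 16], tokens (256, 16)
print(mar_shapes(256, 16, 16, 1))  # ((16, 16, 16), (256, 16))
# KL-8 (SD's tokenizer), with --patch_size 2: latent [4, 32, 32], tokens (256, 16)
print(mar_shapes(256, 8, 4, 2))    # ((4, 32, 32), (256, 16))
```

Note that with `--patch_size 2` the KL-8 setup produces the same token sequence shape [B, 256, 16] as KL-16, which is why only the VAE (and the unpatchify target shape) changes, not the MAR sequence length.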