LTH14 / mar

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
MIT License

About VAE channels #56

Open pokameng opened 1 month ago

pokameng commented 1 month ago

@LTH14 Hello! I found that the VAE in MAR is KL-16, so the latent dimension is [B, 16, 16, 16], while with KL-8 the latent dimension is [B, 4, 32, 32]. I have a question: if I use the SD model or another big diffusion model instead of the MLP, I need to change the VAE, right? SD takes an input of [B, 4, H, W].

pokameng commented 1 month ago

@LTH14 How can I download the KL-8 tokenizer?

LTH14 commented 1 month ago

You can directly use this one for KL-8: https://huggingface.co/stabilityai/sd-vae-ft-ema

pokameng commented 1 month ago

@LTH14 Hi! I have a question: if I use the SD model or another big diffusion model instead of the MLP, I need to change the VAE, right? SD takes an input of [B, 4, H, W].

LTH14 commented 1 month ago

Yes -- if you want to use a pre-trained SD model, then you should use SD's KL-8 tokenizer.

pokameng commented 1 month ago

> Yes -- if you want to use a pre-trained SD model, then you should use SD's KL-8 tokenizer.

Well, the output of MAR is [B, L, D], so I need to convert it to [B, 4, 32, 32] if I want to use a pre-trained SD model (e.g. ControlNet), right? And for the VAE, I should use the KL-8 tokenizer from https://huggingface.co/stabilityai/sd-vae-ft-ema?

LTH14 commented 1 month ago

If you use KL-8, you should set --vae_embed_dim 4 --vae_stride 8 --patch_size 2. In this case, the output of MAR is [B, 256, 16]. You should unpatchify it (we have this function in MAR) to [B, 4, 32, 32].

The exact tokenizer depends on which SD version you are using. Different SD versions typically use different tokenizers.
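The reshape described above can be sketched as follows (an illustrative reimplementation of unpatchify, not the repo's exact function; it maps [B, 256, 16] tokens back to a [B, 4, 32, 32] latent with patch_size 2 and vae_embed_dim 4):

```python
import torch

def unpatchify(x, patch_size=2, vae_embed_dim=4):
    """Convert token sequence [B, L, p*p*c] back to a latent map [B, c, h*p, w*p]."""
    B, L, D = x.shape
    h = w = int(L ** 0.5)            # 256 tokens -> 16 x 16 patch grid
    p, c = patch_size, vae_embed_dim
    assert D == p * p * c
    x = x.reshape(B, h, w, p, p, c)
    x = torch.einsum('bhwpqc->bchpwq', x)  # interleave patch pixels into the grid
    return x.reshape(B, c, h * p, w * p)

tokens = torch.randn(2, 256, 16)
latent = unpatchify(tokens)
print(latent.shape)  # torch.Size([2, 4, 32, 32])
```

The resulting [B, 4, 32, 32] tensor has the layout SD's KL-8 decoder (or a ControlNet pipeline) expects.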

pokameng commented 1 month ago

> If you use KL-8, you should set --vae_embed_dim 4 --vae_stride 8 --patch_size 2. In this case, the output of MAR is [B, 256, 16]. You should unpatchify it (we have this function in MAR) to [B, 4, 32, 32].
>
> The exact tokenizer depends on which SD version you are using. Different SD versions typically use different tokenizers.

Yes, I know. I want to use SD-1.5, so which tokenizer should I use? I am using the KL-8 tokenizer from https://huggingface.co/stabilityai/sd-vae-ft-ema

pokameng commented 1 month ago

> If you use KL-8, you should set --vae_embed_dim 4 --vae_stride 8 --patch_size 2. In this case, the output of MAR is [B, 256, 16]. You should unpatchify it (we have this function in MAR) to [B, 4, 32, 32].
>
> The exact tokenizer depends on which SD version you are using. Different SD versions typically use different tokenizers.

In other words, if I want to use SD-1.5, which tokenizer should I choose? I want to use a non-quantized tokenizer so that it is consistent with yours.

LTH14 commented 1 month ago

For sd-1.5 you can use https://huggingface.co/stabilityai/sd-vae-ft-ema or https://huggingface.co/stabilityai/sd-vae-ft-mse. They both have the same tokenizer encoder (so the latent is the same) but different tokenizer decoders.

pokameng commented 1 month ago

> For sd-1.5 you can use https://huggingface.co/stabilityai/sd-vae-ft-ema or https://huggingface.co/stabilityai/sd-vae-ft-mse. They both have the same tokenizer encoder (so the latent is the same) but different tokenizer decoders.

Yes! I am using the tokenizer from https://huggingface.co/stabilityai/sd-vae-ft-ema