Open awei-6 opened 1 month ago
Usually we design causal models because we want to use autoregressive generation afterward. But since diffusion generates in parallel, why is the VAE designed to be causal? What's the intuition behind this design?
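Making the VAE causal lets it support both image and video encoding and decoding: the first frame depends only on itself, so a single image can be handled as a one-frame video.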
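A minimal sketch of the idea, assuming the VAE's temporal layers are convolutions padded only on the past side (the thread does not describe the actual layers, so `causal_temporal_conv`, the kernel weights, and the scalar "frames" are all hypothetical stand-ins). It shows that the output for the first frame is the same whether the input is a single image or the start of a longer video, and that later frames never affect earlier outputs.

```python
def causal_temporal_conv(frames, kernel):
    """1D convolution along time, padded only on the past side.

    frames: list of floats, one value per frame (stand-in for a feature map).
    kernel: list of floats; kernel[-1] weights the current frame.
    This is an illustrative toy, not the layer used by any specific model.
    """
    k = len(kernel)
    # Replicate the first frame k-1 times in front, so no future frame is read.
    padded = [frames[0]] * (k - 1) + list(frames)
    return [
        sum(w * x for w, x in zip(kernel, padded[t:t + k]))
        for t in range(len(frames))
    ]


kernel = [0.25, 0.25, 0.5]      # hypothetical weights
video = [1.0, 2.0, 4.0, 8.0]    # four "frames"
image = video[:1]               # a single image treated as a one-frame video

video_out = causal_temporal_conv(video, kernel)
image_out = causal_temporal_conv(image, kernel)

# Changing a later frame leaves the outputs for earlier frames unchanged.
edited_video = [1.0, 2.0, 100.0, 8.0]
edited_out = causal_temporal_conv(edited_video, kernel)

print(image_out[0] == video_out[0])      # first-frame output matches
print(edited_out[:2] == video_out[:2])   # earlier outputs unaffected
```

With symmetric (non-causal) padding, the first output would also read frames 1 and beyond, so an image and the first frame of a video would no longer be encoded the same way.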