crowsonkb / k-diffusion

Karras et al. (2022) diffusion models for PyTorch
MIT License

Patch size 2 for HDiT #107

Open yuanzhi-zhu opened 2 months ago

yuanzhi-zhu commented 2 months ago

Are there any experiments with HDiT using a patch size of 2? According to the DiT paper, a big improvement is expected when going from patch size 4 to 2.

stefan-baumann commented 2 months ago

A patch size of 2 is overkill for pixel-space diffusion models. DiT operates on VAE latents (downsampled 8x relative to the image), so a patch size of 2 in DiT corresponds to a patch size of 16 in HDiT, which works directly on pixels. Reducing the patch size from 16 to 4 gives a substantial improvement, but 2 is very much in the range where it typically isn't worthwhile, especially considering the substantially increased processing cost: halving the patch size effectively doubles the resolution the model has to process.
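To make the arithmetic above concrete, here is a minimal sketch of how patch sizes compare between latent space and pixel space. It assumes an 8x VAE downsampling factor (the factor implied by the 2 to 16 correspondence above) and a 256x256 image as an example resolution; the helper names `effective_pixel_patch` and `num_tokens` are hypothetical and not part of k-diffusion.

```python
def effective_pixel_patch(patch_size: int, vae_downsample: int = 1) -> int:
    """Side length, in image pixels, covered by one token.

    vae_downsample=1 models a pixel-space model; vae_downsample=8 is an
    assumed latent-space setup where the VAE shrinks each side by 8x.
    """
    return patch_size * vae_downsample


def num_tokens(image_size: int, patch_size: int, vae_downsample: int = 1) -> int:
    """Number of tokens for a square image of side image_size."""
    side = effective_pixel_patch(patch_size, vae_downsample)
    if image_size % side != 0:
        raise ValueError("image_size must be divisible by the effective patch size")
    return (image_size // side) ** 2


# Latent-space model, patch size 2, assumed 8x VAE: each token covers 16x16 pixels.
latent_patch_in_pixels = effective_pixel_patch(2, vae_downsample=8)

# Token counts at an example resolution of 256x256.
tokens_latent_p2 = num_tokens(256, 2, vae_downsample=8)  # latent space, patch size 2
tokens_pixel_p4 = num_tokens(256, 4)                     # pixel space, patch size 4
tokens_pixel_p2 = num_tokens(256, 2)                     # pixel space, patch size 2

print(latent_patch_in_pixels, tokens_latent_p2, tokens_pixel_p4, tokens_pixel_p2)
```

Under these assumptions, going from patch size 4 to 2 in pixel space doubles the token grid along each side, so the token count grows 4x (4096 to 16384 at 256x256), compared with 256 tokens for the latent-space setup with patch size 2.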