Open yuanzhi-zhu opened 2 months ago
A patch size of 2 is overkill for pixel-space diffusion models. DiT operates on VAE latents, which are 8x smaller per side than the image, so a patch size of 2 in DiT corresponds to a patch size of 16 pixels in HDiT. Reducing the HDiT patch size to 4 gives a substantial improvement over 16, but 2 is well into the range where it typically isn't worthwhile, especially given the substantially higher processing cost: you're effectively doubling the resolution the model has to process.
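To make the arithmetic concrete, here is a rough sketch of the patch-size bookkeeping. The helper names and the 256x256 image size are hypothetical, chosen only for illustration, and the VAE downsampling factor of 8 is an assumption matching the 2-to-16 correspondence above. It only counts tokens at the finest level and is not taken from either codebase.

```python
# Hypothetical helpers for illustration; not part of the DiT or HDiT code.

def effective_pixel_patch(patch_size: int, vae_downsample: int = 1) -> int:
    """Side length, in image pixels, covered by one patch."""
    return patch_size * vae_downsample


def num_tokens(image_size: int, patch_size: int, vae_downsample: int = 1) -> int:
    """Number of tokens at the finest level for a square image."""
    grid_side = image_size // vae_downsample // patch_size
    return grid_side * grid_side


if __name__ == "__main__":
    image_size = 256  # assumed resolution, for illustration only

    # Latent-space model with an assumed 8x VAE and patch size 2
    print(effective_pixel_patch(2, vae_downsample=8))
    print(num_tokens(image_size, 2, vae_downsample=8))

    # Pixel-space model with patch size 4, then patch size 2
    print(num_tokens(image_size, 4))
    print(num_tokens(image_size, 2))
```

Under these assumptions, halving the pixel-space patch size from 4 to 2 doubles the token grid per side, so the finest level has four times as many tokens to process.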
Is there an experiment with HDiT using a patch size of 2? According to DiT, a big improvement is expected when going from patch size 4 to 2.