divamgupta / stable-diffusion-tensorflow

Stable Diffusion in TensorFlow / Keras

Remove extra padding and use asymmetric padding for downsampling #42

Closed Lime-Cakes closed 1 year ago

Lime-Cakes commented 1 year ago

Stable Diffusion's first-stage encoder doesn't use padding when downsampling (outside of the ResNet blocks). This PR updates the model to remove the extra padding, so the encoder now outputs a correctly shaped latent.
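For context, the original Stable Diffusion encoder's downsampling step pads only on the bottom/right of each spatial axis and then applies a stride-2 convolution with no further padding. A small sketch of the shape arithmetic (`conv_out_size` is a hypothetical helper, not part of this repo):

```python
def conv_out_size(size, kernel, stride, pad_before, pad_after):
    """Output length of a 'valid' convolution applied after explicit padding."""
    return (size + pad_before + pad_after - kernel) // stride + 1

# Symmetric (1, 1) padding, kernel 3, stride 2:
# 512 -> (512 + 2 - 3) // 2 + 1 = 256
assert conv_out_size(512, 3, 2, 1, 1) == 256

# Asymmetric (0, 1) padding as in Stable Diffusion's encoder:
# 512 -> (512 + 1 - 3) // 2 + 1 = 256 -- same size, but a different
# sample grid, which is what avoids the spatial shift in the latent.
assert conv_out_size(512, 3, 2, 0, 1) == 256

# Three downsampling stages take a 512x512 image to a 64x64 latent:
s = 512
for _ in range(3):
    s = conv_out_size(s, 3, 2, 0, 1)
print(s)  # 64
```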

divamgupta commented 1 year ago

Do you have before and after results?

Lime-Cakes commented 1 year ago

Wait, I realized I made an error.

Lime-Cakes commented 1 year ago

After the edit, the shift is gone, but the latent size changed, so the behaviour is still wrong. I'm not sure what the problem is; I'll see if I can find out why the latent shape is wrong.

Lime-Cakes commented 1 year ago

Okay, now it's right.

Before https://colab.research.google.com/drive/1lwCF5vdys5U4u9skE_DFrK8ecUP8Pmnm?usp=sharing After https://colab.research.google.com/drive/1wbXQ6ihLoFx2vm90Y5Ok1WQ0le0XEbK9?usp=sharing

Before the change (the image is passed to the encoder to get the latent, then the latent is passed through the decoder to get the output image): [before image]

After the change: [after image]

divamgupta commented 1 year ago

Thanks a lot. Shouldn't we just pass (0,1) padding in PaddedConv2D?
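If `PaddedConv2D` forwards its `padding` argument to a `ZeroPadding2D`-style layer (an assumption about this repo's internals), then passing a `((0, 1), (0, 1))` tuple would give the same asymmetric behaviour, since Keras's `ZeroPadding2D` accepts per-side `((top, bottom), (left, right))` tuples. A minimal NumPy sketch of what that padding does:

```python
import numpy as np

def asymmetric_pad(x, pad=((0, 1), (0, 1))):
    """Zero-pad a (H, W) array on the bottom/right only.

    Hypothetical sketch of passing (0, 1) padding per axis; in Keras this
    corresponds to ZeroPadding2D(((0, 1), (0, 1))) followed by a
    Conv2D(..., strides=2, padding="valid").
    """
    return np.pad(x, pad, mode="constant")

x = np.ones((6, 6))
y = asymmetric_pad(x)
print(y.shape)           # (7, 7): one extra row and one extra column
# Number of 3x3 windows at stride 2 over the padded input:
print((7 - 3) // 2 + 1)  # 3 per dimension, i.e. exactly H / 2
```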