Closed · Lime-Cakes closed this 1 year ago
Do you have before and after results?
Wait, I realized I made an error.
After the edit, the shift is gone, but the latent size changed, so the behaviour is still wrong. I'm not sure what's happening; I'll see if I can find out why the latent shape is wrong.
Okay, now it's right.
Before: https://colab.research.google.com/drive/1lwCF5vdys5U4u9skE_DFrK8ecUP8Pmnm?usp=sharing
After: https://colab.research.google.com/drive/1wbXQ6ihLoFx2vm90Y5Ok1WQ0le0XEbK9?usp=sharing
Before the change (the image is passed to the encoder to get a latent, then the latent is passed through the decoder to get the output image): [image attached]. After the change: [image attached].
Thanks a lot. Shouldn't we just pass (0, 1) padding in PaddedConv2D?
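For anyone following along, here is a minimal sketch of what "(0, 1) padding" means in this context, with NumPy standing in for the framework's padding op (the array values and shapes below are illustrative, not taken from the repo):

```python
import numpy as np

# Asymmetric (0, 1) padding: nothing before, one zero after, on each spatial axis.
x = np.arange(16, dtype=np.float32).reshape(4, 4)
padded = np.pad(x, ((0, 1), (0, 1)))  # rows: (top=0, bottom=1); cols: (left=0, right=1)

print(padded.shape)  # (5, 5)
```

The last row and column of `padded` are all zeros, whereas symmetric padding would add a zero border on all four sides.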
Stable Diffusion's first-stage encoder doesn't use symmetric padding when downsampling (outside of the ResnetBlock). This updates the model to remove the extra padding, which lets the encoder output the correct latent.
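To see why the padding choice matters for the latent shape, here is a sketch of the output-size arithmetic, assuming each downsampling stage is a 3x3 stride-2 convolution (the stage count and kernel/stride values are assumptions based on the standard Stable Diffusion first-stage encoder, not read from this repo's code):

```python
def downsampled_len(h, k=3, s=2, pad_total=1):
    """Output length of a conv over a padded input.

    pad_total=1 models the asymmetric (0, 1) padding used here;
    pad_total=2 would model symmetric padding of 1 on each side;
    pad_total=0 is no padding at all.
    """
    return (h + pad_total - k) // s + 1

h = 512
for _ in range(3):          # the encoder downsamples three times
    h = downsampled_len(h)  # 512 -> 256 -> 128 -> 64

print(h)  # 64, i.e. the latent is 1/8 of the input resolution
```

With no padding the sizes drift (512 -> 255 -> 127 -> 63), which would explain the wrong latent shape seen earlier; symmetric padding keeps the same sizes for even inputs but misaligns the content, which is presumably where the shift came from.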