Janspiry / Palette-Image-to-Image-Diffusion-Models

Unofficial implementation of Palette: Image-to-Image Diffusion Models in PyTorch

Inputs and Outputs of Different Sizes #101

Open mahiratmis opened 3 months ago

mahiratmis commented 3 months ago

Hello, thank you for the code and especially for the distributed support it contains. I am trying to brighten dark images. In my problem setting, y_cond has shape B, 32, H, W and y_0 (the ground truth) has shape B, 3, H, W, so I set the in_channel value of the Unet to 35. Training went smoothly. However, during testing, the restoration method contains the following line:

    y_t = default(y_t, lambda: torch.randn_like(y_cond))

Since y_t is None at the beginning, this creates y_t as random noise of shape B, 32, H, W, and inside p_sample y_t is concatenated with y_cond, which results in a shape of B, 64, H, W. That shape is not compatible with the input shape the Unet expects.

What am I missing? Can someone please help?
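For reference, a minimal sketch of the channel mismatch I am describing (batch size and spatial dimensions are just placeholders, not values from my config):

    import torch

    B, H, W = 4, 64, 64
    y_cond = torch.randn(B, 32, H, W)   # 32-channel conditioning input
    y_0 = torch.randn(B, 3, H, W)       # 3-channel ground-truth bright image

    # Training: the Unet sees cat([y_cond, y_noisy]) = 32 + 3 = 35 channels,
    # which matches in_channel=35.
    print(torch.cat([y_cond, torch.randn_like(y_0)], dim=1).shape)     # [4, 35, 64, 64]

    # Testing: default(y_t, lambda: torch.randn_like(y_cond)) makes y_t 32-channel
    # noise, so the concatenation becomes 32 + 32 = 64 channels.
    print(torch.cat([y_cond, torch.randn_like(y_cond)], dim=1).shape)  # [4, 64, 64, 64]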

mahiratmis commented 3 months ago

Should I call the restoration method like this?

        y_t = torch.randn_like(self.gt_image)   # y_t will be of shape B, 3, H, W
        # pass y_t explicitly so restoration does not fall back to randn_like(y_cond)
        self.output, self.visuals = self.netG.restoration(self.cond_image, y_t, sample_num=self.sample_num)
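If I understand the helper correctly (assuming default simply returns its first argument when it is not None and otherwise calls the lambda, as in the repo's utility code), passing y_t explicitly should keep the Unet input at 35 channels. A quick sanity check of that assumption:

    import torch

    def default(val, d):
        # assumed behaviour of the repo's helper: keep val if given, otherwise build the fallback
        return val if val is not None else d()

    y_cond = torch.randn(2, 32, 64, 64)
    gt_image = torch.randn(2, 3, 64, 64)

    # Explicit y_t of shape B, 3, H, W: the fallback is never used, the concat stays at 35 channels.
    y_t = default(torch.randn_like(gt_image), lambda: torch.randn_like(y_cond))
    print(torch.cat([y_cond, y_t], dim=1).shape)   # torch.Size([2, 35, 64, 64])

    # y_t left as None: 32-channel noise is created and the mismatch reappears.
    y_t = default(None, lambda: torch.randn_like(y_cond))
    print(torch.cat([y_cond, y_t], dim=1).shape)   # torch.Size([2, 64, 64, 64])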