EricHuiK opened this issue 2 years ago
You are probably inputting an image with an alpha channel, like a PNG. Try inputting only RGB.
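A minimal sketch of that workaround: strip the alpha channel before handing the image to the model. Here an in-memory 4x4 RGBA image stands in for a loaded PNG (the size and color are placeholders, not from the thread):

```python
from PIL import Image

# Create a stand-in for an RGBA PNG; in practice this would be
# Image.open("your_input.png").
img = Image.new("RGBA", (4, 4), (255, 0, 0, 128))

if img.mode != "RGB":
    # Drops the alpha channel, leaving the 3 RGB channels the model expects.
    img = img.convert("RGB")

assert img.mode == "RGB"
```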
Hi @samedii ! I'm trying to write an inference script for running this txt2image model (https://ommer-lab.com/files/latent-diffusion/text2img.zip). I replaced the corresponding config yaml file and model checkpoint. When I input a text prompt, the system gives me the error that @EricHuiK mentioned. In particular, the error arises in the call at line 137 of the inference script, `samples_ddim, _ = sampler.sample(S=opt.ddim_steps, ...)`.
Do you happen to know where to update the code? :)
I encountered the same error when trying to use this config (models/ldm/text2img256/config.yaml).
I tried to fix it by changing L136 from `shape = [4, opt.H//8, opt.W//8]`
to `shape = [3, opt.H//8, opt.W//8]`.
After this change, the code ran smoothly, with no more shape incompatibility.
(However, the resulting image doesn't look good with this setting. Not sure if there is more to change.)
me too
Has anyone managed to find a solution to this? I am facing the same issue
Hello, indeed the problem is in the shape. If you set the variable `shape = [3, 64, 64]`, the model runs normally; this variable indicates the size of the latent space that the diffusion model generates. Although, as said before, the model does not give good results if you are using the text2img weights; the problem then is not the shape but the weights.
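For anyone hitting this: the latent shape is typically `[latent_channels, H // f, W // f]`, where `f` is the autoencoder's downsampling factor. A minimal sketch of how the working value `[3, 64, 64]` reported above can arise; the image size 256 and factor `f = 4` are assumptions, so check the `first_stage_config` in your yaml:

```python
# Sketch only: these values are assumptions consistent with the shape
# reported to work in this thread, not verified against the repository.
H = W = 256          # assumed output image size for the text2img256 model
latent_channels = 3  # this thread reports 3 works here (the script had 4)
f = 4                # assumed autoencoder downsampling factor
shape = [latent_channels, H // f, W // f]
print(shape)  # [3, 64, 64]
```

Note that the script line quoted earlier divides by 8 (`opt.H//8`), which yields `[3, 64, 64]` only if `opt.H` is 512; with `opt.H = 256` it gives `[3, 32, 32]`, so make sure the divisor matches your autoencoder's actual downsampling factor.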
Hiya! Have you been lucky in solving this issue? :)