arenasys / qDiffusion

Qt GUI for Stable diffusion

Problems with "mask" and "inpaint". #43

Closed renessal closed 1 year ago

renessal commented 1 year ago

I have a problem with "Inpaint":

Error while Generating. The size of tensor a (53) must match the size of tensor b (54) at non-singleton dimension 3 (controlnet.py:769)

And also when using "Mask":

Error while Decoding. images do not match (Image.py:1889)

How can I fix it?

renessal commented 1 year ago

Trace for "Mask":

Traceback (most recent call last):
  File "/content/sd-inference-server/server.py", line 224, in run
    self.wrapper.img2img()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/content/sd-inference-server/wrapper.py", line 976, in img2img
    outputs, masked = utils.apply_inpainting(images, original_images, masks, extents)
  File "/content/sd-inference-server/utils.py", line 165, in apply_inpainting
    masked[i].putalpha(mask)
  File "/usr/local/lib/python3.10/dist-packages/PIL/Image.py", line 1889, in putalpha
    self.im.putband(alpha.im, band)
ValueError: images do not match
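For reference, the "Mask" failure can be reproduced in isolation: `Image.putalpha` requires the alpha band to have exactly the image's size and raises this `ValueError` otherwise. A minimal sketch with hypothetical sizes:

```python
from PIL import Image

image = Image.new("RGB", (512, 424))   # hypothetical generated image
mask = Image.new("L", (512, 416))      # mask whose height does not match

try:
    image.putalpha(mask)               # alpha band must match the image size
except ValueError as exc:
    msg = str(exc)
    print(msg)                         # images do not match

# Resizing the mask to the image's size avoids the error:
image = Image.new("RGB", (512, 424))
image.putalpha(mask.resize(image.size))
print(image.size, image.mode)          # (512, 424) RGBA
```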

Trace for "Inpaint":

Traceback (most recent call last):
  File "/content/sd-inference-server/server.py", line 224, in run
    self.wrapper.img2img()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/content/sd-inference-server/wrapper.py", line 965, in img2img
    latents = inference.img2img(latents, denoiser, sampler, noise, self.steps, False, self.strength, self.on_step)
  File "/content/sd-inference-server/inference.py", line 34, in img2img
    latents = sampler.step(latents, schedule, i, noise)
  File "/content/sd-inference-server/samplers_k.py", line 298, in step
    denoised = self.predict(x, sigmas[i])
  File "/content/sd-inference-server/samplers_k.py", line 57, in predict
    original = self.model.predict_original(latents, timestep, sigma)
  File "/content/sd-inference-server/guidance.py", line 151, in predict_original
    original_pred = self.predict_original_epsilon(model_input, timestep, sigma, conditioning)
  File "/content/sd-inference-server/guidance.py", line 126, in predict_original_epsilon
    noise_pred = self.predict(latents * c_in, timestep, conditioning)
  File "/content/sd-inference-server/guidance.py", line 54, in predict
    return self.unet(inputs, timestep, encoder_hidden_states=conditioning, added_cond_kwargs=self.additional_conditioning).sample
  File "/content/sd-inference-server/controlnet.py", line 106, in __call__
    down, mid = self.controlnets[i](
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/controlnet.py", line 769, in forward
    sample = sample + controlnet_cond
RuntimeError: The size of tensor a (53) must match the size of tensor b (54) at non-singleton dimension 3

arenasys commented 1 year ago

I believe this happens when using resolutions not divisible by 8.
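Assuming a hypothetical width of 426 px (chosen only because it reproduces the 53/54 from the traceback) and the usual kernel-3, stride-2, padding-1 convolutions of the ControlNet conditioning embedding, the off-by-one follows from rounding in opposite directions: the VAE latent floors width/8, while the stride-2 convolutions on the pixel-space conditioning image round up. A sketch of the arithmetic:

```python
def conv_out(n, kernel=3, stride=2, pad=1):
    # Spatial output size of a strided convolution
    return (n + 2 * pad - kernel) // stride + 1

width = 426                  # hypothetical width not divisible by 8
latent = width // 8          # VAE latent width: floor(426 / 8) = 53

cond = width
for _ in range(3):           # conditioning embedding: three stride-2 convs
    cond = conv_out(cond)    # 426 -> 213 -> 107 -> 54

print(latent, cond)          # 53 vs 54, matching the traceback
```

When the width is a multiple of 8, both paths land on the same size, which is consistent with the divisibility explanation above.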

arenasys commented 1 year ago

Should be fixed by https://github.com/arenasys/sd-inference-server/commit/00866171ad40a54640b3709d65ba2ee789bd7e62