anotherjesse / multi-control

RuntimeError: The size of tensor a (705) must match the size of tensor b (673) at non-singleton dimension 1 #1

Open · whydna opened this issue 1 year ago

whydna commented 1 year ago

Getting the following error:

Running predict()...
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/cog/server/worker.py", line 222, in _predict
    for r in result:
  File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "predict.py", line 338, in predict
    pipe, kwargs = self.build_pipe(
  File "predict.py", line 188, in build_pipe
    img = getattr(self, "{}_preprocess".format(name))(img)
  File "predict.py", line 133, in depth_preprocess
    return self.midas(img)
  File "/src/midas_hack.py", line 52, in __call__
    depth = self.model(image_depth)[0]
  File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/controlnet_aux/midas/api.py", line 167, in forward
    prediction = self.model(x)
  File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/controlnet_aux/midas/midas/dpt_depth.py", line 108, in forward
    return super().forward(x).squeeze(dim=1)
  File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/controlnet_aux/midas/midas/dpt_depth.py", line 71, in forward
    layer_1, layer_2, layer_3, layer_4 = forward_vit(self.pretrained, x)
  File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/controlnet_aux/midas/midas/vit.py", line 59, in forward_vit
    glob = pretrained.model.forward_flex(x)
  File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/controlnet_aux/midas/midas/vit.py", line 145, in forward_flex
    x = x + pos_embed
RuntimeError: The size of tensor a (705) must match the size of tensor b (673) at non-singleton dimension 1

when running w/ params:

prompt
depth_image url_to_image
hough_image url_to_image
num_outputs 4
guidance_scale 9
negative_prompt 
image_resolution 512
num_inference_steps 20

whydna commented 1 year ago

I was able to narrow it down to the depth_image property.

This is the image I'm using - it fails in the demo as well:

[attached image: notwork]

whydna commented 1 year ago

@anotherjesse

Here is a fix (tested on my own fork): the depth preprocessor requires image dimensions that are divisible by 64 px.

See: https://github.com/patrickvonplaten/controlnet_aux/issues/2

Any chance we can get this deployed to Replicate?

    def depth_preprocess(self, img):
        # The MiDaS depth model expects height and width to be multiples of 64,
        # so round each dimension to the nearest multiple of 64 before inference.
        W, H = img.size
        W_new = int(np.round(W / 64) * 64)
        H_new = int(np.round(H / 64) * 64)
        img = img.resize((W_new, H_new))
        return self.midas(img)
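
Until that lands, a caller-side workaround is to round the control image to multiples of 64 before submitting it. A minimal sketch, assuming a local file and Pillow; the helper name round_to_64 and the file paths are just illustrative:

    from PIL import Image
    import numpy as np

    def round_to_64(path_in, path_out):
        """Resize an image so both dimensions are multiples of 64 (illustrative helper)."""
        img = Image.open(path_in)
        w, h = img.size
        # Round each side to the nearest multiple of 64, but never below 64.
        w_new = max(64, int(np.round(w / 64) * 64))
        h_new = max(64, int(np.round(h / 64) * 64))
        img.resize((w_new, h_new)).save(path_out)

    round_to_64("depth_input.png", "depth_input_64.png")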