[Bug]: Interrogate CLIP returning <Error>

MNeMoNiCuZ commented 12 months ago

Is there an existing issue for this?

[X] I have searched the existing issues and checked the recent builds/commits

What happened?

When trying to run Interrogate CLIP since the 1.6 update, I keep getting <Error> as the returned value. When doing it with BATCH, I get nothing.

I get a bunch of this spammed in the log though:

*** Error interrogating
    Traceback (most recent call last):
      File "C:\AI\stable-diffusion-webui\modules\interrogate.py", line 194, in interrogate
        caption = self.generate_caption(pil_image)
      File "C:\AI\stable-diffusion-webui\modules\interrogate.py", line 181, in generate_caption
        caption = self.blip_model.generate(gpu_image, sample=False, num_beams=shared.opts.interrogate_clip_num_beams, min_length=shared.opts.interrogate_clip_min_length, max_length=shared.opts.interrogate_clip_max_length)
      File "C:\AI\stable-diffusion-webui\repositories\BLIP\models\blip.py", line 156, in generate
        outputs = self.text_decoder.generate(input_ids=input_ids,
      File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\transformers\generation\utils.py", line 1611, in generate
        return self.beam_search(
      File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\transformers\generation\utils.py", line 2909, in beam_search
        outputs = self(
      File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\AI\stable-diffusion-webui\repositories\BLIP\models\med.py", line 886, in forward
        outputs = self.bert(
      File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\AI\stable-diffusion-webui\repositories\BLIP\models\med.py", line 781, in forward
        encoder_outputs = self.encoder(
      File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\AI\stable-diffusion-webui\repositories\BLIP\models\med.py", line 445, in forward
        layer_outputs = layer_module(
      File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\AI\stable-diffusion-webui\repositories\BLIP\models\med.py", line 361, in forward
        cross_attention_outputs = self.crossattention(
      File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\AI\stable-diffusion-webui\repositories\BLIP\models\med.py", line 277, in forward
        self_outputs = self.self(
      File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\AI\stable-diffusion-webui\repositories\BLIP\models\med.py", line 178, in forward
        attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
    RuntimeError: The size of tensor a (8) must match the size of tensor b (64) at non-singleton dimension 0

---

Steps to reproduce the problem

1. Go to img2img
2. Drop an image in the image field
3. Press Interrogate CLIP

I assume this is a local issue. Interrogate DeepBooru works.

What should have happened?

It should've worked :) Interrogate DeepBooru works.

Sysinfo

sysinfo-2023-09-30-22-23.txt

What browsers do you use to access the UI ?

Mozilla Firefox

Console logs

https://pastebin.com/5SPMwBHL

It's mostly the same traceback shown above under "What happened?", repeated over and over.

---



Additional information

No response

tuwonga commented 11 months ago

Same issue here.

tuwonga commented 11 months ago

Have you fixed it?

joorgejose commented 8 months ago

Try lowering the values in the Interrogate settings. I had the same problem, and changing those values worked for me.

[Screenshot: chrome_njnynBcsSB]
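For anyone who prefers to change the setting outside the UI, here is a minimal sketch. The option name `interrogate_clip_num_beams` is taken from the traceback in the original report (`shared.opts.interrogate_clip_num_beams`). Everything else is an assumption for illustration: the helper name `lower_num_beams` is hypothetical, the settings are assumed to live in a flat JSON file whose path you supply yourself, and the value 1 is only an example of "lower".

```python
import json
from pathlib import Path


def lower_num_beams(config_path, num_beams=1):
    """Set interrogate_clip_num_beams in a JSON settings file.

    Hypothetical helper: assumes the file is a flat JSON object of options.
    Returns the previous value (None if the key was absent).
    """
    path = Path(config_path)
    settings = json.loads(path.read_text(encoding="utf-8"))

    # Remember the old value so it can be restored later if needed.
    previous = settings.get("interrogate_clip_num_beams")

    # Only this one key is touched; all other options are written back as-is.
    settings["interrogate_clip_num_beams"] = num_beams
    path.write_text(json.dumps(settings, indent=4), encoding="utf-8")
    return previous
```

Back up the settings file before trying this, and restart the web UI afterwards so the new value is picked up.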

martin-rizzo commented 8 months ago

Adding some information:

When using BLIP with num_beams=2, the error shown in the console is: RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 0
Similarly, with BLIP num_beams=3, the error is: RuntimeError: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0
And so on...

The underlying issue appears to be that for num_beams=N:

The size of tensor 'a' is N, while the size of tensor 'b' is N^2.
The error indicates that both tensors should have the same size.
As a result, it only works correctly when num_beams=1, since N = N^2 holds only for N=1.

The error originates from the following line:

File "/home/aiman/AIMan/Repos/stable-diffusion-webui/repositories/BLIP/models/med.py", line 178, in forward
   attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))

I assume that the tensors 'a' and 'b' in the error are 'query_layer' and 'key_layer.transpose(-1,-2)' respectively.
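To make the N versus N^2 observation concrete, here is a small sketch in plain Python (no torch needed). It applies the standard broadcasting rule that batched matrix multiplication uses for leading dimensions: two sizes are compatible only if they are equal or one of them is 1. The function names are made up for illustration, and the sizes N and N^2 are taken from the error messages quoted above, not from inspecting the BLIP code.

```python
def batch_dims_broadcast(a, b):
    """Return True if two leading (batch) dimension sizes can be broadcast
    together: they must be equal, or one of them must be 1."""
    return a == b or a == 1 or b == 1


def working_beam_counts(max_beams):
    """List the num_beams values N for which a batch of size N can be
    combined with a batch of size N*N, as in the reported errors."""
    return [n for n in range(1, max_beams + 1)
            if batch_dims_broadcast(n, n * n)]


print(working_beam_counts(8))  # [1]
```

The 8 and 64 in the original report fit the same pattern with N=8.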

AUTOMATIC1111 / stable-diffusion-webui