cobanov / image-captioning

Image captioning using python and BLIP

RuntimeError: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0 #5

Open loboere opened 1 year ago

loboere commented 1 year ago

My images are 256x256 pixels.

/content/image-captioning
2023-09-02 18:30:18.889829: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Device: cuda:0
Images found: 263
Split size: 263
Checkpoint loading...
load checkpoint from ./checkpoints/model_large_caption.pth

Model to cuda:0
Inference started
0batch [00:01, ?batch/s]
Traceback (most recent call last):
  File "/content/image-captioning/inference.py", line 88, in <module>
    caption = model.generate(
  File "/content/image-captioning/models/blip.py", line 201, in generate
    outputs = self.text_decoder.generate(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1675, in generate
    return self.beam_search(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 3014, in beam_search
    outputs = self(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/image-captioning/models/med.py", line 886, in forward
    outputs = self.bert(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/image-captioning/models/med.py", line 781, in forward
    encoder_outputs = self.encoder(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/image-captioning/models/med.py", line 445, in forward
    layer_outputs = layer_module(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/image-captioning/models/med.py", line 361, in forward
    cross_attention_outputs = self.crossattention(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/image-captioning/models/med.py", line 277, in forward
    self_outputs = self.self(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/image-captioning/models/med.py", line 178, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0
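For anyone wondering what the 3-vs-9 mismatch means: a rough numpy analogy of the failing matmul (not the actual BLIP code; all shapes here are illustrative assumptions). During beam search the text side is replicated per beam (3 images x 3 beams = 9), so if the image features stay at batch size 3, the leading dimensions of the cross-attention matmul no longer agree:

```python
# Illustrative shapes only: beam search expands the text side by num_beams,
# while the image features remain at the original batch size.
import numpy as np

num_images, num_beams, heads, txt_len, img_len, dim = 3, 3, 12, 5, 7, 64

query = np.zeros((num_images * num_beams, heads, txt_len, dim))  # text side: 9
key = np.zeros((num_images, heads, img_len, dim))                # image side: 3

try:
    np.matmul(query, np.swapaxes(key, -1, -2))  # 9 vs 3: batch dims clash
except ValueError as err:
    print("mismatch:", err)

# Repeating the image features once per beam makes the batch dims agree again
key_expanded = np.repeat(key, num_beams, axis=0)                 # now 9
scores = np.matmul(query, np.swapaxes(key_expanded, -1, -2))
print(scores.shape)  # (9, 12, 5, 7)
```

Newer transformers versions changed how that per-beam expansion is done internally, which is why a version mismatch can surface as this shape error.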

KingOfRed commented 11 months ago

I have found the solution, at least for myself: it is likely a version mismatch in some of the modules. Check the requirements text file; if any of those modules are newer on your system than the versions pinned there, a deprecated feature may be preventing this from running.

I had this exact same error and fixed it with this command:

pip install timm==0.4.12 transformers==4.17.0 fairscale==0.4.4 pycocoevalcap pillow

pip found that my timm, transformers, and fairscale were on newer versions, pulled the downgrades, and got this working on the first try.
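If you want to check for that drift before re-running inference, here is a small sketch using the standard library (the helper name `find_version_mismatches` and the `PINS` dict are mine, not part of this repo):

```python
# Compare installed package versions against the pins from requirements.txt.
from importlib.metadata import PackageNotFoundError, version

PINS = {"timm": "0.4.12", "transformers": "4.17.0", "fairscale": "0.4.4"}

def find_version_mismatches(pins, get_version=None):
    """Return {package: installed_version} for pins that don't match.

    `get_version` can be injected for testing; by default it reads the
    installed distribution metadata and skips packages that are absent.
    """
    if get_version is None:
        def get_version(name):
            try:
                return version(name)
            except PackageNotFoundError:
                return None
    mismatched = {}
    for name, want in pins.items():
        got = get_version(name)
        if got is not None and got != want:
            mismatched[name] = got
    return mismatched

print(find_version_mismatches(PINS))  # {} means your env matches the pins
```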

If you already use these packages for other projects, the downgrades might break that functionality, so this may not be worth it unless you really need this system.

EDIT: This error also crops up if you set a batch size larger than the number of image files being processed.
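A simple guard against that second cause could look like the sketch below; `safe_batch_size` is a made-up helper, not something inference.py actually defines:

```python
# Clamp the requested batch size so a split never asks for more items
# than there are images on disk.
def safe_batch_size(requested, n_images):
    """Largest usable batch size given the number of images found."""
    if n_images <= 0:
        raise ValueError("no images to caption")
    return min(requested, n_images)
```

For example, with the 263 images from the log above, a requested batch size of 512 would be clamped to 263 instead of tripping the shape mismatch.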

cobanov commented 11 months ago

Yeah, please do the pip install in an empty virtual env.