parasmech opened this issue 1 year ago
It seems that your CLIP encoder input is not correct. Have you tried our demo code here?
import cv2
import llama
import torch
from PIL import Image
device = "cuda" if torch.cuda.is_available() else "cpu"
llama_dir = "/path/to/LLaMA/"
# choose from BIAS-7B, LORA-BIAS-7B
model, preprocess = llama.load("BIAS-7B", llama_dir, device)
model.eval()
prompt = llama.format_prompt("Please introduce this painting.")
img = Image.fromarray(cv2.imread("../docs/logo_v1.png"))
img = preprocess(img).unsqueeze(0).to(device)
result = model.generate(img, [prompt])[0]
print(result)
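As a quick sanity check before moving to multiple images, it can help to confirm that the tensor handed to model.generate has the shape produced by preprocess. The check below is only illustrative and not part of the official demo; the expected 224x224 input size is inferred from the 257 positional tokens reported in the traceback further down.

# Illustrative check (not part of the official demo): the preprocessed tensor
# should be a 4-D batch, e.g. torch.Size([1, 3, 224, 224]) for a 224x224 CLIP visual encoder.
print(img.shape)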
Hello
I ran the demo.py for an image and it works. Now I am trying to do captioning on a list of images. Code snippet:

captions_LLAMA = []
for image in igs_trnsfmd:
    caption = model.generate(image, [prompt])[0]
    captions_LLAMA.append(caption)
But I am getting this error:
RuntimeError                              Traceback (most recent call last)
in <cell line: 3>()
      2 captions_LLAMA = []
      3 with torch.no_grad():
----> 4     caption = model.generate(image, [prompt])[0]
      5     captions_LLAMA.append(caption)

3 frames
/content/llama/llama_adapter.py in clip_encode_image(self, x)
    118         x = torch.cat([self.clip.visual.class_embedding.to(x.dtype) + torch.zeros(x.shape[0], 1,
    119                       x.shape[-1], dtype=x.dtype, device=x.device), x], dim=1)  # shape = [*, grid ** 2 + 1, width]
--> 120         x = x + self.clip.visual.positional_embedding.to(x.dtype)
    121         x = self.clip.visual.ln_pre(x)
    122

RuntimeError: The size of tensor a (730) must match the size of tensor b (257) at non-singleton dimension 1
Please help
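For reference, the 257 in the error is the length of CLIP's positional embedding for a 224x224 input (16x16 patches plus one class token), while 730 is the token count of the tensor that was actually passed in, so the images in igs_trnsfmd most likely did not go through the preprocess transform returned by llama.load. Below is a minimal sketch of the loop with preprocessing applied per image, exactly as in the single-image demo above; image_paths is a hypothetical list of file paths, so adapt it to however igs_trnsfmd was built.

import cv2
import torch
from PIL import Image

captions_LLAMA = []
with torch.no_grad():
    for path in image_paths:  # image_paths is a hypothetical stand-in for your image source
        img = Image.fromarray(cv2.imread(path))        # load the image as in the demo
        img = preprocess(img).unsqueeze(0).to(device)  # resize/normalize to the CLIP input size, add batch dim
        caption = model.generate(img, [prompt])[0]
        captions_LLAMA.append(caption)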