OpenGVLab / LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

RuntimeError: The size of tensor a (730) must match the size of tensor b (257) at non-singleton dimension 1 #102

Open parasmech opened 1 year ago

parasmech commented 1 year ago

Hello

I ran demo.py on a single image and it works. Now I am trying to do captioning on a list of images with this snippet:

captions_LLAMA = []
for image in igs_trnsfmd:
    caption = model.generate(image, [prompt])[0]
    captions_LLAMA.append(caption)

But I am getting this error:


RuntimeError                              Traceback (most recent call last)
in <cell line: 3>()
      2 captions_LLAMA = []
      3 with torch.no_grad():
----> 4     caption = model.generate(image, [prompt])[0]
      5     captions_LLAMA.append(caption)

3 frames
/content/llama/llama_adapter.py in clip_encode_image(self, x)
    118         x = torch.cat([self.clip.visual.class_embedding.to(x.dtype) + torch.zeros(x.shape[0], 1,
    119                       x.shape[-1], dtype=x.dtype, device=x.device), x], dim=1)  # shape = [*, grid ** 2 + 1, width]
--> 120         x = x + self.clip.visual.positional_embedding.to(x.dtype)
    121         x = self.clip.visual.ln_pre(x)
    122
RuntimeError: The size of tensor a (730) must match the size of tensor b (257) at non-singleton dimension 1

Please help

csuhan commented 1 year ago

It seems that your CLIP encoder input is not correct. Have you tried our demo code here?

import cv2
import llama
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

llama_dir = "/path/to/LLaMA/"

# choose from BIAS-7B, LORA-BIAS-7B
model, preprocess = llama.load("BIAS-7B", llama_dir, device)
model.eval()

prompt = llama.format_prompt("Please introduce this painting.")
# load the image and apply the model's preprocess transform
img = Image.fromarray(cv2.imread("../docs/logo_v1.png"))
img = preprocess(img).unsqueeze(0).to(device)

result = model.generate(img, [prompt])[0]

print(result)
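
The mismatch (730 vs. 257) most likely means the tensors in igs_trnsfmd were produced by a different transform than the preprocess returned by llama.load: the CLIP positional embedding expects 16 x 16 patches plus one class token (257 tokens) for a 224 x 224 input. If the goal is to caption a list of images, here is a minimal sketch of the loop, reusing model, preprocess, prompt, and device from the demo above and assuming a hypothetical list of file paths called image_paths:

captions_LLAMA = []
with torch.no_grad():
    for path in image_paths:  # image_paths is a placeholder list of image files
        # preprocess resizes and normalizes each image to the resolution the CLIP encoder expects
        img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
        captions_LLAMA.append(model.generate(img, [prompt])[0])

print(captions_LLAMA)

The key point is that every image must go through the same preprocess transform before model.generate; otherwise the patch sequence length no longer matches the positional embedding.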