parasmech opened this issue 1 year ago
It seems that your CLIP encoder input is not correct. Have you tried our demo code here?
import cv2
import llama
import torch
from PIL import Image
device = "cuda" if torch.cuda.is_available() else "cpu"
llama_dir = "/path/to/LLaMA/"
# choose from BIAS-7B, LORA-BIAS-7B
model, preprocess = llama.load("BIAS-7B", llama_dir, device)
model.eval()
prompt = llama.format_prompt("Please introduce this painting.")
img = Image.fromarray(cv2.imread("../docs/logo_v1.png"))
img = preprocess(img).unsqueeze(0).to(device)
result = model.generate(img, [prompt])[0]
print(result)
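As a quick sanity check before moving to multiple images, it can help to confirm that the tensor handed to model.generate has the shape produced by preprocess. The check below is only illustrative and not part of the official demo; the expected 224x224 input size is inferred from the 257 positional tokens reported in the traceback further down.

# Illustrative check (not part of the official demo): the preprocessed tensor
# should be a 4-D batch, e.g. torch.Size([1, 3, 224, 224]) for a 224x224 CLIP visual encoder.
print(img.shape)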
Hello
I ran the demo.py for an image and it works. Now I am trying to do captioning on a list of images. Code snippet:

captions_LLAMA = []
for image in igs_trnsfmd:
    caption = model.generate(image, [prompt])[0]
    captions_LLAMA.append(caption)
But I am getting this error:
RuntimeError                              Traceback (most recent call last)
in <cell line: 3>()
      2 captions_LLAMA = []
      3 with torch.no_grad():
----> 4     caption = model.generate(image, [prompt])[0]
      5     captions_LLAMA.append(caption)

3 frames
/content/llama/llama_adapter.py in clip_encode_image(self, x)
    118         x = torch.cat([self.clip.visual.class_embedding.to(x.dtype) + torch.zeros(x.shape[0], 1,
    119                       x.shape[-1], dtype=x.dtype, device=x.device), x], dim=1)  # shape = [*, grid ** 2 + 1, width]
--> 120         x = x + self.clip.visual.positional_embedding.to(x.dtype)
    121         x = self.clip.visual.ln_pre(x)
    122

RuntimeError: The size of tensor a (730) must match the size of tensor b (257) at non-singleton dimension 1
Please help
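For reference, the 257 in the error is the length of CLIP's positional embedding for a 224x224 input (16x16 patches plus one class token), while 730 is the token count of the tensor that was actually passed in, so the images in igs_trnsfmd most likely did not go through the preprocess transform returned by llama.load. Below is a minimal sketch of the loop with preprocessing applied per image, exactly as in the single-image demo above; image_paths is a hypothetical list of file paths, so adapt it to however igs_trnsfmd was built.

import cv2
import torch
from PIL import Image

captions_LLAMA = []
with torch.no_grad():
    for path in image_paths:  # image_paths is a hypothetical stand-in for your image source
        img = Image.fromarray(cv2.imread(path))        # load the image as in the demo
        img = preprocess(img).unsqueeze(0).to(device)  # resize/normalize to the CLIP input size, add batch dim
        caption = model.generate(img, [prompt])[0]
        captions_LLAMA.append(caption)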