Closed: zhourax closed this issue 2 months ago
You may try (1) image = image.to('cuda')
and (2) prompt = 'First picture: <ImageHere>, second picture: <ImageHere>. Describe the subject of these two pictures?'
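Suggestion (2) depends on the prompt carrying one `<ImageHere>` placeholder per image. A minimal sketch of a sanity check for that is below. It assumes the model's wrapper splits the prompt on the literal placeholder string, which this thread does not confirm; `IMAGE_TOKEN` and `split_prompt` are hypothetical names used only for illustration.

```python
# Hypothetical helper: not part of the model's API.
IMAGE_TOKEN = "<ImageHere>"  # assumed placeholder string, taken from the prompt above


def split_prompt(prompt, num_images):
    """Split a prompt on the image placeholder and verify the placeholder count."""
    segments = prompt.split(IMAGE_TOKEN)
    found = len(segments) - 1  # n placeholders produce n + 1 text segments
    if found != num_images:
        raise ValueError(
            f"prompt has {found} placeholder(s) but {num_images} image(s) were given"
        )
    return segments


prompt = (
    "First picture: <ImageHere>, second picture: <ImageHere>. "
    "Describe the subject of these two pictures?"
)
segments = split_prompt(prompt, num_images=2)
```

A prompt whose placeholders were lost (for example, stripped as HTML tags when pasted) would raise `ValueError` here instead of failing later inside the model.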
It seems that the issue lies in this line: image1 = model.encode_img(images[0])
My images[0] is the path to a regular PNG image, i.e. a string. The only change I made was to modify build_vision_tower() in build_mlp.py:
def build_vision_tower():
    vision_tower = '/home/xxx/model/clip-vit-large-patch14-336'
    return CLIPVisionTower(vision_tower)
Could this be the reason for the issue?
Traceback (most recent call last):
File "/home/xxx/test.py", line 79, in
You can check whether the image tensor is 'float32' while the model weights are 'float16'. A mismatch like that produces the "mat1 and mat2 must have the same dtype" error.
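The dtype check can be sketched as follows. This is an illustration under stated assumptions, not the model's own code: a small `nn.Linear` stands in for the projection layer, and float64 stands in for float16 so that the snippet runs on a CPU-only machine. `proj` and `dtype_of` are hypothetical names.

```python
import torch
from torch import nn

# Stand-in for the model's projection layer. float64 replaces float16 here
# only so the sketch runs without a GPU; the mismatch mechanism is the same.
proj = nn.Linear(4, 2).double()
image = torch.randn(1, 4)  # torch.randn returns float32 by default


def dtype_of(module):
    """Return the dtype of a module's first parameter."""
    return next(module.parameters()).dtype


# Feeding a float32 input into a float64 layer raises a RuntimeError.
try:
    proj(image)
    mismatch_error = None
except RuntimeError as exc:
    mismatch_error = exc

# Casting the input to the layer's dtype before the forward pass avoids it.
fixed = proj(image.to(dtype_of(proj)))
```

In the real setup the same idea would mean comparing the dtype of the image tensor with that of the model's parameters and casting the image before it reaches the vision projection.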
Kindly re-open if you still have any questions.
Hi, thanks for the great work. I tried the following code snippet with the internlm-xcomposer2-vl-7b model.
images = [osp.join( image_folder_dir, "COCO_val2014_000000143961.jpg"),
osp.join( image_folder_dir, "COCO_val2014_000000274538.jpg")]
image1 = model.encode_img(images[0])
image2 = model.encode_img(images[1])
image = torch.cat((image1, image2), dim=0)
query = """First picture:<ImageHere>, second picture:<ImageHere>. Describe the subject of these two pictures?"""
response, _ = model.interleav_wrap_chat(tokenizer, query, image, history=[], meta_instruction= True)
(Here meta_instruction is a required positional argument; I am not sure whether it should be set to True or False.)
However, I realized that the returned response is actually {'inputs_embeds': wrap_embeds}.
How should I further proceed to get the decoded text output? Thanks in advance!
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('your model path').cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('your model path')

images = ["./a.png", "./b.png"]
image1 = model.encode_img(images[0])
image2 = model.encode_img(images[1])
image = torch.cat((image1, image2), dim=0)

query = """First picture:<ImageHere>, second picture:<ImageHere>. Describe the subject of these two pictures?"""

response, _ = model.interleav_wrap_chat(tokenizer, query, image, history=[])
print(response)

The model is InternLM-XComposer2-VL-7B. Running the code above produces the following error:
  File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/modeling_internlm_xcomposer2.py", line 118, in encode_img
    img_embeds, atts_img, img_target = self.img2emb(image)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/modeling_internlm_xcomposer2.py", line 122, in img2emb
    img_embeds = self.vision_proj(self.vit(image.to(self.device)))
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 must have the same dtype