Hi, thanks for the great work. I tried the following code snippet with the internlm-xcomposer2-vl-7b model for QA task with two input images.
import os.path as osp
import torch

# Encode each image separately, then concatenate the image embeddings.
images = [osp.join(image_folder_dir, "COCO_val2014_000000143961.jpg"),
          osp.join(image_folder_dir, "COCO_val2014_000000274538.jpg")]
image1 = model.encode_img(images[0])
image2 = model.encode_img(images[1])
image = torch.cat((image1, image2), dim=0)

# One <ImageHere> placeholder per input image.
query = """First picture:<ImageHere>, second picture:<ImageHere>. Describe the subject of these two pictures?"""
response, _ = model.interleav_wrap_chat(tokenizer, query, image, history=[], meta_instruction=True)
(Here, meta_instruction is a required positional argument; I'm not sure whether it should be set to True or False.)
However, I realized that the returned response is actually {'inputs_embeds': wrap_embeds}.
How should I proceed from here to get the decoded text output?
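My best guess is that these wrapped embeddings should be passed to the language model's generate method and the resulting token ids decoded with the tokenizer, along the lines of the sketch below. This is untested; max_new_tokens and the sampling settings are placeholder values I picked, not anything confirmed by the repo:

import torch

# Untested guess: feed the wrapped embeddings from interleav_wrap_chat
# directly to generate(), then decode the generated token ids to text.
with torch.no_grad():
    output_ids = model.generate(
        inputs_embeds=response['inputs_embeds'],
        max_new_tokens=512,  # placeholder value
        do_sample=False,     # greedy decoding; just a guess at sensible defaults
    )
text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(text)

Is something like this the intended usage, or is there a dedicated decoding helper I'm missing?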
Thanks in advance!