Following the chat example below, I am trying to run a chat conversation that starts with one image and then sends three different prompts, waiting for the model's response to each prompt before sending the next. Please help me with how to send multiple prompts in sequence after getting each response. I believe I need to use `history`, but I couldn't get it to work; my attempt is sketched after the chat example below.
- The first query has an image and a text prompt.
- The second query has only a text prompt.
- The third query has only a text prompt.
System Info / 系統信息
```python
tokenizer = LlamaTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5')
model = AutoModelForCausalLM.from_pretrained(
    'THUDM/cogvlm-chat-hf',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to('cuda').eval()
```
Who can help? / 谁可以帮助到您?
No response
Information / 问题信息
Reproduction / 复现过程
Chat example I am starting from:
```python
import torch
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5')
model = AutoModelForCausalLM.from_pretrained(
    'THUDM/cogvlm-chat-hf',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to('cuda').eval()

query = 'Describe this image'
image = Image.open(requests.get(
    'https://github.com/THUDM/CogVLM/blob/main/examples/1.png?raw=true',
    stream=True).raw).convert('RGB')

# chat mode: single turn with an empty history
inputs = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image])
inputs = {
    'input_ids': inputs['input_ids'].unsqueeze(0).to('cuda'),
    'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to('cuda'),
    'attention_mask': inputs['attention_mask'].unsqueeze(0).to('cuda'),
    'images': [[inputs['images'][0].to('cuda').to(torch.bfloat16)]],
}
gen_kwargs = {"max_length": 2048, "do_sample": False}

with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))
```
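What I tried for the three-turn conversation is sketched below. This is only my guess at how `history` is supposed to be used: I assume it is a list of `(query, response)` tuples that I append to after every turn, and I pass the same image on every call; the two follow-up prompts are just placeholder text. Is this the right way to chain the prompts?

```python
import torch
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5')
model = AutoModelForCausalLM.from_pretrained(
    'THUDM/cogvlm-chat-hf',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to('cuda').eval()

image = Image.open(requests.get(
    'https://github.com/THUDM/CogVLM/blob/main/examples/1.png?raw=true',
    stream=True).raw).convert('RGB')

# The three prompts I want to send in sequence; only the first one refers
# directly to the image, the other two are placeholder follow-up questions.
queries = [
    'Describe this image',
    'What colors dominate the image?',
    'Is there any text visible in the image?',
]

history = []  # my assumption: a list of (query, response) tuples
gen_kwargs = {"max_length": 2048, "do_sample": False}

for query in queries:
    # I pass the same image on every turn together with the accumulated history;
    # not sure whether the image should be given only on the first turn.
    inputs = model.build_conversation_input_ids(
        tokenizer, query=query, history=history, images=[image])
    inputs = {
        'input_ids': inputs['input_ids'].unsqueeze(0).to('cuda'),
        'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to('cuda'),
        'attention_mask': inputs['attention_mask'].unsqueeze(0).to('cuda'),
        'images': [[inputs['images'][0].to('cuda').to(torch.bfloat16)]],
    }
    with torch.no_grad():
        outputs = model.generate(**inputs, **gen_kwargs)
        outputs = outputs[:, inputs['input_ids'].shape[1]:]
        # skip_special_tokens drops the trailing </s> before the reply
        # goes back into the history
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(response)
    history.append((query, response))  # carry this turn into the next call
```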
Expected behavior / 期待表现
Three answers, one for each text prompt.