NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Qwen2-VL Batch Bug #2495

Open LugerW-A opened 4 days ago

LugerW-A commented 4 days ago

System Info

x86, TensorRT-LLM 0.16.0

Who can help?

No response

Information

Tasks

Reproduction

Qwen2-VL examples

Expected behavior

Does Qwen2-VL support batch prompts? When the input is a batch, only the first result is returned correctly; the rest are all empty.

```python
print(input_ids.shape)
print(prompt_table.shape)
print(prompt_tasks)
outputs = self.model.generate(
    input_ids,
    input_position_ids=None,
    mrope_params=mrope_params,
    sampling_config=None,
    prompt_table=prompt_table,
    prompt_tasks=prompt_tasks,
    max_new_tokens=max_new_tokens,
    end_id=end_id,
    pad_id=self.model.tokenizer.pad_token_id
    if self.model.tokenizer.pad_token_id is not None
    else self.model.tokenizer.all_special_ids[0],
    top_k=self.args.top_k,
    top_p=self.args.top_p,
    temperature=self.args.temperature,
    repetition_penalty=self.args.repetition_penalty,
    num_beams=self.args.num_beams,
    output_sequence_lengths=True,
    return_dict=True,
)
```
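Side note on reading the result: with `return_dict=True` and `output_sequence_lengths=True`, each batch entry generally has its own sequence length, so each row should be trimmed individually before decoding. A minimal sketch of that trimming, using plain Python lists as stand-ins for the returned tensors (the names `output_ids` / `sequence_lengths` and the toy values are assumptions for illustration, not real model output):

```python
# Trim batched generate() outputs: keep only the generated tokens of each
# batch entry / beam, dropping the shared prompt prefix and trailing padding.
def trim_outputs(output_ids, sequence_lengths, input_len):
    """output_ids: [batch][beam][seq] token ids; sequence_lengths: [batch][beam]."""
    trimmed = []
    for beam_ids, beam_lens in zip(output_ids, sequence_lengths):
        trimmed.append(
            [ids[input_len:length] for ids, length in zip(beam_ids, beam_lens)]
        )
    return trimmed

# Toy batch of 2, beam width 1, prompt length 3, padded to length 8.
output_ids = [
    [[1, 2, 3, 10, 11, 12, 0, 0]],  # entry 0: three generated tokens, then padding
    [[1, 2, 3, 20, 0, 0, 0, 0]],    # entry 1: one generated token
]
sequence_lengths = [[6], [4]]

print(trim_outputs(output_ids, sequence_lengths, input_len=3))
# → [[[10, 11, 12]], [[20]]]
```

If the second entry here came back as nothing but padding or eos tokens after the prompt, the trimmed result would be empty, which matches the symptom described above.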

actual behavior

The input_ids only differ in the first dimension, but the results after the first are incorrect (empty).

additional notes

none

sunnyqgg commented 3 days ago

Hi @LugerW-A, batch inference is supported, but you need to follow the batching process provided by the official Qwen2-VL repo (see https://github.com/QwenLM/Qwen2-VL?tab=readme-ov-file for more info), for example:

```python
messages1 = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "xxx/image1.jpg"},
            {"type": "text", "text": "Describe this picture?"},
        ],
    }
]
messages2 = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "xxxx/image2.jpg"},
            {"type": "text", "text": "Describe this picture? And what kind of colour does it contain?"},
        ],
    }
]
messages = [messages1, messages2]

texts = [
    processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
    for msg in messages
]
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=texts,
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")
```
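The key detail in the snippet above is `padding=True`: the two prompts tokenize to different lengths, so the processor pads every sequence to the longest one so the batch forms a rectangular tensor, with an attention mask marking the real tokens. A toy sketch of that mechanism in plain Python (the `pad_id` value and pad-on-the-right direction here are illustrative assumptions; the real processor follows the tokenizer's configuration):

```python
# Pad a batch of token-id sequences to the length of the longest one,
# returning the padded ids and a matching 1/0 attention mask.
def pad_batch(sequences, pad_id=0):
    max_len = max(len(s) for s in sequences)
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in sequences]
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return input_ids, attention_mask

ids, mask = pad_batch([[5, 6, 7], [8, 9]], pad_id=0)
print(ids)   # → [[5, 6, 7], [8, 9, 0]]
print(mask)  # → [[1, 1, 1], [1, 1, 0]]
```

If padding is skipped and the second sequence's extra positions are treated as real input (or the mask is wrong), generation for that entry can degenerate into immediate eos tokens, which is consistent with the empty second output reported in this thread.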

sun2011yao commented 19 hours ago

@sunnyqgg Hi, following the batching approach above, the second output is still empty. Did you print the second output result when you ran it? When I run it here, the second output is all eos_token_id.

sunnyqgg commented 17 hours ago

Hi @sun2011yao, do you specify --batch_size when running with a batch size greater than 1?

sun2011yao commented 17 hours ago

> Hi @sun2011yao, do you specify --batch_size when running with a batch size greater than 1?

Yes, I run the command as follows:

```shell
python3 run.py \
    --hf_model_dir ./${MODEL_NAME} \
    --batch_size 2 \
    --image_path ./pics/demo.jpeg \
    --run_profiling \
    --max_new_tokens 50 \
    --visual_engine_dir tmp/trt_engines/${MODEL_NAME}/vision_encoder \
    --llm_engine_dir tmp/trt_engines/${MODEL_NAME}/fp16/1-gpu/
```

sunnyqgg commented 16 hours ago

Hi, if you hard-code messages = [messages1, messages2] as the default, as above, please don't also pass --image_path ./pics/demo.jpeg, otherwise it won't work. I'll add multi-batch support by allowing multiple values for --image_path later.

sun2011yao commented 15 hours ago

> Hi, if you hard-code messages = [messages1, messages2] as the default, as above, please don't also pass --image_path ./pics/demo.jpeg, otherwise it won't work. I'll add multi-batch support by allowing multiple values for --image_path later.

Hi, I removed --image_path, but the second result is still empty:

```
[['The image shows a woman sitting on a sandy beach with a dog. The dog is wearing a colorful harness and is sitting on its hind legs, giving a high-five to the woman. The woman is wearing a plaid shirt and is smiling. The'], ['']]
```

Can you get the correct results there?