QwenLM / Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Poor performance using Hugging Face Qwen-VL (not chat) #391

Open SqrtiZhang opened 1 month ago

SqrtiZhang commented 1 month ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in the FAQ?

Current Behavior

Hello, I am using Qwen-VL (not chat) for inference, but I found the performance of the model to be very poor, in stark contrast with the reported results. I suspect there is a problem with my prompt, but I can't find it.

Expected Behavior

No response

Steps To Reproduce

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL", device_map="cuda", trust_remote_code=True).eval()

image_pth = "path/to/image.jpg"  # placeholder: replace with the actual image path

# Double quotes so the apostrophes around 'Private'/'Public' don't terminate the string.
prompt = "If your privacy was suddenly put at risk, would you instinctively opt for the 'Private' button or succumb to the pressure of social approval by pressing 'Public'?"

# Build the multimodal query with the tokenizer's list-format helper.
query = tokenizer.from_list_format([
        {'image': image_pth},
        {'text': prompt + ' Answer is : '},
])
inputs = tokenizer(query, return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(pred[0][inputs['input_ids'].shape[1]:].cpu(), skip_special_tokens=True)
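
For reference, this is roughly what from_list_format renders the query into on my side; the exact template comes from the released tokenizer code, so treat the output below as illustrative:

print(query)
# Picture 1: <img>path/to/image.jpg</img>
# If your privacy was suddenly put at risk, ... pressing 'Public'? Answer is : 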

The response from the model is only "50-50". I am considering whether I should append something to the prompt to improve the clarity of the response. However, when I reviewed the evaluation code on GitHub, it seems that no additional prompt was added. Am I missing something?
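
In case it helps, here is the workaround I am experimenting with: greedy decoding with a capped output length and a plain completion-style suffix. The generation arguments are standard transformers generate() parameters; the "Answer:" phrasing is my own guess, not something taken from the official evaluation code.

# Hypothetical variant: deterministic decoding plus a completion-style suffix.
query = tokenizer.from_list_format([
        {'image': image_pth},
        {'text': prompt + '\nAnswer:'},  # suffix wording is an assumption
])
inputs = tokenizer(query, return_tensors='pt').to(model.device)
pred = model.generate(
    **inputs,
    do_sample=False,     # greedy decoding for reproducible output
    max_new_tokens=64,   # cap the continuation length
)
response = tokenizer.decode(
    pred[0][inputs['input_ids'].shape[1]:].cpu(),
    skip_special_tokens=True,
)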

Thank you for your assistance.

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response