QwenLM / Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

💡 [REQUEST] - Correct request for a detailed response without detection #190

Open ditengm opened 12 months ago

ditengm commented 12 months ago

Start Date

12.7.2023

Implementation PR

-

Reference Issues

What prompt is needed to ensure that the model does not return detected objects?

Summary

I have tried several prompts to get the model to simply describe the objects.

What prompt do I need so that the model returns a detailed description instead of detection output? The model is MODEL_NAME = '4bit/Qwen-VL-Chat-Int4'.

Basic Example

Examples:

text_1 = 'You can write only in English. Step by step describe the all objects (environment, emotions, devices and other things) in the image'
text_2 = 'You can write only in English. Step by step describe the all (environment, emotions, devices and other things) in the image'
text_3 = 'You can write only in English. Step by step describe it'
text_4 = 'You can only write in English. Describe everything (environment, emotions, device, etc.) in the image step by step and in detail.'

With every one of these prompts, the model returns detection output for 7 of 9 photos. I don’t need this; I just want an answer with no detections in it.

Drawbacks

-

Unresolved Questions

-

ShuaiBai623 commented 11 months ago

Try the prompt "Describe the image in detail." Or just try our new model, Qwen-VL-plus, listed in the README.

jinze1994 commented 11 months ago

@ditengm If you don't want any box-like annotations in the response, you can reliably get the cleaned text with the following post-processing.

import re

# Example model output containing <ref>/<box> annotations.
response = '<ref> Two apples</ref><box>(302,257),(582,671)</box><box>(603,252),(878,642)</box> and<ref> a bowl</ref><box>(2,269),(304,674)</box>'
# Keep the text inside <ref>...</ref> and drop the <box>/<quad> spans that follow it.
clean_response = re.sub(r'<ref>(.*?)</ref>(?:<box>.*?</box>)*(?:<quad>.*?</quad>)*', r'\1', response).strip()
print(clean_response)
# clean_response = 'Two apples and a bowl'

For details, refer to https://github.com/QwenLM/Qwen-VL/blob/master/TUTORIAL.md#how-to-get-the-caption-without-any-box-like-annotations
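If the same cleanup has to be applied to many responses (for example, the nine photos mentioned in the question), the regular expression from the reply above can be wrapped in a small helper. The following is only a sketch: the function names `strip_box_annotations` and `has_box_annotations` are hypothetical, and the extra pass that removes `<box>`/`<quad>` spans not preceded by a `<ref>` is an assumption about what output might occur, not something stated in this thread.

```python
import re

# Same pattern as in the reply above: keep the text inside <ref>...</ref>
# and drop any <box>/<quad> spans that immediately follow it.
_REF_PATTERN = re.compile(
    r'<ref>(.*?)</ref>(?:<box>.*?</box>)*(?:<quad>.*?</quad>)*'
)
# Assumption: also drop <box>/<quad> spans that appear without a <ref>.
_STRAY_PATTERN = re.compile(r'<(box|quad)>.*?</\1>')
_ANY_TAG_PATTERN = re.compile(r'<(?:ref|box|quad)>')


def has_box_annotations(response: str) -> bool:
    """Return True if the response contains any <ref>, <box> or <quad> tag."""
    return _ANY_TAG_PATTERN.search(response) is not None


def strip_box_annotations(response: str) -> str:
    """Return the response text with box-like annotations removed."""
    text = _REF_PATTERN.sub(r'\1', response)
    text = _STRAY_PATTERN.sub('', text)
    # Collapse runs of whitespace left behind by the removed tags.
    return re.sub(r'\s+', ' ', text).strip()


if __name__ == '__main__':
    responses = [
        '<ref> Two apples</ref><box>(302,257),(582,671)</box>'
        '<box>(603,252),(878,642)</box> and<ref> a bowl</ref>'
        '<box>(2,269),(304,674)</box>',
        'A plain caption with no annotations.',
    ]
    for r in responses:
        print(has_box_annotations(r), '|', strip_box_annotations(r))
```

`has_box_annotations` can be used to count how many responses came back with detections, while `strip_box_annotations` leaves responses that contain no tags unchanged apart from whitespace normalization.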