QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Apache License 2.0

Does the Qwen2-VL agent support video as input? #113

Open Cherryjingyao opened 1 week ago

Cherryjingyao commented 1 week ago
llm_cfg = {
    # Use the model service provided by DashScope:
    'model': 'qwen-vl-max-0809',
    #'api_key': 'YOUR_DASHSCOPE_API_KEY',
    # It will use the `DASHSCOPE_API_KEY` environment variable if 'api_key' is not set here.

    # Use a model service compatible with the OpenAI API, such as vLLM or Ollama:
    # 'model': 'Qwen2-VL-7B-Instruct',
    # 'model_server': 'http://localhost:8000/v1',  # base_url, also known as api_base
    # 'api_key': 'EMPTY',

    # (Optional) LLM hyperparameters for generation:
    'generate_cfg': {
        'top_p': 0.2
    }
}
bot = Assistant(llm=llm_cfg,
                system_message=system_instruction,
                function_list=tools)

messages.append({'role': 'user',
                 'content': [
                     {'video': '/pfs-data/code/Success_VQA/test_demo/Qwen2-VL/girl_dark_reading.mp4'},
                     {'text': query},
                 ]})

Traceback (most recent call last):
  File "/pfs-data/code/Success_VQA/test_demo/Qwen2-VL/agent_demo.py", line 99, in <module>
    for response in bot.run(messages=messages):
  File "/data/anaconda3/envs/llava/lib/python3.10/site-packages/qwen_agent/agent.py", line 83, in run
    new_messages.append(Message(**msg))
  File "/data/anaconda3/envs/llava/lib/python3.10/site-packages/qwen_agent/llm/schema.py", line 114, in __init__
    super().__init__(role=role, content=content, name=name, function_call=function_call)
  File "/data/anaconda3/envs/llava/lib/python3.10/site-packages/pydantic/main.py", line 193, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
TypeError: ContentItem.__init__() got an unexpected keyword argument 'video'
Exception ignored in: <function CodeInterpreter.__del__ at 0x7ff27a763490>
Traceback (most recent call last):
  File "/data/anaconda3/envs/llava/lib/python3.10/site-packages/qwen_agent/tools/code_interpreter.py", line 124, in __del__
TypeError: 'NoneType' object is not callable

When calling the agent, replacing the image with a video raises the error above. Does the agent support video as input, and if so, what does an example call look like?
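The TypeError shows that qwen-agent's ContentItem schema has no 'video' field; the keys it does accept include 'text' and 'image'. A message built from per-frame image items would look like the sketch below (the frame paths and query string are hypothetical placeholders, not from the repo):

```python
# Build a multimodal message for qwen-agent's Assistant.run().
# NOTE: the frame paths and query below are hypothetical placeholders.
query = 'Is the girl reading in the dark?'
frame_paths = ['frame_0000.jpg', 'frame_0001.jpg']  # pre-extracted video frames

messages = []
messages.append({
    'role': 'user',
    'content': (
        # One 'image' item per frame -- ContentItem accepts 'image', not 'video'.
        [{'image': p} for p in frame_paths]
        # Followed by the text query.
        + [{'text': query}]
    ),
})
```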

gewenbin0992 commented 1 week ago

Sorry, video input is not currently supported. We suggest extracting frames from the video and passing them as multiple image inputs; to avoid overly long sequences, we suggest sampling frames at FPS=2.
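The suggested FPS=2 sampling can be sketched as follows. The helper name is my own illustrative choice, not part of qwen-agent; it only computes which source-frame indices to keep, and any decoder (e.g. OpenCV) could then read those frames:

```python
def sample_frame_indices(total_frames, video_fps, target_fps=2.0):
    """Return the frame indices to keep when subsampling a video to target_fps.

    Hypothetical helper illustrating the suggested FPS=2 sampling:
    total_frames and video_fps describe the source video, and one index is
    emitted roughly every 1/target_fps seconds of footage.
    """
    if target_fps >= video_fps:
        return list(range(total_frames))
    step = video_fps / target_fps  # source frames between sampled frames
    indices = []
    t = 0.0
    while round(t) < total_frames:
        indices.append(int(round(t)))
        t += step
    return indices

# A 5-second 30 FPS clip (150 frames) sampled at 2 FPS keeps 10 frames:
# indices 0, 15, 30, ..., 135.
print(sample_frame_indices(150, 30.0))
```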

zhanghanduo commented 1 week ago

Then, if I use vLLM as the server:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen2-VL-7B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"}},
        {"type": "text", "text": "What is the text in the illustration?"}
      ]}
    ]
  }'

how should I pass multiple image inputs to represent a video? It seems only the image_url type is supported; there is no image/video field.
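One way to approximate video input on an OpenAI-compatible endpoint is to send one image_url content part per sampled frame inside a single user message. The sketch below builds such a request body under that assumption (the frame URLs and question are placeholders); it is not an officially documented video API:

```python
import json

# Hypothetical URLs of frames pre-extracted from the video at 2 FPS.
frame_urls = [
    'http://example.com/frames/0000.jpg',
    'http://example.com/frames/0001.jpg',
]

payload = {
    'model': 'Qwen2-VL-7B-Instruct',
    'messages': [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content':
            # One image_url content part per frame, then the text question.
            [{'type': 'image_url', 'image_url': {'url': u}} for u in frame_urls]
            + [{'type': 'text', 'text': 'Describe what happens in these frames.'}]},
    ],
}

# This JSON body can be POSTed to http://localhost:8000/v1/chat/completions.
body = json.dumps(payload)
```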

ShuaiBai623 commented 1 week ago

Just change it to "type": "video", "video": "xxx".

https://github.com/QwenLM/Qwen2-VL?tab=readme-ov-file#deployment

zhanghanduo commented 1 week ago

> Just change it to "type": "video", "video": "xxx".

https://github.com/QwenLM/Qwen2-VL?tab=readme-ov-file#deployment

That is exactly how I deployed it, but when I tested it I got this error: {"object":"error","message":"Unknown part type: video","type":"BadRequestError","param":null,"code":400}

I found that vLLM's entrypoints do not define a "video" part type at all: https://github.com/fyabc/vllm/blob/add_qwen2_vl_new/vllm/entrypoints/chat_utils.py [screenshot]