QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Apache License 2.0
2.33k stars 130 forks source link

use Qwen2VLModel in huggingface got an unexpected keyword argument 'pixel_values' #266

Open mearcstapa-gqz opened 2 days ago

mearcstapa-gqz commented 2 days ago
from PIL import Image
import requests
from transformers import AutoProcessor, Qwen2VLModel

model = Qwen2VLModel.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(text=["a photo of a cat"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)

TypeError: Qwen2VLModel.forward() got an unexpected keyword argument 'pixel_values'

I assume Qwen2VLModel should be used to get hidden states from text and image input, but looks like the normal pipeline fails

mearcstapa-gqz commented 2 days ago

same as https://github.com/QwenLM/Qwen2-VL/issues/166