Using Qwen2VLModel in Hugging Face raises "unexpected keyword argument 'pixel_values'"

```python
from PIL import Image
import requests
from transformers import AutoProcessor, Qwen2VLModel

model = Qwen2VLModel.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

# Download a sample COCO image and preprocess it together with a text prompt
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(text=["a photo of a cat"], images=image, return_tensors="pt", padding=True)

# Fails with the TypeError below
outputs = model(**inputs)
```

```
TypeError: Qwen2VLModel.forward() got an unexpected keyword argument 'pixel_values'
```

I assumed Qwen2VLModel was the class to use for getting hidden states from combined text and image input, but the standard processor-then-model pipeline shown above fails.

QwenLM/Qwen2-VL, issue #266