Open omaraflak opened 2 months ago
Hi, have you solved the problem yet? I have exactly the same issue when using Qwen2VLModel
No, I haven't.
Looking at the source code of Qwen2VLForConditionalGeneration (https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L1690-L1706), it seems the `pixel_values` (and `image_grid_thw`) produced by Qwen2VLProcessor should first be transformed into `inputs_embeds` using Qwen2VisionTransformerPretrainedModel, and then the `inputs_embeds` are passed to Qwen2VLModel.
But I haven't figured out the particulars yet.
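From those lines, the splicing step looks roughly like the toy sketch below. The tensors, shapes, and `image_token_id` value here are made-up stand-ins: in the real model, `inputs_embeds` would come from `model.model.embed_tokens(input_ids)` and `image_embeds` from `model.visual(pixel_values, grid_thw=image_grid_thw)`.

```python
import torch

def merge_image_embeds(input_ids, inputs_embeds, image_embeds, image_token_id):
    # Overwrite the embeddings at image-placeholder positions with the
    # vision-transformer outputs, mirroring what
    # Qwen2VLForConditionalGeneration.forward does before calling Qwen2VLModel.
    image_mask = (input_ids == image_token_id).unsqueeze(-1).expand_as(inputs_embeds)
    return inputs_embeds.masked_scatter(image_mask, image_embeds.to(inputs_embeds.dtype))

# Toy stand-ins: batch=1, seq_len=6, hidden=4, two image placeholder tokens.
image_token_id = 99                           # hypothetical placeholder id
input_ids = torch.tensor([[1, 99, 99, 2, 3, 4]])
inputs_embeds = torch.zeros(1, 6, 4)          # stand-in for embed_tokens(input_ids)
image_embeds = torch.ones(2, 4)               # stand-in for visual(pixel_values, ...)

merged = merge_image_embeds(input_ids, inputs_embeds, image_embeds, image_token_id)
# positions 1 and 2 now hold the image embeddings; the rest are untouched
```

With a real checkpoint, the merged `inputs_embeds` (together with the processor's `attention_mask`) could then presumably be passed to Qwen2VLModel via its `inputs_embeds` argument instead of `input_ids`.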
Hi, thank you for your work!
I'd like to add a regression head on top of the model that outputs the hidden states. From the doc I see:
So that's what I need. However, when I try to pass the encoded image and text (from `Qwen2VLProcessor`) to the model, I get an error that `pixel_values` is not supported. Indeed, the arguments of the `forward` method of `Qwen2VLModel` only contain `input_ids`.
How do I pass the result of `Qwen2VLProcessor` to `Qwen2VLModel`?
Thanks