QwenLM / Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Extracting Unimodal Features #477

Open sreebhattacharyya opened 1 month ago

sreebhattacharyya commented 1 month ago

Hello! I am trying to use Qwen-VL to extract unimodal features for a given input image and its accompanying text query. How can that be done? I am aware that models like BLIP-2 expose a direct API (extract_features) for this, but what is the equivalent for Qwen-VL?

thusinh1969 commented 1 month ago

Exactly what I was about to ask. How do we get encoder embeddings from Qwen2-VL for text-only, image-only, or combined image/text input, i.e. the extracted features?

Thanks, Steve
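
One possible approach, offered as a minimal sketch and not as an official Qwen-VL API: run the model once, collect the per-token hidden states (Hugging Face Transformers models generally accept `output_hidden_states=True` for this, though whether the Qwen-VL remote code honors it should be checked), and then average the hidden states separately over image-token positions and text-token positions. The function names `masked_mean_pool` and `split_unimodal_features`, the `is_image_token` flags, and the toy numbers below are all hypothetical and only illustrate the pooling step on plain Python lists.

```python
from typing import Dict, List, Sequence


def masked_mean_pool(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the rows of `hidden` whose mask entry is non-zero."""
    if len(hidden) != len(mask):
        raise ValueError("hidden and mask must have the same length")
    selected = [row for row, keep in zip(hidden, mask) if keep]
    if not selected:
        raise ValueError("mask selects no positions")
    dim = len(selected[0])
    return [sum(row[i] for row in selected) / len(selected) for i in range(dim)]


def split_unimodal_features(
    hidden: Sequence[Sequence[float]],
    is_image_token: Sequence[bool],
) -> Dict[str, List[float]]:
    """Pool hidden states separately over image and text positions.

    `is_image_token[i]` is a hypothetical flag marking position i as an
    image token; how to derive it depends on the model's tokenizer.
    """
    image_mask = [1 if flag else 0 for flag in is_image_token]
    text_mask = [0 if flag else 1 for flag in is_image_token]
    return {
        "image": masked_mean_pool(hidden, image_mask),
        "text": masked_mean_pool(hidden, text_mask),
    }


# Toy stand-in for one sequence of hidden states: 4 positions, 2 dimensions.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
is_image_token = [True, True, False, False]

features = split_unimodal_features(hidden, is_image_token)
print(features["image"])  # [2.0, 3.0]
print(features["text"])   # [6.0, 7.0]
```

Note that in a decoder-only model, hidden states at text positions that follow the image are already conditioned on the image tokens, so features pooled this way are not strictly unimodal. For strictly unimodal features, a likely alternative is to feed text-only and image-only inputs in separate forward passes, or to take the vision encoder's output before it is merged into the language model.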