QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Apache License 2.0
2.61k stars 148 forks source link

[Question] What is the minimum size of an image that can be classified? #233

Open ChangGiMoon opened 3 weeks ago

ChangGiMoon commented 3 weeks ago

I want to use Qwen2-7B to do binary classification (e.g. yes or no) on very small images. Specifically, I am going to use the cropped bounding box patch (which is the result of Object Detection) as input for Qwen2-7B and verify whether the class of bounding box is correct. Can you tell me the approximate minimum image size that can be classified? The prompt will use the following input. The input images have various sizes and appearances as shown below.

prompt: Based on the given image, answer the following question with 'yes' or 'no': Question: [Is there a person in this image?], Answer:

input image example: ex

ShuaiBai623 commented 3 weeks ago

The model supports a minimum pixel size where the shortest side is greater than 28, but the classification accuracy for such small images may be unstable at low resolutions (<224224). You may try to set larger min_pixels, such as 224224 or larger.