OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance
https://internvl.readthedocs.io/en/latest/
MIT License
5.63k stars 439 forks

Question: What is the minimum size of an image that can be classified? #594

Open ChangGiMoon opened 1 week ago

ChangGiMoon commented 1 week ago

I want to use InternVL2-8B for binary classification (e.g., yes or no) on very small images. Specifically, I plan to take the cropped bounding-box patches produced by an object detector, pass them to InternVL2-8B, and verify whether each bounding box's predicted class is correct. What is the approximate minimum image size that can be classified? I will use the prompt below. The input images vary in size and appearance, as in the example below.

prompt: Based on the given image, answer the following question with 'yes' or 'no': Question: [Is there a person in this image?], Answer:

input image example: ex

qishisuren123 commented 1 week ago

The minimum image size should be equal to or greater than the patch size, which is 14 by default.
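
As a rough illustration of the rule above, here is a minimal sketch that screens detector boxes before they are cropped and sent to the model. It assumes "size" means both width and height in pixels and that boxes are given as (x_min, y_min, x_max, y_max); the names `PATCH_SIZE`, `crop_is_large_enough`, and `split_boxes` are hypothetical and not part of InternVL. Passing this check only means a crop meets the stated lower bound; it says nothing about how accurate the answer will be on such small crops.

```python
from typing import List, Tuple

# Default patch size quoted in the reply above (assumed to be in pixels).
PATCH_SIZE = 14

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[int, int, int, int]


def crop_is_large_enough(box: Box, patch_size: int = PATCH_SIZE) -> bool:
    """Return True if both sides of the crop are at least patch_size."""
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min
    height = y_max - y_min
    return width >= patch_size and height >= patch_size


def split_boxes(
    boxes: List[Box], patch_size: int = PATCH_SIZE
) -> Tuple[List[Box], List[Box]]:
    """Split boxes into (usable, too_small) according to the size check."""
    usable: List[Box] = []
    too_small: List[Box] = []
    for box in boxes:
        if crop_is_large_enough(box, patch_size):
            usable.append(box)
        else:
            too_small.append(box)
    return usable, too_small


if __name__ == "__main__":
    detections = [(0, 0, 14, 14), (5, 5, 10, 30), (100, 40, 160, 200)]
    usable, too_small = split_boxes(detections)
    print("usable:", usable)
    print("too small:", too_small)
```

Boxes that land in `too_small` could be skipped or upscaled before classification; which is better depends on the data and would need to be checked empirically.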