AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0
4.62k stars 447 forks

Change image resolution for Inference? #287

Open nisyad-ms opened 6 months ago

nisyad-ms commented 6 months ago

I noticed that all the images get pre-processed to the img resolution of the config or inherited base config.

Is there an easier way to test images at different resolutions? (compared to directly manipulating base configs)

Thanks

wondervictor commented 6 months ago

Hi @nisyad-ms, modifying the configs is easy for training, but it does not support varying resolutions during inference. To that end, you can write a simple data-loading function to replace the default test_pipeline, such as:

import cv2
import numpy as np


def preprocess(image, size=(640, 640)):
    """Letterbox an HxWx3 BGR image to a square, resize, and normalize."""
    h, w = image.shape[:2]
    max_size = max(h, w)
    scale_factor = size[0] / max_size
    # Center the image on a square canvas of zeros (black padding).
    pad_h = (max_size - h) // 2
    pad_w = (max_size - w) // 2
    pad_image = np.zeros((max_size, max_size, 3), dtype=image.dtype)
    pad_image[pad_h:h + pad_h, pad_w:w + pad_w] = image
    # Resize to the target resolution and scale pixel values to [0, 1].
    image = cv2.resize(pad_image, size,
                       interpolation=cv2.INTER_LINEAR).astype('float32')
    image /= 255.0
    # Add a batch dimension; return scale/pads for mapping boxes back.
    image = image[None]
    return image, scale_factor, (pad_h, pad_w)

And feed the images into the YOLO-World model as image input.
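Since `preprocess` returns `scale_factor` and the pad offsets, detections produced on the padded/resized input can be mapped back to original-image coordinates by inverting those two steps. A minimal sketch (the `(x1, y1, x2, y2)` box format and the helper name are assumptions, not part of the repo's API):

```python
import numpy as np

def boxes_to_original(boxes, scale_factor, pads):
    """Map (x1, y1, x2, y2) boxes from the padded/resized space back to
    the original image. `scale_factor` and `pads` come from preprocess()."""
    pad_h, pad_w = pads
    boxes = np.asarray(boxes, dtype=np.float32).copy()
    boxes /= scale_factor      # undo the resize
    boxes[:, [0, 2]] -= pad_w  # undo the horizontal padding
    boxes[:, [1, 3]] -= pad_h  # undo the vertical padding
    return boxes

# Example: a 480x640 image is padded to 640x640, so scale_factor = 1.0
# and pads = (80, 0); a box at y = 90 in padded space sits at y = 10
# in the original image.
print(boxes_to_original([[10.0, 90.0, 110.0, 190.0]], 1.0, (80, 0)))
# -> [[ 10.  10. 110. 110.]]
```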

nisyad-ms commented 6 months ago

Thanks @wondervictor !

nisyad-ms commented 6 months ago

@wondervictor - In your opinion, could using an input resolution of 1280 at inference with the XL model improve performance? (I am benchmarking some datasets.) If I am not mistaken, the XL model currently uses an input resolution of 640 at inference.

wondervictor commented 6 months ago

Hi @nisyad-ms, using "X-1280" is recommended for better accuracy. However, larger models with high-resolution inputs increase the inference latency.