nisyad-ms opened this issue 6 months ago
Hi @nisyad-ms, modifying configs is easy for training; however, the config does not support varying resolutions during inference. Instead, you can write a simple data-loading function to replace the default test_pipeline, for example:
```python
import cv2
import numpy as np

def preprocess(image, size=(640, 640)):
    """Letterbox an HxWx3 image to a square canvas, then resize to `size`.

    Returns the normalized batched image, the scale factor, and the
    (pad_h, pad_w) offsets needed to map predictions back to the original.
    """
    h, w = image.shape[:2]
    max_size = max(h, w)
    scale_factor = size[0] / max_size
    pad_h = (max_size - h) // 2
    pad_w = (max_size - w) // 2
    # Center the image on a square zero-padded canvas.
    pad_image = np.zeros((max_size, max_size, 3), dtype=image.dtype)
    pad_image[pad_h:h + pad_h, pad_w:w + pad_w] = image
    # Resize the square canvas to the target resolution and normalize to [0, 1].
    image = cv2.resize(pad_image, size,
                       interpolation=cv2.INTER_LINEAR).astype('float32')
    image /= 255.0
    image = image[None]  # add a batch dimension
    return image, scale_factor, (pad_h, pad_w)
```
Then feed the preprocessed images into the YOLO-World model as the image input.
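To map predicted boxes back to the original image, undo the resize with the returned scale factor and subtract the padding offsets. A minimal sketch (the helper name and the box values are made up for illustration):

```python
import numpy as np

def unletterbox_boxes(boxes, scale_factor, pad):
    """Map (x1, y1, x2, y2) boxes predicted on the resized square input
    back into original-image coordinates."""
    pad_h, pad_w = pad
    boxes = np.asarray(boxes, dtype=np.float32)
    boxes = boxes / scale_factor   # undo the resize
    boxes[:, [0, 2]] -= pad_w      # undo the horizontal padding
    boxes[:, [1, 3]] -= pad_h      # undo the vertical padding
    return boxes

# Example: a 480x640 image letterboxed to 640x640 gives
# scale_factor = 1.0 and (pad_h, pad_w) = (80, 0).
boxes = unletterbox_boxes([[100.0, 120.0, 300.0, 400.0]], 1.0, (80, 0))
# boxes is now [[100, 40, 300, 320]] in original-image coordinates
```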
Thanks @wondervictor !
@wondervictor - In your opinion, could using an input resolution of 1280 at inference with the XL model improve performance (I am benchmarking some datasets)? If I am not mistaken, the XL model currently uses an input resolution of 640 at inference?
Hi @nisyad-ms, using "X-1280" is recommended for better accuracy. However, larger models with high-resolution inputs increase the inference latency.
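To quantify that latency trade-off for your benchmark, a rough timing sketch at 640 vs. 1280 (`infer` here is a hypothetical stand-in for the actual model call, not YOLO-World's API):

```python
import time
import numpy as np

def benchmark(infer, size, warmup=3, runs=10):
    """Rough per-image latency for a single forward pass at `size`x`size`."""
    x = np.zeros((1, size, size, 3), dtype=np.float32)
    for _ in range(warmup):   # warm-up runs are not timed
        infer(x)
    start = time.perf_counter()
    for _ in range(runs):
        infer(x)
    return (time.perf_counter() - start) / runs  # seconds per run

# Stand-in "model": cost grows with input area, as real inference would.
dummy = lambda x: x.mean()
for s in (640, 1280):
    print(f"{s}x{s}: {benchmark(dummy, s) * 1e3:.3f} ms")
```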
I noticed that all images get pre-processed to the image resolution set in the config or the inherited base config.
Is there an easier way to test images at different resolutions (compared to directly editing the base configs)?
Thanks