I have a question about your model. I want to run inference on images with a resolution of 3840x2160 using DINO-4scale with a SwinL backbone for an object detection task. I used the COCO weights you uploaded (_checkpoint0029_4scaleswin.pth).
However, I am getting really poor results: there are almost no good predictions at that image resolution. Why is that? Shouldn't the model scale with image size? Or is it because the model was trained on images of at most 2000 pixels, so I would have to train it again for larger resolutions?
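In case it helps frame the question, here is a minimal sketch of the workaround I am considering: downscale the image so that its longer side stays within the size I believe was used in training, run the detector on the resized image, then map the boxes back to the original pixels. The 2000-pixel limit is only my guess from above, and `fit_within` and `boxes_to_original` are hypothetical helper names, not part of the DINO code. The sketch also assumes boxes come out as absolute `[x0, y0, x1, y1]` pixel coordinates on the resized image.

```python
def fit_within(width, height, max_side=2000):
    """Return (new_width, new_height, scale) so the longer side is at most max_side.

    max_side=2000 is an assumption taken from my guess about the training size.
    Images that are already small enough are left unchanged (scale == 1.0).
    """
    scale = min(1.0, max_side / max(width, height))
    return round(width * scale), round(height * scale), scale


def boxes_to_original(boxes, scale):
    """Map [x0, y0, x1, y1] boxes from the resized image back to original pixels."""
    return [[coord / scale for coord in box] for box in boxes]


# Target size for a 3840x2160 frame under the assumed 2000-pixel limit.
new_w, new_h, scale = fit_within(3840, 2160)

# A box covering the bottom-right quadrant of the resized image,
# mapped back to the original 3840x2160 coordinates.
restored = boxes_to_original([[new_w / 2, new_h / 2, new_w, new_h]], scale)
```

If downscaling like this loses too many small objects, the alternative I can think of is splitting the image into overlapping tiles, but I would first like to know whether the resolution is really the cause.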