Open thucz opened 2 months ago
Hi! I have 640x360 images. I wonder if I can use the model directly on these lower resolution images without resizing to 616x1064.
No, you cannot: the original ViT models do not support multiple input resolutions because of their positional embeddings. All images should be resized/cropped to 616x1064 before being fed into the network.
Some works, such as Patch n' Pack, explore variable-resolution inputs. However, we did not apply such techniques when training the Metric3D ViT models.
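For a 640x360 image, the resize step could look roughly like the sketch below. It only computes the geometry: the size to resize to while keeping the aspect ratio, and the padding needed to reach the fixed input. It assumes 616x1064 means height x width, and it uses aspect-preserving resize plus padding, which is just one option and not necessarily what the repository's own preprocessing does. The name `fit_to_input` is hypothetical.

```python
def fit_to_input(h, w, target_h=616, target_w=1064):
    """Return ((new_h, new_w), (top, bottom, left, right)) for a letterbox fit.

    Assumption: the fixed network input is target_h x target_w (height x width).
    Integer arithmetic is used so the result does not depend on float rounding.
    """
    # Compare target_h / h with target_w / w without dividing.
    if target_h * w <= target_w * h:
        # Height is the limiting side: scale so the height matches exactly.
        new_h, new_w = target_h, w * target_h // h
    else:
        # Width is the limiting side: scale so the width matches exactly.
        new_h, new_w = h * target_w // w, target_w

    # Split the leftover space evenly between the two sides.
    pad_h, pad_w = target_h - new_h, target_w - new_w
    top, left = pad_h // 2, pad_w // 2
    return (new_h, new_w), (top, pad_h - top, left, pad_w - left)


# A 640x360 (width x height) image: resize to 598x1064, then pad 9 rows top and bottom.
size, padding = fit_to_input(360, 640)
print(size, padding)
```

The actual resize and pad would then be done with whatever image library you already use. Since padding changes where pixels sit in the frame, the predicted depth map would need to be cropped back by the same padding and resized to 360x640.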