YvanYin / Metric3D

The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
https://jugghm.github.io/Metric3Dv2/
BSD 2-Clause "Simplified" License
1.3k stars, 94 forks

Running on lower resolution images. #130

Open thucz opened 2 months ago

thucz commented 2 months ago

Hi! I have 640x360 images. I wonder if I can use the model directly on these lower resolution images without resizing to 616x1064.

JUGGHM commented 1 month ago

> Hi! I have 640x360 images. I wonder if I can use the model directly on these lower resolution images without resizing to 616x1064.

No, not directly. The original ViT models do not support multiple input resolutions because of their positional embeddings, so all images should be resized/cropped to 616x1064 before being fed into the network.
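As a rough illustration of the resize step for a 640x360 input, the sketch below works out an aspect-preserving resize plus padding to reach 616x1064 (height x width) and rescales the camera intrinsics by the same factor. The function name `plan_resize`, the centered padding, and the example intrinsics are assumptions for illustration only, not the repository's code. The actual pixel resize and padding would still be done with an image library.

```python
def plan_resize(height, width, intrinsics, target=(616, 1064)):
    """Plan an aspect-preserving resize and padding to `target` (h, w).

    `intrinsics` is (fx, fy, cx, cy) in pixels. This is a hypothetical
    helper, not part of the Metric3D codebase.
    """
    target_h, target_w = target
    # Pick the limiting dimension with integer arithmetic to avoid
    # off-by-one errors from floating-point rounding.
    if target_h * width <= target_w * height:
        new_h, new_w = target_h, width * target_h // height
        scale = target_h / height
    else:
        new_h, new_w = height * target_w // width, target_w
        scale = target_w / width

    # Split the remaining space evenly (centered padding is an assumption).
    pad_h, pad_w = target_h - new_h, target_w - new_w
    top, left = pad_h // 2, pad_w // 2
    padding = (top, pad_h - top, left, pad_w - left)  # top, bottom, left, right

    # Focal lengths and principal point scale with the image.
    # The padding offset is not added to cx, cy here.
    scaled_intrinsics = tuple(v * scale for v in intrinsics)
    return {
        "scale": scale,
        "resized": (new_h, new_w),
        "padding": padding,
        "intrinsics": scaled_intrinsics,
    }


# Example: a 640x360 image (height=360, width=640) with made-up intrinsics.
plan = plan_resize(360, 640, (500.0, 500.0, 320.0, 180.0))
print(plan["resized"], plan["padding"])
```

Because the image is scaled before it reaches the network, metric depth predicted on the resized input is only meaningful together with the rescaled intrinsics, which is why the sketch returns both.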

Some works explore this, such as Patch n' Pack. However, we did not apply such techniques when training the Metric3D ViT models.
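To make the positional-embedding constraint concrete, here is a minimal sketch of how the patch grid depends on input size. It assumes a patch size of 14, which is not stated in this thread (616 and 1064 are both multiples of 14, but treat the value as an assumption); `patch_grid` is a hypothetical helper.

```python
PATCH = 14  # assumed patch size; not stated in this thread


def patch_grid(height, width, patch=PATCH):
    """Return the (rows, cols) patch grid, or None if the size does not tile evenly."""
    if height % patch or width % patch:
        return None
    return (height // patch, width // patch)


print(patch_grid(616, 1064))  # grid the positional embeddings would be laid out for
print(patch_grid(360, 640))   # does not tile evenly under the assumed patch size
```

A model with one positional embedding per patch position, trained on a single grid, has no embedding layout for a different grid unless the embeddings are interpolated or the model was trained with a variable-resolution scheme.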

bhack commented 1 month ago

Yes, there are many interesting solutions, like ViTAR.