btsmart / splatt3r

Official repository for Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs

Input resolution #21

Open nautilus-a opened 1 month ago

nautilus-a commented 1 month ago

Firstly, thank you for sharing this nice work and its implementation.

I have tried using this code with the official pre-trained weights to run inference on KITTI and Waymo images.

However, I found that the inference results look weird when the input images are not square.

How can I solve this issue?

The examples below are from KITTI and Waymo. (Square inputs produce good results.)

KITTI results (two 512x512 cropped inputs)

square_kitti.webm

KITTI results (two 1696x512 inputs)

https://github.com/user-attachments/assets/64924618-0eef-4f73-83b0-ad733869b8dc

Waymo results (two 512x512 cropped inputs)

https://github.com/user-attachments/assets/79c2b551-a5b5-4db3-8448-00c28cf85630

Waymo results (two 768x512 inputs)

https://github.com/user-attachments/assets/178f2378-48d8-403f-9c9c-d12622cd7867

btsmart commented 1 month ago

Hello,

During training we only used square 512x512 images, so I am unsure how well the model will generalize to other resolutions and aspect ratios. The generated point clouds should still be similar to those created by MASt3R, though, since we use a mostly unmodified MASt3R model, which was trained with different aspect ratios. If you are using the Gradio demo, it includes some code that rescales/crops the images; you might want to check that it is working correctly for your samples. Otherwise, you may need to fine-tune a version of the model using different aspect ratios.
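
For anyone who wants to check their own preprocessing against the square 512x512 training setup mentioned above, here is a minimal sketch of the crop arithmetic. It is not code from the Splatt3R repository or the Gradio demo. The function names `center_square_crop_box` and `square_preprocess_plan` are hypothetical, and the choice of a centered crop is an assumption (the crops in the original post may have been taken elsewhere in the frame).

```python
def center_square_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square.

    Hypothetical helper for illustration; not part of Splatt3R.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def square_preprocess_plan(width, height, target=512):
    """Return the crop box and the resize factor needed to reach target x target.

    `target=512` is assumed from the training resolution stated in the thread.
    """
    box = center_square_crop_box(width, height)
    side = box[2] - box[0]
    scale = target / side  # 1.0 means the crop is already at the target size
    return box, scale


# The two non-square resolutions reported in this issue.
kitti_box, kitti_scale = square_preprocess_plan(1696, 512)
waymo_box, waymo_scale = square_preprocess_plan(768, 512)
print(kitti_box, kitti_scale)
print(waymo_box, waymo_scale)
```

The box uses the (left, upper, right, lower) convention, so it can be passed to an image library's crop call (for example Pillow's `Image.crop`) before resizing. Note that a center crop of a 1696x512 KITTI frame discards most of the horizontal field of view, so this is a workaround, not a substitute for fine-tuning on other aspect ratios.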