coltonstearns / dynamic-gaussian-marbles

Training with In-the-Wild Videos #6

Open sauradip opened 2 days ago

sauradip commented 2 days ago

Hi,

Thanks for the awesome work! I am curious about a few things:

a) How does your code estimate the camera parameters for in-the-wild videos that do not come with any camera information?

b) How do you lift the trajectories to 3D? Are you using metric depth for the lifting? If so, isn't it inaccurate?

coltonstearns commented 1 day ago

Hello, thanks for the questions!

a) In this release of the code, we assume a simple pinhole camera model: a single focal length for both fx and fy, the principal point (cx, cy) at the center of the image, and no skew or distortion. By default, we assume an 80-degree FOV, from which we compute the corresponding focal length. The code for this is in lines 47-57 of preprocess/01_format_directory.py. For camera extrinsics, we assume the camera is stationary and looking forward, and we learn dynamics for the background to account for camera motion.
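For illustration only, the relationship between an assumed FOV and the focal length of such a pinhole camera can be sketched as below. This is not the code from preprocess/01_format_directory.py; the function names are hypothetical, and measuring the FOV across the image width is an assumption (the repository may measure it along a different axis).

```python
import math


def focal_from_fov(fov_deg: float, image_size_px: float) -> float:
    """Focal length in pixels for a pinhole camera whose field of view
    spans `image_size_px` pixels: f = (size / 2) / tan(fov / 2)."""
    return 0.5 * image_size_px / math.tan(math.radians(fov_deg) / 2.0)


def pinhole_intrinsics(width: int, height: int, fov_deg: float = 80.0):
    """3x3 intrinsics matrix (nested lists) with fx == fy, the principal
    point at the image center, and no skew or distortion.

    Assumption: the FOV is measured across the image width.
    """
    f = focal_from_fov(fov_deg, width)
    cx, cy = width / 2.0, height / 2.0
    return [
        [f, 0.0, cx],
        [0.0, f, cy],
        [0.0, 0.0, 1.0],
    ]


if __name__ == "__main__":
    K = pinhole_intrinsics(640, 480)
    for row in K:
        print(row)
```

A wider assumed FOV gives a shorter focal length, so a wrong FOV guess changes how far points spread sideways once they are lifted with depth.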

b) We use monocular metric depth from DepthAnythingV2. It is usually impressively accurate, although it occasionally exhibits errors (which can cause our method to produce a bad reconstruction). Also, instead of directly lifting trajectories to 3D, we initialize them per frame and progressively expand them during our optimization. This makes the optimization slower, but we found it to be a bit more robust to depth and tracking errors.
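As a rough sketch of what lifting with metric depth means under the pinhole model from (a), a pixel (u, v) with depth z can be unprojected as below. This is a generic unprojection, not the repository's implementation; the function names are hypothetical, and it assumes the depth value is the distance along the camera's z-axis. With a stationary, forward-looking camera the result can be treated directly as a scene-space point.

```python
def unproject_pixel(u, v, depth, f, cx, cy):
    """Lift pixel (u, v) with metric depth `depth` to a 3D point in the
    camera frame of a pinhole camera with focal length f (pixels) and
    principal point (cx, cy).

    Assumption: `depth` is the z-coordinate, not the ray length.
    """
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f
    return (x, y, depth)


def unproject_depth_map(depth_map, f, cx, cy):
    """Lift every pixel of a depth map (list of rows) to a 3D point.
    Returns a flat list of (x, y, z) tuples in row-major order."""
    points = []
    for v, row in enumerate(depth_map):
        for u, d in enumerate(row):
            points.append(unproject_pixel(u, v, d, f, cx, cy))
    return points


if __name__ == "__main__":
    # Tiny 2x2 depth map with every pixel 2 metres away.
    depth = [[2.0, 2.0], [2.0, 2.0]]
    for p in unproject_depth_map(depth, f=1.0, cx=1.0, cy=1.0):
        print(p)
```

Because x and y scale linearly with depth, any error in the estimated depth moves the lifted point along its camera ray, which is why depth errors can degrade a reconstruction.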

sauradip commented 1 day ago

Thanks for your detailed response! One more question: is it possible to optimize the trajectories over a sliding window instead of frame by frame, while using the fixed camera you described in (a)?