How to understand make_baseline_1 in multi-dataset training

Trainingzy commented 3 months ago

Hi, thanks for your great work!

I have a question regarding make_baseline_1 and apply_bounds_shim. In your implementation, you rescale the scene so that the baseline between the two context views is 1, and then adjust the near and far planes with apply_bounds_shim.

If I understand it correctly, this converts the camera translations to relative translations. It seems to work well on a single dataset. However, if I train on multiple datasets with different kinds of scenes, such as re10k and co3d together, should I make the baseline 1 or specify near and far for each dataset?

Thank you!

dcharatan commented 3 months ago

Those functions are designed for Real Estate 10k, where there are no official near and far planes. They're mainly intended to make sure that the translations/distances have a numerically reasonable scale, i.e., that they're not really large or really small relative to float32 values. If you have a dataset like CO3D where known near and far planes exist, you could use those instead. That should make it easier for the model to learn, since more of the density distribution is centered around the correct depth (just make sure you're not accidentally clipping the scene to be too small). On the other hand, the model should still work if you keep make_baseline_1 and apply_bounds_shim.

If you're interested in CO3D in particular, I would recommend contacting @chrixtar, who used pixelSplat trained on CO3D as a baseline in this paper (see Fig. 3). It's possible that there are more tricks you need for CO3D that I'm not aware of.
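
For readers following along, the scale normalization discussed above can be sketched as follows. This is an illustrative, pure-Python sketch and not the code from the pixelSplat repository: the function names `normalize_baseline` and `rescale_bounds` are hypothetical, and it assumes camera positions are given as 3-tuples with the first two entries being the context views. The one point it is meant to show is that if a dataset ships known near and far planes and you still normalize the baseline, the planes must be multiplied by the same scale factor as the translations.

```python
import math


def normalize_baseline(positions, eps=1e-8):
    """Rescale camera positions so the first two (context) views are 1 unit apart.

    Hypothetical helper for illustration; rotations are unaffected by this
    rescaling, so only positions are handled here.
    """
    baseline = math.dist(positions[0], positions[1])
    if baseline < eps:
        raise ValueError("context views are too close to define a baseline")
    scale = 1.0 / baseline
    scaled = [tuple(scale * c for c in p) for p in positions]
    return scaled, scale


def rescale_bounds(near, far, scale):
    """Apply the same scale factor to known near/far planes (assumed to be
    in the dataset's original units) so they stay consistent with the
    rescaled camera positions."""
    return near * scale, far * scale


# Two context views 4 units apart, plus one target view.
positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 4.0), (2.0, 0.0, 0.0)]
scaled_positions, scale = normalize_baseline(positions)
near, far = rescale_bounds(1.0, 100.0, scale)
```

When mixing datasets, one option under these assumptions is to apply this per example, so that every scene ends up at a comparable scale regardless of which dataset it came from.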

Trainingzy commented 3 months ago

Thank you so much for answering and for recommending the paper!
