Segmenting Video Processing and Manually Setting Camera Intrinsics

Hi Junyi!

Firstly, thank you for your outstanding work on MonST3R! I have run into GPU memory limits when processing videos with more frames. On my setup, MonST3R runs successfully with at most 65 frames, which requires approximately 33 GB of VRAM; with more frames, I get out-of-memory errors.

To work around this, I am considering processing the video in segments. However, I am concerned that processing each segment independently could leave the outputs misaligned across segments, since each segment would be reconstructed in its own coordinate frame and scale.
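For concreteness, here is a minimal sketch of the segmenting I have in mind. It is plain Python/NumPy, not MonST3R code, and `overlapping_segments` and `umeyama_sim3` are my own hypothetical helpers. Consecutive segments share a few frames, and the camera centers of the shared frames are used to estimate a similarity transform (scale, rotation, translation, via Umeyama's least-squares method) that maps a later segment into the previous segment's frame. This assumes at least three shared camera centers that are not collinear; otherwise the rotation is not uniquely determined.

```python
import numpy as np


def overlapping_segments(num_frames: int, seg_len: int, overlap: int) -> list:
    """Hypothetical helper: split frames 0..num_frames-1 into windows of at most
    seg_len frames, where consecutive windows share `overlap` frames."""
    if not 0 <= overlap < seg_len:
        raise ValueError("need 0 <= overlap < seg_len")
    step = seg_len - overlap
    segments, start = [], 0
    while True:
        end = min(start + seg_len, num_frames)
        segments.append(range(start, end))
        if end == num_frames:
            break
        start += step
    return segments


def umeyama_sim3(src: np.ndarray, dst: np.ndarray):
    """Hypothetical helper: least-squares similarity (s, R, t) such that
    dst ~= s * R @ src + t, for (N, 3) corresponding points such as the
    camera centers of frames shared by two segments (Umeyama's method)."""
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_src, dst - mu_dst
    cov = xd.T @ xs / len(src)              # cross-covariance of the point sets
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                      # avoid returning a reflection
    R = U @ S @ Vt
    var_src = (xs ** 2).sum() / len(src)    # variance of the source points
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

For example, `overlapping_segments(150, 65, 10)` yields frames 0-64, 55-119, and 110-149. The resulting `s`, `R`, `t` would then be applied to every point and camera center of the later segment, with its camera-to-world rotations left-multiplied by `R`.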

To mitigate this, I am wondering whether there is a way to manually specify and fix the camera intrinsics (for example, intrinsics obtained from COLMAP) across all segments. This would help keep the outputs consistent and aligned across the entire video, regardless of segment size.
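COLMAP reports intrinsics at the original image resolution, so they would need to be rescaled to the resolution the network actually processes. A minimal sketch, assuming a COLMAP `PINHOLE` (fx, fy, cx, cy) or `SIMPLE_PINHOLE` (f, cx, cy) camera model, and assuming the loader resizes each frame so its long side is 512 px and then center-crops each side down to a multiple of 16 (my understanding of a DUSt3R-style loader, not verified against this repo). `colmap_to_processed_intrinsics` is a hypothetical helper, and lens distortion is ignored.

```python
def colmap_to_processed_intrinsics(model, params, orig_size, long_side=512, crop_multiple=16):
    """Hypothetical helper: map COLMAP intrinsics to a resized + center-cropped frame.

    ASSUMPTION: frames are resized so max(W, H) == long_side, then each side is
    center-cropped down to a multiple of crop_multiple. Distortion is ignored.
    Returns ((fx, fy, cx, cy), (crop_w, crop_h)).
    """
    if model == "SIMPLE_PINHOLE":
        f, cx, cy = params[:3]
        fx = fy = f
    elif model == "PINHOLE":
        fx, fy, cx, cy = params[:4]
    else:
        raise ValueError(f"unsupported COLMAP camera model: {model}")

    W, H = orig_size
    scale = long_side / max(W, H)
    new_w, new_h = round(W * scale), round(H * scale)

    # Resizing scales focal lengths and principal point per axis.
    sx, sy = new_w / W, new_h / H
    fx, fy, cx, cy = fx * sx, fy * sy, cx * sx, cy * sy

    # Center crop: the principal point shifts by the crop offset; focal lengths are unchanged.
    crop_w = new_w // crop_multiple * crop_multiple
    crop_h = new_h // crop_multiple * crop_multiple
    left, top = (new_w - crop_w) // 2, (new_h - crop_h) // 2
    return (fx, fy, cx - left, cy - top), (crop_w, crop_h)
```

Since all segments come from the same camera, the same rescaled values would apply to every segment; how to feed them into MonST3R's optimization is exactly what I am asking about.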

Could you provide any guidance on how to implement this, or let me know whether this feature is already supported?

Thank you for your time, and I appreciate any insights you can offer.

Issue #22 in Junyi42/monst3r