Open wdkkkk opened 6 days ago
Hi @wdkkkk,
Thank you for your interest in our work!
Yes, we handle the foreground and background differently in visualization. For the background, we visualize the overlapping point clouds across the entire sequence to provide a consistent view. For the foreground, we visualize only the point cloud at the corresponding timestamp, so that it changes over time.
The fg/bg masks can be derived from the ground-truth mask or taken from the motion mask produced by our method. Alternatively, generating the mask with SAM2 is also a feasible option.
Best.
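For readers who want to reproduce this behaviour, here is a minimal NumPy sketch of the compositing described above. It is not the repository's actual visualization code: the function name `compose_frame_points`, the `(T, H, W, 3)` array layout, and the assumption that all point maps are already in one shared world frame are illustrative choices only.

```python
import numpy as np


def compose_frame_points(points, fg_masks, t):
    """Select the points to render at timestamp t (hypothetical helper).

    points:   (T, H, W, 3) per-frame point maps, assumed to be in a shared world frame.
    fg_masks: (T, H, W) boolean masks, True where a pixel is dynamic foreground.
    Returns an (N, 3) array: background points from all frames, followed by
    the foreground points of frame t only.
    """
    points = np.asarray(points)
    fg_masks = np.asarray(fg_masks, dtype=bool)

    # Background: accumulate static points over the entire sequence,
    # so the background looks the same at every timestamp.
    background = points[~fg_masks]       # (N_bg, 3)

    # Foreground: keep only the points observed at timestamp t,
    # so the dynamic content changes over time.
    foreground = points[t][fg_masks[t]]  # (N_fg, 3)

    return np.concatenate([background, foreground], axis=0)
```

Calling this for each `t` gives a per-frame point set whose background part is identical across timestamps, which would explain why the background in the demo does not change with the camera's visible area. The masks passed in can come from any of the sources mentioned above (GT, motion mask, or SAM2).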
Hi @Junyi42, thanks for your great work! So according to your explanation, do all the demo videos use the motion mask from the method, or do some of them use the GT mask?
Hi @littlepure2333,
Thanks! We use the GT mask for the joint dense reconstruction & pose estimation, for a fair comparison with prior work. The motion mask extracted from our method can occasionally be noisy. The quality of a SAM2 mask (which can be prompted with Monst3r's motion mask or with a simple click) would be similar to that of the GT mask.
Best
Hi, thanks for sharing your awesome work. I wonder if the shape of the global point cloud $\hat{X}$ is $H\times W \times 3 \times T$, and if you only render $\hat{X}_t$ to obtain the rendered frame at timestamp $t$? If that’s the case, I think the background should change across timestamps according to the camera's visible area (i.e., areas that are not visible in input video frame $t$ will not appear in the rendered result of frame $t$). But in the demo on your website, the background appears unchanged across timestamps. Therefore, I would like to know whether you handle the foreground and background differently?