zhangshuoneu opened this issue 5 hours ago
Hi @zhangshuoneu,
Thanks for pointing this out!
The row you mentioned belongs to our ablation study on the fine-tuning datasets. It uses all of our optimization techniques and differs only in that the MonST3R model is replaced with the DUSt3R checkpoint. A clearer label would be "No finetune (DUSt3R ckpt)" rather than "(DUSt3R)".
We will update this in our paper.
Best
Thanks!
Table 5 in the paper reports the pose and video estimation accuracy of DUSt3R. I would like to know whether that result was obtained by applying the dynamic-pixel region mask described in the paper, or whether, as in MonST3R, the masking is performed on the output point cloud.