kszpxxzmc opened 2 weeks ago
Hi @kszpxxzmc,
Thanks for the feedback. As far as I can see, most of the latency comes from the initialization of the dynamic mask (here). Since this step runs on the CPU, it can vary greatly across hardware. One simple option is to turn off the flow loss during the optimization (by adding --flow_loss_weight=0.0), though this may degrade performance. You could also use the mask from the SAM2 model for the motion mask initialization by passing the SAM2 mask to self.dynamic_masks.
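To illustrate why setting the weight to zero helps, here is a minimal sketch (function and argument names are hypothetical, not the repo's actual code) of how a flow-loss weight typically gates the flow term, so that with a zero weight the flow loss, and the CPU-heavy mask initialization it depends on, can be skipped entirely:

```python
def total_loss(align_loss, flow_loss, flow_loss_weight=0.0):
    """Combine the alignment loss with an optional weighted flow loss.

    With flow_loss_weight == 0.0 the flow term never contributes, so the
    expensive dynamic-mask initialization it requires can be skipped.
    """
    if flow_loss_weight == 0.0:
        return align_loss
    return align_loss + flow_loss_weight * flow_loss
```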
I also noticed that the latency of the feed-forward inference (5:37 for 890 pairs) is unusual. Based on my experience (and reports from other users, e.g., https://github.com/Junyi42/monst3r/issues/10#issue-2603031409), this should finish in under one minute. You could try setting a larger batch size in demo.py. Hope this helps!
Best.
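For the batch-size suggestion above, a minimal batching sketch (names are hypothetical, not demo.py's actual API) showing how image pairs can be chunked so the network sees many pairs per forward pass:

```python
def batched(pairs, batch_size=16):
    """Yield successive chunks of image pairs.

    A larger batch_size trades GPU memory for fewer forward passes,
    which is usually the main lever for feed-forward inference speed.
    """
    for i in range(0, len(pairs), batch_size):
        yield pairs[i:i + batch_size]
```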
thanks for your help on making it faster! could you comment on why you used SAM2 to refine the mask, and not simply to init the mask instead? is it better that way?
Hi @huddyyeo,
Because SAM2 requires a prompt as input (point / box / mask), we use our initialized mask as the prompt for SAM2 to refine. You could certainly use a "click" to get a SAM2 mask for initialization, though that would not be a fully automated way. Another option is to use an off-the-shelf motion segmentation method (e.g., https://github.com/TonyLianLong/RCF-UnsupVideoSeg) to get the initial mask, or even use its output as the prompt for SAM2.
Thanks.
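To make the prompt step above concrete, here is a small sketch (function name is hypothetical, not part of the repo) of deriving SAM2-style prompts, a bounding box and one positive point, from an initialized binary motion mask:

```python
import numpy as np

def mask_to_prompts(mask):
    """Turn a binary motion mask into SAM2-style prompts:
    a bounding box (x0, y0, x1, y1) and one foreground point
    at the mask centroid. Returns (None, None) for an empty mask.
    """
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None, None
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])
    point = np.array([[xs.mean(), ys.mean()]])  # a single positive "click"
    return box, point
```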
thanks @Junyi42 for the quick reply 🙏 just to clarify, what did you mean by passing the SAM2 mask to self.dynamic_masks here? since we cannot just init the mask via SAM2:

"You could also try to use the mask from SAM2 model for this motion mask initialization by passing the SAM2 mask to the self.dynamic_masks."
Hi @huddyyeo,
Sorry for the confusion. What I meant is that if one already has a better motion segmentation mask (via a "click" for SAM2 or an off-the-shelf motion segmentation method), then you can load that segmentation mask into the variable self.dynamic_masks. Thanks.
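As a sketch of that loading step (the helper name and threshold are my own; only the attribute name self.dynamic_masks comes from the thread), externally computed per-frame masks could be converted into a boolean list like this:

```python
import numpy as np

def load_dynamic_masks(mask_arrays, threshold=0.5):
    """Convert per-frame segmentation masks (e.g., from SAM2) into a list
    of boolean arrays, suitable for assigning to self.dynamic_masks."""
    return [np.asarray(m) > threshold for m in mask_arrays]
```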
Thanks for your nice work! I am confused about the inference speed. In your paper, you state that the inference time of MonST3R on an A6000 is about 90 seconds. I ran a practical test with 94 images on an A100 and found that the whole process takes more than 1 hour. I would like to know why it is so slow and how I can speed up inference, even at the cost of some video memory.