In your paper, it is written that "Therefore, we utilize SAM to refine the masks predicted by XMem when its quality assessment is not satisfactory. Specifically, we project the probes and affinities to be point prompts for SAM, and the predicted mask from Step 2 is used as a mask prompt for SAM. Then, with these prompts, SAM is able to produce a refined segmentation mask."
My question is: is the quality assessment done automatically or by a human? I ask because I could not find any use of sam_refinement in the function vos_tracking_video.
Thank you for your open-source code; I look forward to your reply!