yajieC closed this issue 10 months ago
Hi @yajieC,
Thanks for your interest. The original SAM uses an IoU token for IoU score prediction. However, after examining the IoU score on some examples, we found that it does not seem to represent the confidence of the corresponding mask. So, in Open-Vocabulary SAM, we omit the IoU token for simplicity. In the paper, the IoU token is described only in the preliminary part, which covers the original SAM.
Please refer to:
for the implementation details.
Thanks for your reply. I have one more question. As you mentioned, the IoU score prediction cannot represent the confidence of the mask. Did you try IoU prediction in OVSAM and find that it did not perform well?
In the previous code version, we tried to use OVSAM with the IoU token. However, we did not add an IoU loss on the prediction. So, it is unlikely that there will be a better IoU result. If you want to have a try, you can add an IoU loss on the prediction like IoUNet.
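For illustration, here is a minimal sketch of what such an IoU loss could look like. It is not code from this repository: the function names `mask_iou` and `iou_loss` are hypothetical, and it uses plain Python lists so that it runs without any dependencies. It assumes the regression target is the actual IoU between the predicted binary mask and the ground-truth mask, and that the loss is a mean squared error between that target and the predicted score, which, as far as I know, is how the original SAM supervises its IoU head.

```python
def mask_iou(pred_mask, gt_mask):
    """IoU between two binary masks given as equally sized 2D lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            intersection += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumption: two empty masks get IoU 0.0 to avoid dividing by zero.
    return intersection / union if union else 0.0


def iou_loss(predicted_ious, pred_masks, gt_masks):
    """Mean squared error between predicted IoU scores and actual mask IoUs."""
    targets = [mask_iou(p, g) for p, g in zip(pred_masks, gt_masks)]
    squared_errors = [(s - t) ** 2 for s, t in zip(predicted_ious, targets)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical example: one 2x2 predicted mask and its ground truth.
pred = [[1, 1], [0, 0]]
gt = [[1, 0], [0, 0]]
loss = iou_loss([1.0], [pred], [gt])
```

In a real training loop you would compute the same quantities on tensors, typically from thresholded mask logits, and treat the IoU target as a constant so that gradients flow only through the predicted score.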
Thank you for your reply. I still have doubts about "we tried to use OVSAM with the IoU token, but did not add an IoU loss on the prediction". If there is no additional IoU loss on the prediction, how do you use the IoU token directly in OVSAM?
It is inherited from the original SAM, in the same way as the mask token.
Thank you for your patient reply.
Hello, I wonder how to predict the IoU in OVSAM?
The paper states that there are three tokens (IoU, label, and mask tokens), but the weights of iou_token are not found in the model ('clip2sam_coco_rn50x16.pth'). The model contains only two token weights (mask and label tokens). Besides, by comparing with the code of the SAM decoder, I found that you replaced the original iou_token position with label_token. Once I obtain the iou_token, how do I predict the IoU in the code?
My questions are as follows: