When evaluating AP on the LVIS dataset, do you combine the HQ-SAM output with the output mask of SAM itself? Is SAM's predicted score combined with the detection score of each box when selecting boxes? When using ViT-Det to get the detection boxes, is the detection head Mask R-CNN or Cascade R-CNN? I used Mask R-CNN as the detection head and got an AP of 45.289 for HQ-SAM-L, while the paper reports 43.9. Why does my result differ from the one in the paper?
Thanks.
Hi, we use Cascade R-CNN with this config.
For evaluation, we simply use all predicted bounding boxes as prompts, without combining scores or using the output mask as an additional prompt.
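A minimal sketch of that evaluation loop, in case it helps when comparing setups. It assumes each detected box is passed as the only prompt and that every resulting mask keeps the detector's box score unchanged (my reading of "without combining score"). The names `masks_from_boxes`, `predict_mask`, and `rectangle_segmenter` are hypothetical and not taken from the HQ-SAM code; `rectangle_segmenter` is only a stand-in for the real model, and RLE encoding of the masks is left out.

```python
from typing import Any, Callable, Dict, List, Sequence

Box = Sequence[float]  # [x0, y0, x1, y1] in pixel coordinates


def masks_from_boxes(
    image_id: int,
    boxes: Sequence[Box],
    scores: Sequence[float],
    labels: Sequence[int],
    predict_mask: Callable[[Box], Any],
) -> List[Dict[str, Any]]:
    """Prompt the segmenter once per detected box (hypothetical helper).

    No box is filtered out, the score is the detector's score as-is,
    and the predicted mask is never fed back as a second prompt.
    """
    if not (len(boxes) == len(scores) == len(labels)):
        raise ValueError("boxes, scores and labels must have the same length")

    results = []
    for box, score, label in zip(boxes, scores, labels):
        mask = predict_mask(box)  # box-only prompt, one mask per box
        results.append(
            {
                "image_id": image_id,
                "category_id": int(label),
                "score": float(score),  # detector score, not rescored
                "segmentation": mask,   # RLE encoding omitted in this sketch
            }
        )
    return results


def rectangle_segmenter(box: Box, height: int = 4, width: int = 4) -> List[List[int]]:
    """Stand-in for the real segmenter: fills the box region with ones."""
    x0, y0, x1, y1 = box
    return [
        [1 if x0 <= x < x1 and y0 <= y < y1 else 0 for x in range(width)]
        for y in range(height)
    ]
```

With a real model, `predict_mask` would wrap the box-prompted predictor call, and the returned list would be dumped to a results file for the LVIS evaluator.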