Why can't use SAM encoder to get extracted feature?

aim-uofa / Matcher

[ICLR'24] Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

https://arxiv.org/abs/2305.13310

Why can't use SAM encoder to get extracted feature? #5

Open ruizhaoz opened 1 year ago

ruizhaoz commented 1 year ago

Have you tried using the SAM encoder directly to extract features, instead of using another pretrained model?

yangliu96 commented 1 year ago

Features extracted with the SAM encoder achieve only around 20 mIoU on fold 0 of COCO-20i. SAM's encoder features carry weak semantics, so it performs poorly in complex scenes. There are two reasons for this:

1. Poor feature matching: SAM's features fail to match multiple instances with similar semantics in complex scenes.
2. Poor semantic guidance: SAM cannot provide effective semantic guidance for ILM (Instance-Level Matching) to select high-quality mask proposals.
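
To make the "feature matching" point concrete, here is a minimal sketch of patch-level matching by cosine similarity. It is not the code from this repository: the function names (`l2_normalize`, `match_patches`) and the toy inputs are hypothetical, and plain nearest-neighbour matching stands in for whatever matching strategy Matcher actually uses. The idea is only that each reference patch inside the mask looks for its most similar target patch. If the encoder's features are weakly semantic, the similarity scores for same-class patches will not stand out from the rest, and the matches become unreliable.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def match_patches(ref_feats, tgt_feats, ref_mask):
    """Nearest-neighbour matching of masked reference patches to target patches.

    ref_feats: list of N feature vectors (one per reference patch)
    tgt_feats: list of M feature vectors (one per target patch)
    ref_mask:  list of N booleans, True where the patch lies inside the
               reference mask
    Returns one (target_index, cosine_similarity) pair per masked patch.
    """
    ref = [l2_normalize(f) for f, keep in zip(ref_feats, ref_mask) if keep]
    tgt = [l2_normalize(f) for f in tgt_feats]

    matches = []
    for r in ref:
        # Cosine similarity reduces to a dot product on unit vectors.
        sims = [sum(a * b for a, b in zip(r, t)) for t in tgt]
        best = max(range(len(sims)), key=sims.__getitem__)
        matches.append((best, sims[best]))
    return matches
```

In a real pipeline the feature vectors would come from the encoder under comparison (SAM's image encoder or another pretrained backbone), and the matched target locations would then be turned into point prompts for SAM's mask decoder.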

fjchange commented 6 months ago

DINOv2 is strong at instance retrieval and dense matching. The SAM backbone is pretrained with MAE, so its features are less discriminative.