fudan-zvg / Semantic-Segment-Anything

Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).
Apache License 2.0

Difference between this work and Grounded SAM #3

Leong1230 opened this issue 1 year ago

Leong1230 commented 1 year ago

Hi, could you please explain the difference between this work and Grounded SAM (https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main)?

Jiaqi-Chen-00 commented 1 year ago

Thanks. Grounded SAM is a great open-source repo! It uses a detector to identify objects and then segments them precisely with the SAM model. However, in our opinion, Grounded SAM is limited by the bounding boxes proposed by the detector, which tend to focus on the most prominent subjects and fail to discover other objects in the image. Even when combined with BLIP, the resulting caption still primarily describes the main objects.
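
To make the paradigm concrete, here is a minimal sketch of that detect-then-segment flow. `detect_objects` and `segment_with_box` are hypothetical placeholders standing in for an open-vocabulary detector and SAM's box-prompted predictor, not Grounded SAM's actual API:

```python
# A minimal sketch of the detect-then-segment paradigm (hypothetical
# helpers, not Grounded SAM's actual API).
from typing import Any, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)

def detect_objects(image: Any, text_prompt: str) -> List[Tuple[Box, str]]:
    """Placeholder for an open-vocabulary detector (e.g. Grounding DINO):
    returns (box, label) pairs only for objects matching the prompt."""
    ...

def segment_with_box(image: Any, box: Box) -> Any:
    """Placeholder for SAM's box-prompted predictor: one mask per box."""
    ...

def grounded_sam(image: Any, text_prompt: str) -> List[Tuple[Any, str]]:
    # Step 1: detect and classify first; anything the detector misses
    # (non-prominent objects) never reaches the segmentation stage.
    detections = detect_objects(image, text_prompt)
    # Step 2: segment each detected box precisely with SAM.
    return [(segment_with_box(image, box), label) for box, label in detections]
```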

In contrast, SSA first segments the image and then assigns a category label to every mask produced by SAM. Because SSA labels all of the masks, it is a valuable tool for automatic dataset labeling and significantly reduces annotation costs.
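
And here is the corresponding segment-then-classify sketch. It uses the real `SamAutomaticMaskGenerator` from the `segment_anything` package, but `classify_mask` and the checkpoint path are placeholders, not this repo's actual implementation:

```python
# A minimal sketch of the segment-then-classify paradigm, assuming the
# `segment_anything` package; `classify_mask` and the checkpoint path
# are placeholders, not this repo's actual pipeline.
from typing import Any, List, Tuple

from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

def classify_mask(image: Any, mask: Any) -> str:
    """Placeholder: assign a category label to a single mask, e.g. by
    scoring the masked region against an open vocabulary of classes."""
    ...

def ssa_label(image: Any, checkpoint: str = "sam_vit_h.pth") -> List[Tuple[Any, str]]:
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    # Step 1: segment everything first -- no prompts, no detector, so
    # every region in the image yields a mask.
    masks = SamAutomaticMaskGenerator(sam).generate(image)
    # Step 2: classify every mask, so no object goes unlabeled.
    return [(m["segmentation"], classify_mask(image, m["segmentation"]))
            for m in masks]
```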

Notably, from a model architecture perspective, Grounded SAM and SSA are two different paradigms: Grounded SAM detects and classifies before segmentation, while SSA segments before classification.

Leong1230 commented 1 year ago

That makes sense. Thank you!