Another solution from text to mask

IDEA-Research / Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Apache License 2.0

14.12k stars, 1.31k forks

Your work is really awesome! For your information, there is another solution that requires only CLIP, without any training or extra supervision.

Our work achieves text-to-mask segmentation with SAM (https://github.com/xmed-lab/CLIP_Surgery). It comes from research on CLIP's explainability, and it can guide SAM to produce masks from text without manually placed points. It also enhances many open-vocabulary tasks, such as segmentation, multi-label classification, and multimodal visualization.
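To make the "text to mask without manual points" idea concrete, here is a minimal sketch of the general pattern: score each image patch against a text embedding to get a heatmap, then turn the heatmap into point prompts for SAM. It assumes you already have CLIP patch embeddings and a text embedding as NumPy arrays. It uses plain cosine similarity, which is NOT CLIP Surgery's own architecture or feature-surgery step; the helper names `similarity_map` and `map_to_point_prompts`, the 0.8 default threshold and the single background point are hypothetical choices for illustration only.

```python
import numpy as np


def similarity_map(patch_feats, text_feat, grid_hw):
    """Cosine similarity between each patch embedding and one text embedding,
    reshaped to the (rows, cols) patch grid and min-max normalised to [0, 1].

    Hypothetical helper using plain cosine similarity: NOT CLIP Surgery's method.
    """
    p = patch_feats / np.linalg.norm(patch_feats, axis=-1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    sim = p @ t  # one score per patch, shape (num_patches,)
    sim = (sim - sim.min()) / (sim.max() - sim.min() + 1e-8)
    return sim.reshape(grid_hw)


def map_to_point_prompts(sim_map, image_hw, threshold=0.8, num_neg=1):
    """Turn a patch-level heatmap into SAM-style point prompts (hypothetical helper).

    Patches scoring >= threshold become foreground points (label 1); the
    num_neg lowest-scoring patches become background points (label 0).
    Coordinates are patch centres in (x, y) pixel order of the original image.
    """
    rows, cols = sim_map.shape
    img_h, img_w = image_hw
    scale_x, scale_y = img_w / cols, img_h / rows

    # Foreground: every patch at or above the (assumed) threshold.
    ys, xs = np.nonzero(sim_map >= threshold)
    pos = np.stack([(xs + 0.5) * scale_x, (ys + 0.5) * scale_y], axis=1)

    # Background: the lowest-scoring patches in the flattened map.
    lowest = np.argsort(sim_map, axis=None)[:num_neg]
    ny, nx = np.unravel_index(lowest, sim_map.shape)
    neg = np.stack([(nx + 0.5) * scale_x, (ny + 0.5) * scale_y], axis=1)

    coords = np.concatenate([pos, neg]).astype(np.float32)
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))]).astype(np.int32)
    return coords, labels


if __name__ == "__main__":
    # Synthetic stand-ins for CLIP outputs: a 14x14 patch grid of 512-d features.
    rng = np.random.default_rng(0)
    patches = rng.normal(size=(14 * 14, 512))
    text = rng.normal(size=512)
    patches[3 * 14 + 5] = 10 * text  # make patch (row 3, col 5) match the text

    sim = similarity_map(patches, text, (14, 14))
    coords, labels = map_to_point_prompts(sim, image_hw=(224, 224), threshold=0.99)
    print(coords, labels)
```

The `(N, 2)` `(x, y)` coordinates and `1`/`0` labels have the shape that the `point_coords` and `point_labels` arguments of `SamPredictor.predict` in the `segment_anything` package expect (after calling `set_image`). For the authors' actual pipeline, refer to the CLIP Surgery demo notebook.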

Here is the Jupyter demo: https://github.com/xmed-lab/CLIP_Surgery/blob/master/demo.ipynb

Here are our segmentation results:

This is our heatmap:

Excellent work! We will highlight it in our README!


Another solution from text to mask #137