IDEA-Research / Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
https://arxiv.org/abs/2401.14159
Apache License 2.0
14.12k stars 1.31k forks

Another solution from text to mask #137

Open Eli-YiLi opened 1 year ago

Eli-YiLi commented 1 year ago

Your work is really awesome! For your information, there is another solution that requires only CLIP, without any training or extra supervision.

Our work achieves text-to-mask with SAM (https://github.com/xmed-lab/CLIP_Surgery). It focuses on CLIP's explainability and can guide SAM to produce masks from text without manual points. It also improves many open-vocabulary tasks, such as segmentation, multi-label classification, and multimodal visualization.
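To make "without manual points" concrete, here is a minimal sketch of how a text-image similarity heatmap could be turned into point prompts for SAM. It is an illustration under assumptions, not the CLIP_Surgery implementation: the function name `heatmap_to_point_prompts`, the `num_points` parameter, and the top-k/bottom-k selection rule are all hypothetical. The output uses (x, y) pixel coordinates with labels 1 (foreground) and 0 (background), which is the layout SAM-style point prompts generally expect.

```python
import numpy as np


def heatmap_to_point_prompts(similarity, num_points=3):
    """Turn a 2D similarity map into point prompts (hypothetical helper).

    The hottest pixels become foreground points (label 1) and the
    coldest pixels become background points (label 0).
    """
    similarity = np.asarray(similarity, dtype=np.float32)
    if similarity.ndim != 2:
        raise ValueError("similarity must be a 2D array of shape (H, W)")
    if 2 * num_points > similarity.size:
        raise ValueError("num_points is too large for this heatmap")

    # Sort flattened pixel indices from lowest to highest similarity.
    order = np.argsort(similarity.ravel())
    neg_idx = order[:num_points]             # coldest pixels
    pos_idx = order[-num_points:][::-1]      # hottest pixels, hottest first

    # Convert flat indices back to (row, col), then store as (x, y).
    idx = np.concatenate([pos_idx, neg_idx])
    ys, xs = np.unravel_index(idx, similarity.shape)
    point_coords = np.stack([xs, ys], axis=1).astype(np.float32)
    point_labels = np.concatenate(
        [np.ones(num_points), np.zeros(num_points)]
    ).astype(np.int32)
    return point_coords, point_labels


# Toy heatmap standing in for a real CLIP similarity map.
toy_map = np.arange(20, dtype=np.float32).reshape(4, 5)
coords, labels = heatmap_to_point_prompts(toy_map, num_points=1)
```

The resulting `coords` and `labels` arrays could then be passed as point prompts to a SAM predictor in place of manual clicks.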

This is the Jupyter demo: https://github.com/xmed-lab/CLIP_Surgery/blob/master/demo.ipynb

These are our segmentation results: image

This is our heatmap: image

rentainhe commented 1 year ago

Excellent work! We will highlight it in our README!