facebookresearch / segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

[Extension Project] Generating Box Prompts with Zero-Shot Detector for Segment-Anything #74

Open rentainhe opened 1 year ago

rentainhe commented 1 year ago

Hi! Thanks for releasing such impressive work! We found an interesting extension of this great work: combining a SoTA zero-shot detector with Segment-Anything to generate high-quality box and mask annotations from text inputs! The new project is here; we simply named it Grounded-Segment-Anything: https://github.com/IDEA-Research/Grounded-Segment-Anything

We use Grounding-DINO as the zero-shot detector to generate box prompts for Segment-Anything; our visualization results are as follows:

[image: grounded_sam2]
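
For reference, here is a minimal sketch of this box-prompt pipeline, assuming the `SamPredictor` API from this repo and Grounding-DINO's `groundingdino.util.inference` helpers (checkpoint paths, the caption, and the thresholds are illustrative):

```python
# Minimal sketch: Grounding-DINO boxes -> SAM masks.
import torch
from segment_anything import sam_model_registry, SamPredictor
from groundingdino.util.inference import load_model, load_image, predict

# Zero-shot detector: text caption -> normalized cxcywh boxes.
dino = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")
image_source, image = load_image("demo.jpg")  # (HxWx3 uint8 RGB, transformed tensor)
boxes, logits, phrases = predict(
    model=dino, image=image, caption="dog . chair .",
    box_threshold=0.35, text_threshold=0.25,
)

# Segment-Anything: box prompts -> masks.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(image_source)

# Convert normalized cxcywh boxes to absolute xyxy, as SamPredictor.predict expects.
H, W = image_source.shape[:2]
boxes = boxes * torch.tensor([W, H, W, H])
boxes[:, :2] -= boxes[:, 2:] / 2   # centers -> top-left corners
boxes[:, 2:] += boxes[:, :2]       # widths/heights -> bottom-right corners

masks = []
for box in boxes.numpy():
    mask, _, _ = predictor.predict(box=box, multimask_output=False)
    masks.append(mask[0])  # one binary HxW mask per detected box
```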

We hope to maintain this project as a sub-project of segment-anything. We're also exploring combining Grounded-SAM with diffusion models for controllable image editing as well~

More Examples

[image: grounded_sam_demo3_demo4]

rentainhe commented 1 year ago

Combining Grounded-SAM with Stable-Diffusion Inpainting!

We can further combine Grounded-Segment-Anything with diffusion models for inpainting, which means we can label and generate high-quality new data (with box and mask annotations) using this pipeline!

[image: grounded_sam_inpainting_demo]
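
A minimal sketch of that inpainting step, assuming a binary SAM mask saved from the step above and the `diffusers` `StableDiffusionInpaintPipeline` (the model id, file names, and prompt are illustrative):

```python
# Minimal sketch: repaint the SAM-masked region with Stable Diffusion inpainting.
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"
).to("cuda")

# This inpainting model works at 512x512; white mask pixels get regenerated.
image = Image.open("demo.jpg").convert("RGB").resize((512, 512))
mask = Image.open("sam_mask.png").convert("L").resize((512, 512))  # binary SAM mask

edited = pipe(prompt="a corgi sitting on a bench", image=image, mask_image=mask).images[0]
edited.save("inpainted.jpg")
```
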
liuwenhaha commented 1 year ago

excellent~

spiderman-spiderman commented 1 year ago

nice~

Eli-YiLi commented 1 year ago

I'm back again ...

The work above, which uses grounding boxes, is excellent. Meanwhile, we offer a simpler solution via CLIP's explainability.

Our work achieves text-to-mask with SAM using only a CLIP model, without any fine-tuning or extra supervision to generate the boxes: https://github.com/xmed-lab/CLIP_Surgery

Besides, it enhances many open-vocabulary tasks, like segmentation, multi-label classification, and multimodal visualization.

Here is the Jupyter demo: https://github.com/xmed-lab/CLIP_Surgery/blob/master/demo.ipynb
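
Conceptually, the recipe looks like the sketch below: threshold the text-to-image similarity map into foreground points and feed them to SAM as point prompts. Here `text_similarity_map` is a hypothetical stand-in for the CLIP Surgery call; the notebook above shows the real API, and the threshold is illustrative.

```python
# Conceptual sketch: CLIP-derived similarity map -> point prompts for SAM.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

def text_similarity_map(image: np.ndarray, text: str) -> np.ndarray:
    """Hypothetical stand-in: per-pixel similarity of `image` to `text`, HxW in [0, 1]."""
    raise NotImplementedError  # provided by CLIP Surgery in practice

image = np.array(Image.open("demo.jpg").convert("RGB"))
sim = text_similarity_map(image, "a dog")

# Pick the most text-similar pixels as positive point prompts.
ys, xs = np.where(sim > 0.8)
top = np.argsort(sim[ys, xs])[::-1][:8]
points = np.stack([xs[top], ys[top]], axis=1).astype(np.float32)  # SAM expects (x, y)
labels = np.ones(len(points), dtype=int)  # 1 = foreground point

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    point_coords=points, point_labels=labels, multimask_output=False,
)
```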
