IDEA-Research / Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
https://arxiv.org/abs/2401.14159
Apache License 2.0

when text prompt is long, prompt is cut off? #10

Open yxchng opened 1 year ago

yxchng commented 1 year ago

[Image: chair_behind_blue_chair]

I ran the demo with the following command:

python grounded_sam_demo.py --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py --grounded_checkpoint groundingdino_swint_ogc.pth --sam_checkpoint sam_vit_h_4b8939.pth --input_image assets/demo3.jpg --output_dir "outputs" --box_threshold 0.3 --text_threshold 0.25 --text_prompt "chair behind blue chair" --device "cpu"

However, the label drawn on the output is only "chair", not the full prompt.

SlongLiu commented 1 year ago

Thanks for the issue. The model outputs boxes labeled with the corresponding noun in the sentence. In this example, the descriptive words ("behind", "blue") may be ignored.
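To illustrate why the label can shrink to a single noun, here is a minimal sketch of thresholding per-token scores for one box, in the spirit of the `--text_threshold` flag. The function name `label_from_token_scores` and the scores are hypothetical and made up for illustration; this is not the repository's actual code.

```python
from typing import List


def label_from_token_scores(
    tokens: List[str], scores: List[float], text_threshold: float
) -> str:
    """Join the tokens whose score for one box exceeds the threshold."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    kept = [tok for tok, s in zip(tokens, scores) if s > text_threshold]
    return " ".join(kept)


# Hypothetical scores for a single box (illustrative values, not real model output).
tokens = ["chair", "behind", "blue", "chair"]
scores = [0.62, 0.05, 0.11, 0.20]

label = label_from_token_scores(tokens, scores, text_threshold=0.25)
print(label)  # prints "chair"
```

Under these assumed scores, only the first "chair" token clears the 0.25 threshold, so the modifiers never reach the label. Lowering the threshold would keep more tokens, but likely at the cost of noisier labels.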

Necolizer commented 1 year ago

Same issue for me. A description like "object in hand" returns results for both "object" and "hand". Despite this limitation, this work is great and easy to implement. Thanks to the authors for their excellent work :)
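One possible workaround, sketched below, is to post-filter the detections so that only boxes with the wanted noun are kept. The helper `keep_label`, the box tuples, and the plain-text labels are all assumptions for illustration; the demo's real labels may carry extra text such as a score, and this filter does not check the "in hand" relation itself.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in pixels; an assumed box layout for this sketch.
Box = Tuple[float, float, float, float]


def keep_label(
    boxes: List[Box], labels: List[str], wanted: str
) -> List[Tuple[Box, str]]:
    """Keep only detections whose label text equals the wanted noun."""
    target = wanted.strip().lower()
    return [
        (box, lab)
        for box, lab in zip(boxes, labels)
        if lab.strip().lower() == target
    ]


# Made-up detections for the prompt "object in hand".
boxes = [(10.0, 20.0, 110.0, 220.0), (50.0, 60.0, 90.0, 100.0)]
labels = ["hand", "object"]

filtered = keep_label(boxes, labels, "object")
print(filtered)
```

This only drops the unwanted "hand" boxes. Any "object" box survives, whether or not it is actually in a hand.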