IDEA-Research / Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
https://arxiv.org/abs/2401.14159
Apache License 2.0
15.29k stars 1.41k forks

Erroneous tokenization from Returned prediction phrases in zero-shot detection #339

Open charismaticchiu opened 1 year ago

charismaticchiu commented 1 year ago

Two quick questions about the text prompt for zero-shot detection. I call:
boxes_filt, pred_phrases, scores = get_grounding_output(model, image, text_prompt, box_threshold, text_threshold, with_logits=False, cpu_only=args.cpu_only)

with a text_prompt such as melon . zucchini . apple . approcot . japanese_fruit, and GDINO often returns compound phrases like melon zucchini instead of the individual categories.

The second issue is that an underscored phrase such as japanese_fruit is tokenized oddly: it comes back as separate phrases like japanese _ and fruit, which is undesirable.

Are there ways to fix these issues? Thank you!

rentainhe commented 1 year ago

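This happens because GroundingDINO builds the phrase for each detected object by joining every prompt token whose similarity to that object exceeds text_threshold, so tokens from several categories can end up in one phrase.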
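To make the merging behaviour concrete, here is a minimal toy sketch in plain Python. It is not the GroundingDINO implementation: the function names `phrase_from_scores` and `best_category`, the token list, and the scores are all hypothetical and made up for illustration. The second function shows one possible workaround, assuming you know which token positions belong to which category.

```python
def phrase_from_scores(tokens, scores, text_threshold):
    """Join every token whose score exceeds the threshold into one phrase.

    This mimics the behaviour described above: two categories that both
    pass the threshold are merged into a single compound phrase.
    """
    return " ".join(t for t, s in zip(tokens, scores) if s > text_threshold)


def best_category(category_tokens, scores):
    """Pick the single category whose best token score is highest."""
    return max(
        category_tokens,
        key=lambda name: max(scores[i] for i in category_tokens[name]),
    )


# Hypothetical per-token similarity scores for ONE detected box.
tokens = ["melon", ".", "zucchini", ".", "apple"]
scores = [0.41, 0.02, 0.33, 0.01, 0.05]

# Hypothetical mapping from category name to its token positions.
category_tokens = {"melon": [0], "zucchini": [2], "apple": [4]}

merged = phrase_from_scores(tokens, scores, text_threshold=0.25)
single = best_category(category_tokens, scores)

print(merged)  # melon zucchini
print(single)  # melon
```

Raising text_threshold can also reduce merged phrases, at the cost of sometimes returning an empty phrase when no token passes.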

I think this may not be practical for applications. We will try to modify the GroundingDINO post-processing to make it easier to use. More details can be found at: https://github.com/IDEA-Research/GroundingDINO#arrow_forward-demo

charismaticchiu commented 1 year ago

Ok, thank you. Please keep me posted!

rentainhe commented 1 year ago

You're welcome. To detect specific phrases in the text prompt without modifying the post-processing, you can try the token spans argument, described at the link I shared above. @charismaticchiu
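As a rough sketch of how such spans could be prepared, the helper below computes character-level [start, end) spans for each category in a dot-separated prompt. The helper name `build_token_spans` is hypothetical and not part of GroundingDINO. It assumes that token spans are given as one list of [start, end] character index pairs per phrase; please check the linked demo section for the exact format expected. The prompt also assumes the underscore in japanese_fruit is replaced by a space, which avoids the stray _ token.

```python
import re


def build_token_spans(text_prompt, separator="."):
    """Return (phrases, spans) for a separator-delimited prompt.

    spans[i] is a list of [start, end) character index pairs, one per
    whitespace-delimited word of phrases[i], relative to text_prompt.
    """
    phrases, spans = [], []
    offset = 0  # character position of the current chunk in text_prompt
    for chunk in text_prompt.split(separator):
        words = [
            [offset + m.start(), offset + m.end()]
            for m in re.finditer(r"\S+", chunk)
        ]
        if words:  # skip empty chunks, e.g. after a trailing separator
            phrases.append(chunk.strip())
            spans.append(words)
        offset += len(chunk) + len(separator)
    return phrases, spans


# Assumption: underscore replaced by a space in the last category.
prompt = "melon . zucchini . apple . japanese fruit"
phrases, spans = build_token_spans(prompt)

for phrase, span in zip(phrases, spans):
    # Slice the prompt with each span to confirm it covers the right words.
    print(phrase, span, [prompt[s:e] for s, e in span])
```

With spans like these, each detection can be matched to one whole phrase rather than to whichever individual tokens happen to pass the threshold.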