Open charismaticchiu opened 1 year ago
Two quick questions about text prompts for zero-shot detection. I used
boxes_filt, pred_phrases, scores = get_grounding_output(model, image, text_prompt, box_threshold, text_threshold, with_logits=False, cpu_only=args.cpu_only)
with a text_prompt such as "melon . zucchini . apple . approcot . japanese_fruit".
GDINO often returns compound phrases like "melon zucchini" instead of the individual phrases.
Another issue is that an underscored phrase such as "japanese_fruit" is tokenized oddly and comes back as separate phrases like "japanese _" and "fruit", which is undesirable.
Are there ways to fix these issues? Thank you!
This happens because GroundingDINO joins every word whose similarity to the box exceeds text_threshold into the phrase for that object.
I think this may not be practical for applications, so we will try to modify GroundingDINO's post-processing for better usability. More details can be found at: https://github.com/IDEA-Research/GroundingDINO#arrow_forward-demo
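To make the behavior described above concrete, here is a toy sketch in plain Python. It is not GroundingDINO's code: the token list, the per-token scores, and both helper functions are hypothetical and only illustrate why thresholding can merge two prompt entries, and how picking the single best-scoring entry (one possible post-processing alternative, not something the maintainers have confirmed) would avoid that.

```python
def phrase_by_threshold(tokens, scores, text_threshold):
    """Join every token whose score exceeds the threshold (the merging behavior)."""
    return " ".join(t for t, s in zip(tokens, scores) if s > text_threshold)


def phrase_by_argmax(tokens, scores, phrase_ids):
    """Hypothetical alternative: keep only the prompt entry with the highest token score."""
    best = {}
    for score, pid in zip(scores, phrase_ids):
        if pid is None:  # separator tokens such as "." belong to no entry
            continue
        best[pid] = max(best.get(pid, float("-inf")), score)
    winner = max(best, key=best.get)
    return " ".join(t for t, pid in zip(tokens, phrase_ids) if pid == winner)


# Made-up scores for one predicted box; both "melon" and "zucchini" pass 0.25.
tokens = ["melon", ".", "zucchini", ".", "apple"]
scores = [0.41, 0.02, 0.36, 0.01, 0.05]
phrase_ids = [0, None, 1, None, 2]  # which prompt entry each token belongs to

merged = phrase_by_threshold(tokens, scores, text_threshold=0.25)
single = phrase_by_argmax(tokens, scores, phrase_ids)
print(merged)  # melon zucchini
print(single)  # melon
```

Raising text_threshold can also reduce merging, but it may drop the phrase entirely when no token clears the higher bar.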
Ok, thank you. Please keep me posted!
You're welcome. In the meantime, you can use the token spans argument (token_spans) to detect specific phrases from the text prompt without modifying the post-processing. It is described at the link I shared above. @charismaticchiu
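As a rough sketch of how the token spans could be prepared, the helper below joins phrases into one caption and records a list of [start, end] character offsets for each word of each phrase. This assumes the format shown in the README demo, where each phrase is a list of character ranges into the caption. The function name build_token_spans is hypothetical, and writing "japanese fruit" with a space instead of an underscore is an assumption intended to avoid the separate "_" token.

```python
import json
import re


def build_token_spans(phrases, sep=" . "):
    """Return (caption, spans); spans[i] holds [start, end] offsets for each word of phrases[i]."""
    spans = []
    cursor = 0
    for i, phrase in enumerate(phrases):
        if i > 0:
            cursor += len(sep)  # skip over the separator before this phrase
        spans.append(
            [[cursor + m.start(), cursor + m.end()] for m in re.finditer(r"\S+", phrase)]
        )
        cursor += len(phrase)
    return sep.join(phrases), spans


caption, spans = build_token_spans(["melon", "zucchini", "japanese fruit"])
print(caption)            # melon . zucchini . japanese fruit
print(json.dumps(spans))  # string form, e.g. for a command-line argument

# Sanity check: every span slices back to the word it is meant to cover.
for phrase_spans in spans:
    print([caption[start:end] for start, end in phrase_spans])
```

With spans like these, each detection should be reported against one whole phrase rather than an arbitrary combination of tokens above the threshold.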