IDEA-Research / GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
https://arxiv.org/abs/2303.05499
Apache License 2.0

Some sub-words are ignored when using a long word #54

Open weixuansun opened 1 year ago

weixuansun commented 1 year ago

Thanks for the great code. I encountered an issue when using GroundingDINO (or maybe it is just expected behavior?). If I use a long word like 'pottedplant', it is tokenized into several sub-words. When generating the output bounding boxes, some sub-words are ignored (I guess this is because cross-attention is computed at the token level, so the scores of some sub-words fall below the text threshold), and the generated label is incomplete. For example, 'pottedplant' -> 'pot' 'ted' 'pl' 'ant', and some box labels come out wrong, like 'potted' or 'pottedpl'. Is there any solution for this?
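To make the suspected cause concrete, here is a minimal, self-contained sketch of token-level thresholding and one possible workaround. It is not GroundingDINO's actual code: the function names, the `spans` mapping, and the scores are all hypothetical and made up for illustration. The idea is that if each box keeps only the sub-words whose score clears the text threshold, a long word can be truncated, whereas scoring the whole phrase by the maximum over its sub-words returns the full category name.

```python
from typing import Dict, List, Optional, Tuple


def label_from_tokens(tokens: List[str], scores: List[float],
                      text_threshold: float) -> str:
    """Token-level selection: keep only sub-words whose score clears the threshold."""
    return "".join(t for t, s in zip(tokens, scores) if s > text_threshold)


def label_from_phrase_spans(scores: List[float],
                            spans: Dict[str, Tuple[int, int]],
                            text_threshold: float) -> Optional[str]:
    """Phrase-level selection (assumed workaround, not the library's method).

    Each phrase is scored by the maximum score over its sub-word tokens.
    The best phrase above the threshold is returned in full, or None.
    """
    best_phrase, best_score = None, text_threshold
    for phrase, (start, end) in spans.items():
        phrase_score = max(scores[start:end])
        if phrase_score > best_score:
            best_phrase, best_score = phrase, phrase_score
    return best_phrase


# Sub-words as written in this issue; the scores are invented for one box.
tokens = ["pot", "ted", "pl", "ant"]
scores = [0.41, 0.33, 0.12, 0.09]
# Hypothetical mapping from each category phrase to its [start, end) token range.
spans = {"pottedplant": (0, 4)}

token_level_label = label_from_tokens(tokens, scores, text_threshold=0.25)
phrase_level_label = label_from_phrase_spans(scores, spans, text_threshold=0.25)

print(token_level_label)   # truncated label
print(phrase_level_label)  # full category name
```

With these made-up scores, the token-level label is 'potted' while the phrase-level label is 'pottedplant'. Taking the maximum is only one choice; the mean over sub-words would be stricter and might drop boxes that the maximum keeps.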

weixuansun commented 1 year ago

[image]

confusedgreenhand commented 1 year ago

Hi, did you fix this bug?

liuhuiCNN commented 1 year ago

I have the same issue.