Some sub-words are ignored when using a long word

weixuansun commented 1 year ago

Thanks for the great code. I ran into an issue when using GroundingDINO (or maybe this is expected behavior?). If I use a long word such as 'pottedplant', it is tokenized into several sub-words. When the output bounding boxes are generated, some sub-words are dropped and the generated label is incomplete. My guess is that cross-attention is computed at the token level, so the scores of some sub-words fall below the text threshold. For example, 'pottedplant' is split into 'pot', 'ted', 'pl', 'ant', and some box labels come out wrong, such as 'potted' or 'pottedpl'. Is there any solution for this?
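One possible workaround, sketched here only to illustrate the idea: pool the token-level scores per word before thresholding, so a word is kept whole if any of its sub-words clears the threshold. This is not code from the repository. The function `select_words`, its arguments, and all the score values are hypothetical, and `word_ids` is assumed to map each token to the index of the word it came from (with `None` for special tokens), similar to what fast Hugging Face tokenizers expose.

```python
from typing import Dict, List, Optional, Sequence


def select_words(
    token_scores: Sequence[float],
    word_ids: Sequence[Optional[int]],
    words: Sequence[str],
    threshold: float,
) -> List[str]:
    """Keep a whole word if any of its sub-word tokens scores above the threshold.

    Hypothetical helper: token_scores holds one score per token for a single box,
    and word_ids[i] is the index into `words` for token i (None for special tokens).
    """
    best: Dict[int, float] = {}
    for score, wid in zip(token_scores, word_ids):
        if wid is None:  # skip special tokens such as [CLS] and [SEP]
            continue
        # Max-pool the sub-word scores that belong to the same word.
        best[wid] = max(best.get(wid, float("-inf")), score)
    return [words[wid] for wid in sorted(best) if best[wid] > threshold]


# Made-up example: the prompt "pottedplant . chair ." is assumed to tokenize as
# [CLS] pot ted pl ant . chair . [SEP]
words = ["pottedplant", ".", "chair", "."]
word_ids = [None, 0, 0, 0, 0, 1, 2, 3, None]
token_scores = [0.0, 0.40, 0.31, 0.12, 0.08, 0.01, 0.03, 0.01, 0.0]

label = select_words(token_scores, word_ids, words, threshold=0.25)
print(label)
```

With these made-up scores, thresholding each token separately would keep only 'pot' and 'ted' and produce 'potted', while pooling per word keeps 'pottedplant' intact. Max-pooling is just one choice; averaging the sub-word scores would be stricter and might drop words that the max would keep.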

confusedgreenhand commented 1 year ago

Hi, did you fix this bug?

liuhuiCNN commented 1 year ago

I have the same issue.

IDEA-Research / GroundingDINO

Some sub-words are ignored when using a long word #54