Multi-object caption has negative effect on detection results.

IDEA-Research / GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

https://arxiv.org/abs/2303.05499

Apache License 2.0


Multi-object caption has negative effect on detection results. #330

Open hotelll opened 6 months ago

hotelll commented 6 months ago

I am using GroundingDINO to detect objects in an image. However, an object is detected with the caption "ping pong." but is not detected with the caption "man. ping pong.". The results are as follows:

caption: "ping pong" box_threshold=0.3
caption: "man. ping pong." box_threshold=0.3
caption: "man. ping pong." box_threshold=0.2
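The results depend on `box_threshold`, so it may help to see how that threshold is applied. The sketch below assumes GroundingDINO-style filtering. Each query has one logit per text token. The logits go through a sigmoid, and a query is kept only if its best token score exceeds `box_threshold`. The function name `kept_queries` is hypothetical, and the toy scores in the usage note are invented for illustration. They are not taken from the results above.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def kept_queries(token_logits: Sequence[Sequence[float]], box_threshold: float) -> List[int]:
    """Return indices of queries whose best per-token score exceeds box_threshold.

    token_logits: one row per query, one raw logit per caption token
    (an assumed layout, modeled on GroundingDINO-style outputs).
    """
    kept = []
    for i, row in enumerate(token_logits):
        best = max(sigmoid(v) for v in row)  # best-matching token for this query
        if best > box_threshold:
            kept.append(i)
    return kept
```

Consider a query whose best token score is 0.25. It is dropped at `box_threshold=0.3` and kept at `0.2`. A caption change that only slightly lowers the scores for the "ping pong" tokens can therefore push a box below the cutoff.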

Why does this happen, and how can I solve or mitigate it? Thanks!
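One possible mitigation, which is not confirmed in this thread, is to query each phrase separately and merge the detections. This avoids putting "man" and "ping pong" in the same caption. The sketch below assumes a hypothetical `detect(caption)` callable that returns `(box, score, phrase)` tuples. In practice such a callable could wrap the repository's inference helper, but that wrapper is an assumption here. The trade-off is one forward pass per phrase.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # assumed (cx, cy, w, h), normalized
Detection = Tuple[Box, float, str]        # (box, score, phrase)


def detect_per_phrase(
    detect: Callable[[str], List[Detection]],  # hypothetical single-caption detector
    phrases: List[str],
) -> List[Detection]:
    """Run one caption per phrase and tag every detection with its phrase."""
    results: List[Detection] = []
    for phrase in phrases:
        # Build a single-object caption such as "ping pong."
        caption = phrase.strip().rstrip(".") + "."
        for box, score, _ in detect(caption):
            results.append((box, score, phrase))
    return results
```

Overlapping boxes from different phrases are kept as-is. If duplicates matter, a per-class or global NMS step could be applied afterwards.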