IDEA-Research / GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
https://arxiv.org/abs/2303.05499
Apache License 2.0

Multi-object caption has a negative effect on detection results. #330

Open · hotelll opened 6 months ago

hotelll commented 6 months ago

I am using GroundingDINO to detect objects in images. However, I found that an object is detected with the caption "ping pong." but not with the caption "man. ping pong.". The results are as follows:

  1. caption: "ping pong", box_threshold=0.3 [result image]

  2. caption: "man. ping pong.", box_threshold=0.3 [result image]

  3. caption: "man. ping pong.", box_threshold=0.2 [result image]
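For context, here is a minimal sketch of the two steps these runs exercise. It assumes the captions go through the `predict` helper in `groundingdino/util/inference.py`. That helper lowercases the caption and appends a trailing period, so "ping pong" and "ping pong." should reach the model identically. It then keeps a predicted box only if that box's highest sigmoid score over the caption's tokens is strictly greater than `box_threshold`. `split_phrases` and `filter_boxes` are hypothetical names, and the scores are made up for illustration, not taken from the model.

```python
"""Sketch of caption normalization and box_threshold filtering, modeled on
GroundingDINO's inference helper. All scores here are hypothetical."""


def preprocess_caption(caption: str) -> str:
    # Lowercase, strip whitespace, and ensure a trailing period.
    result = caption.lower().strip()
    if result.endswith("."):
        return result
    return result + "."


def split_phrases(caption: str) -> list[str]:
    # Hypothetical helper: split a period-separated caption into its phrases.
    return [p.strip() for p in preprocess_caption(caption).split(".") if p.strip()]


def filter_boxes(token_scores: list[list[float]], box_threshold: float) -> list[int]:
    # token_scores[q][t]: sigmoid similarity between box query q and text token t.
    # Keep query q only if its best token score is strictly above box_threshold.
    return [q for q, scores in enumerate(token_scores) if max(scores) > box_threshold]


if __name__ == "__main__":
    print(preprocess_caption("ping pong"))
    print(split_phrases("man. ping pong."))
    # Two hypothetical queries: the first peaks at 0.25, the second at 0.80.
    hypothetical = [[0.25, 0.10], [0.80, 0.05]]
    print(filter_boxes(hypothetical, 0.3))
    print(filter_boxes(hypothetical, 0.2))
```

Under this sketch, a box whose best token score falls between 0.2 and 0.3 is dropped at `box_threshold=0.3` and kept at `box_threshold=0.2`.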

Why does this happen, and how can I solve or mitigate it? Thanks!