I am using GroundingDINO to detect objects in an image. However, I found that an object is detected with the caption "ping pong.", but is not detected with the caption "man. ping pong.". The results are as follows:
caption: "ping pong" box_threshold=0.3
caption: "man. ping pong." box_threshold=0.3
caption: "man. ping pong." box_threshold=0.2
Why does this happen, and how can I solve or mitigate it? Thanks!