IDEA-Research / GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
https://arxiv.org/abs/2303.05499
Apache License 2.0
6.83k stars, 693 forks

Some Empty Labels or Labels with Multiple Classes #354

Open willjhliang opened 3 months ago

willjhliang commented 3 months ago

Using the Hugging Face API with AutoProcessor, AutoModelForZeroShotObjectDetection, and post_process_grounded_object_detection() (following Grounded SAM 2), I get some empty label strings in the results (results[0]["labels"]), as well as label strings that contain two class names. My classes are specified as

text = ".".join([
    "pot", "pan", ...
])
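For comparison, ".".join(["pot", "pan"]) yields "pot.pan", with no space between classes and no trailing period. The Hugging Face Grounding DINO documentation asks for text queries that are lowercased and end with a dot. Below is a minimal sketch, using a hypothetical helper build_prompt, that produces prompts in that "pot. pan." form. Whether this alone removes the empty or merged labels is an open question and is not claimed here.

```python
def build_prompt(classes):
    """Join class names as 'a. b. c.': lowercased, each followed by a period.

    build_prompt is a hypothetical helper, not part of transformers.
    """
    cleaned = [c.strip().lower() for c in classes if c.strip()]
    return " ".join(f"{c}." for c in cleaned)


print(build_prompt(["pot", "pan"]))  # pot. pan.
```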

I have a few questions:

  1. What are these labels and the boxes associated with them? Do the empty ones fail to meet the post-processing threshold?
  2. What is the common practice for dealing with them? Do we simply discard these boxes?
  3. How are multiple classes assigned to the box?
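The following toy sketch shows how empty and merged labels can arise. It assumes the usual Grounding DINO post-processing scheme: each box gets one probability per text token, the box is kept if its best token probability exceeds a box threshold, and its label is built from every token whose probability exceeds a text threshold. The token list, the set of skipped tokens, the threshold values, and the assign_labels function are simplified assumptions for illustration, not the library's code.

```python
# Toy sketch of token-level label assignment (simplified assumption).
TOKENS = ["[CLS]", "pot", ".", "pan", ".", "[SEP]"]
SKIPPED = {"[CLS]", "[SEP]", "."}  # assumed to never appear in label text


def assign_labels(token_probs, box_threshold=0.35, text_threshold=0.25):
    results = []
    for probs in token_probs:
        score = max(probs)  # box score = best probability over all tokens
        if score <= box_threshold:
            continue  # box discarded entirely
        # Label = every non-skipped token that clears the text threshold.
        words = [tok for tok, p in zip(TOKENS, probs)
                 if p > text_threshold and tok not in SKIPPED]
        results.append((score, " ".join(words)))
    return results


boxes = [
    [0.05, 0.80, 0.10, 0.05, 0.02, 0.01],  # only "pot" is high
    [0.02, 0.60, 0.10, 0.55, 0.03, 0.01],  # "pot" and "pan" both pass -> merged
    [0.50, 0.10, 0.05, 0.12, 0.02, 0.01],  # only a skipped token is high -> empty
    [0.05, 0.20, 0.05, 0.15, 0.02, 0.01],  # below box_threshold -> dropped
]
print(assign_labels(boxes))
```

Under this scheme, an empty label means the box survived on a token that belongs to no class name, and a merged label means tokens from two classes both cleared the text threshold. In either case, the single box score does not say which class is correct.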

Thank you very much!

YCAyca commented 4 weeks ago

Hello! I have the same issue and would like to understand the cause. I don't want the model to merge my classes and assign "class x class y" to a single box. When that happens there is no way to apply NMS and keep the higher-probability class for that box, because both classes come back as one label, and this lowers precision on my dataset. Thanks
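A sketch of the NMS step described above, assuming you already have a per-class score for every box. One option is to take the highest token probability within each class name's tokens from the model's raw logits, because the post-processed labels string does not carry these scores. The functions best_class_nms and iou and the score dictionaries are hypothetical. Each box is first reduced to its single best class, and then greedy NMS suppresses overlapping boxes only within the same class.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_class_nms(boxes: List[Box], class_scores: List[Dict[str, float]],
                   iou_threshold: float = 0.5) -> List[Tuple[Box, str, float]]:
    """Keep one class per box (its best), then run per-class greedy NMS."""
    # Step 1: collapse a "class x class y" detection to its best class.
    candidates = []
    for box, scores in zip(boxes, class_scores):
        label, score = max(scores.items(), key=lambda kv: kv[1])
        candidates.append((box, label, score))
    # Step 2: greedy NMS, highest score first, suppressing same-class overlaps.
    candidates.sort(key=lambda c: c[2], reverse=True)
    kept = []
    for box, label, score in candidates:
        if all(k_label != label or iou(box, k_box) <= iou_threshold
               for k_box, k_label, _ in kept):
            kept.append((box, label, score))
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
class_scores = [{"pot": 0.7, "pan": 0.6}, {"pot": 0.5, "pan": 0.3}, {"pan": 0.8}]
result = best_class_nms(boxes, class_scores)
print(result)
```

Here the second box resolves to pot and overlaps the first pot box with an IoU of 0.81, so it is suppressed. The pan box survives because it is compared only with other pan boxes.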

1benwu1 commented 8 hours ago


This happens a lot. Sometimes a class with a long name is also cut off. For example, "motorbike" comes back as "motor" or "bike".
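One post-hoc workaround, offered as an assumption rather than an official fix, is to snap each returned label back to one of the configured class names. The sketch keeps exact matches, resolves fragments such as "motor" by substring containment, and falls back to fuzzy matching with Python's difflib. Empty labels map to None, so those boxes can be dropped or re-scored. snap_to_class is a hypothetical helper.

```python
import difflib
from typing import List, Optional


def snap_to_class(label: str, classes: List[str]) -> Optional[str]:
    """Map a possibly truncated label string to one configured class name."""
    label = label.strip().lower()
    if not label:
        return None  # empty label: treat the box as unassigned
    if label in classes:
        return label  # exact match wins, so "bike" stays "bike" if it is a class
    # A fragment such as "motor" contained in exactly one class name.
    containing = [c for c in classes if label in c]
    if len(containing) == 1:
        return containing[0]
    # Otherwise fall back to the closest spelling, if one is close enough.
    close = difflib.get_close_matches(label, classes, n=1, cutoff=0.6)
    return close[0] if close else None


classes = ["pot", "pan", "motorbike"]
print([snap_to_class(l, classes) for l in ["motor", "bike", "pan", ""]])
```

String matching cannot meaningfully resolve merged labels such as "pot pan", because both class names match equally well. Deciding which class the box belongs to requires the per-token scores.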