Hangsiin closed this issue 2 months ago
cc @eduardopach
I'm really excited to see grounding DINO integrated into transformers! I've followed the tutorial link below and tried it on my local machine, but how do I do detection for multiple classes?
I saw in the official documentation to do something like "a cat. a remote controller.", but when I tried it, it didn't detect for cat and remote controller individually, but combined them together.
Am I missing something?
TL;DR: if you're using the pipeline, pass the labels as a list of strings (a tip: add the `.` at the end of each label, as the model expects it); if you're using the model + processor combo, pass all the labels in a single text string.
Hey! If you're using the `ZeroShotObjectDetectionPipeline`, you have to provide the labels through the `candidate_labels` argument, which is expected to be a list of strings, so you should use `["a cat.", "a remote control."]`.
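As a rough illustration of the pipeline route, here is a minimal sketch. The helper names `with_trailing_period` and `detect_with_pipeline` are made up for this example, and the checkpoint `IDEA-Research/grounding-dino-tiny` is only an assumed default; swap in whichever Grounding DINO checkpoint you are using.

```python
from typing import Any, Dict, List


def with_trailing_period(labels: List[str]) -> List[str]:
    """Strip surrounding whitespace and make each label end in exactly one '.'."""
    return [label.strip().rstrip(".").strip() + "." for label in labels]


def detect_with_pipeline(
    image: Any,
    labels: List[str],
    model_id: str = "IDEA-Research/grounding-dino-tiny",  # assumed checkpoint
) -> List[Dict[str, Any]]:
    """Run zero-shot detection with one list entry per class."""
    # Imported lazily so the helper above can be used without transformers.
    from transformers import pipeline

    detector = pipeline("zero-shot-object-detection", model=model_id)
    # candidate_labels is a list of strings, e.g. ["a cat.", "a remote control."]
    return detector(image, candidate_labels=with_trailing_period(labels))
```

Calling `detect_with_pipeline(image, ["a cat", "a remote control"])` should return a list of dictionaries, each with a `score`, a `label`, and a `box`.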
If you're using `GroundingDinoForObjectDetection` with `GroundingDinoProcessor`, then yes, the way to go is a single string: `"a cat. a remote control."`.
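For the model + processor route, something along these lines should work. This is a sketch under assumptions: `build_grounding_prompt` and `detect_with_model` are hypothetical helper names, the checkpoint is an assumed default, and `image` is assumed to be a PIL image. The score thresholds are left at the library defaults because their keyword names have differed between transformers versions.

```python
from typing import Any, Dict, List


def build_grounding_prompt(labels: List[str]) -> str:
    """Join class names into a single 'label1. label2.' style string."""
    cleaned = [label.strip().rstrip(".").strip() for label in labels]
    return " ".join(name + "." for name in cleaned if name)


def detect_with_model(
    image: Any,  # assumed to be a PIL.Image.Image
    labels: List[str],
    model_id: str = "IDEA-Research/grounding-dino-tiny",  # assumed checkpoint
) -> Dict[str, Any]:
    """Detect all classes in one forward pass using a single text prompt."""
    # Imported lazily so build_grounding_prompt works without these installed.
    import torch
    from transformers import GroundingDinoForObjectDetection, GroundingDinoProcessor

    processor = GroundingDinoProcessor.from_pretrained(model_id)
    model = GroundingDinoForObjectDetection.from_pretrained(model_id)

    text = build_grounding_prompt(labels)  # e.g. "a cat. a remote control."
    inputs = processor(images=image, text=text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # PIL reports (width, height); target_sizes expects (height, width).
    results = processor.post_process_grounded_object_detection(
        outputs, inputs.input_ids, target_sizes=[image.size[::-1]]
    )
    return results[0]  # dict with "scores", "labels" and "boxes"
```

Here the grounded post-processing maps each box back to the matching phrase in the prompt, so `"a cat."` and `"a remote control."` come back as separate labels.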
You may ask: why this weird difference? The answer is that `ZeroShotObjectDetectionPipeline` is older than `GroundingDino` and was probably designed to work with the zero-shot models that came first. `ZeroShotObjectDetectionPipeline` loops through the `candidate_labels` when doing inference and post-processes with the image processor's `post_process_object_detection`. The only reason `GroundingDino` is compatible is that loop, since the `post_process_object_detection` from `GroundingDinoImageProcessor` doesn't return the actual text label. To get the text labels, use `GroundingDinoProcessor`'s `post_process_grounded_object_detection` and format your text as `"label1. label2. label3. ...."`, as `GroundingDino` works with sub-sentence-level text input.

@EduardoPach Thanks! I'll try them asap
Tutorial link referenced in the question:
(https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Grounding%20DINO/Inference_with_Grounding_DINO_for_zero_shot_object_detection.ipynb)