NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License

Grounding Dino - Multiple Class #410

Closed: Hangsiin closed this issue 2 months ago

Hangsiin commented 2 months ago

I'm really excited to see Grounding DINO integrated into Transformers! I followed the tutorial linked below and tried it on my local machine, but how do I do detection for multiple classes?

(https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Grounding%20DINO/Inference_with_Grounding_DINO_for_zero_shot_object_detection.ipynb)

I saw in the official documentation that you should pass something like "a cat. a remote controller.", but when I tried it, the model didn't detect the cat and the remote controller individually; it combined them together.

Am I missing something?

NielsRogge commented 2 months ago

cc @eduardopach

EduardoPach commented 2 months ago

TL;DR: if you're using the pipeline, pass the labels as a list of strings (tip: end each label with a "." since the model expects it); if you're using the model + processor combo, use the all-in-one text format.

Hey! If you're using the ZeroShotObjectDetectionPipeline, you have to provide the labels through the candidate_labels argument, which expects a list of strings, so you should use ["a cat.", "a remote control."]. If you're using GroundingDinoForObjectDetection with GroundingDinoProcessor, then yes, the way to go is a single string: "a cat. a remote control.".
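If you keep your classes in a plain Python list, a small helper can produce either format. This is a minimal sketch: the function names `normalize_label`, `to_candidate_labels` and `to_grounding_prompt` are hypothetical (not part of Transformers), and lowercasing each label is an assumption based on the Grounding DINO examples; the trailing "." is the part the model is said to expect.

```python
from typing import List


def normalize_label(label: str) -> str:
    """Trim and lowercase a class name (lowercasing is an assumption), then end it with exactly one '.'."""
    label = label.strip().lower().rstrip(".").strip()
    return f"{label}."


def to_candidate_labels(classes: List[str]) -> List[str]:
    """Pipeline format: one string per class, e.g. ["a cat.", "a remote control."]."""
    return [normalize_label(c) for c in classes]


def to_grounding_prompt(classes: List[str]) -> str:
    """Model + processor format: all classes in one string, e.g. "a cat. a remote control."."""
    return " ".join(to_candidate_labels(classes))


classes = ["a cat", "A remote control."]
print(to_candidate_labels(classes))  # ['a cat.', 'a remote control.']
print(to_grounding_prompt(classes))  # a cat. a remote control.
```

The list would go to the pipeline as `candidate_labels=to_candidate_labels(classes)`, and the single string to the processor as `text=to_grounding_prompt(classes)` (for example `processor(images=image, text=..., return_tensors="pt")`). Building both from one list keeps the two code paths from drifting apart.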

You may ask why there's this weird difference, and the answer is:

Hangsiin commented 2 months ago

@EduardoPach Thanks! I'll try them asap