IDEA-Research / GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
https://arxiv.org/abs/2303.05499
Apache License 2.0

Text encoder API #45

Open WangYixuan12 opened 1 year ago

WangYixuan12 commented 1 year ago

Hi,

Thank you for your awesome work! I wonder whether there is an API for the text encoder that takes texts as input and outputs text features, like OwlViTTextModel in OWL-ViT: https://huggingface.co/docs/transformers/model_doc/owlvit#transformers.OwlViTTextModel

Best, Yixuan

SlongLiu commented 1 year ago

We don't provide a text encoder API for now, but we plan to improve this later.

You can look at the forward function, whose first step is extracting the text features: https://github.com/IDEA-Research/GroundingDINO/blob/main/groundingdino/models/GroundingDINO/groundingdino.py#L263
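Until an official API exists, the text path in that forward function can be reused directly. Below is a minimal sketch, assuming a loaded GroundingDINO model that exposes the `tokenizer`, `bert` and `feat_map` attributes used in the linked code. `encode_text` is a hypothetical helper, not part of the repository. It returns what forward() calls `encoded_text` (BERT's last hidden state projected by `feat_map` to the transformer width, before any fusion with image features) plus a padding mask. Based on a reading of the linked code, it deliberately skips two things forward() does: the per-phrase (sub-sentence) attention masks and the truncation to `max_text_len`.

```python
import torch


@torch.no_grad()
def encode_text(model, captions):
    """Hypothetical helper (not a GroundingDINO API): tokenize -> BERT -> feat_map.

    Mirrors the first step of GroundingDINO.forward, but skips the sub-sentence
    attention masks and the max_text_len truncation that forward() applies.
    """
    # Put the tokens on the same device as the projection layer.
    device = next(model.feat_map.parameters()).device
    tokenized = model.tokenizer(captions, padding="longest", return_tensors="pt").to(device)
    bert_output = model.bert(**tokenized)
    # Project BERT's last hidden state (bs, num_tokens, bert_hidden)
    # to the transformer width (bs, num_tokens, d_model).
    encoded_text = model.feat_map(bert_output["last_hidden_state"])
    # True for real tokens, False for padding positions.
    text_token_mask = tokenized["attention_mask"].bool()
    return encoded_text, text_token_mask
```

With a model loaded through `groundingdino.util.inference.load_model` and switched to `eval()`, calling `encode_text(model, ["cat . remote control ."])` should give one feature vector per token. Because the sub-sentence masks are skipped, tokens from different phrases can attend to each other here, so these features can differ from the ones the detector actually fuses with image features.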

WangYixuan12 commented 1 year ago

Thank you for your reply! Just to double-check, encoded_text is the text feature used, right?