AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

How can a model trained with prompt training be adapted for online inference? #349

Open LiuChuanWei opened 1 month ago

LiuChuanWei commented 1 month ago

Impressive work. I have a naive question. I used a config file with the SimpleYOLOWorldDetector model structure and trained a model with prompt training; it works for offline inference. However, I want to perform online inference with this trained model, where the actual input prompts are embedded via CLIP at runtime rather than loaded from pre-embedded npy files. My approach was to load the prompt-trained checkpoint with a YOLOWorldDetector config for inference, but this fails because the text_model parameters are missing:

(screenshot: IMG_20240524_142111)

How can I implement online inference with the prompt-trained model? Alternatively, how can I add the text_model-related parameters to the trained SimpleYOLOWorldDetector checkpoint so it becomes a YOLOWorldDetector model that supports online inference?
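One possible workaround for the second option (a sketch, not an official conversion script) is to copy the text-encoder weights from a pretrained YOLOWorldDetector checkpoint into the prompt-trained SimpleYOLOWorldDetector checkpoint, then load the merged result with a YOLOWorldDetector config. The `text_model` key prefix and the file paths below are assumptions; inspect your checkpoints' `state_dict.keys()` to confirm the actual naming before relying on this.

```python
# Sketch: merge text-encoder weights from a pretrained YOLOWorldDetector
# checkpoint into a prompt-trained SimpleYOLOWorldDetector checkpoint.
# NOTE: the "text_model" prefix is an assumption -- check the real key
# names in both state dicts first.

def merge_text_model(prompt_trained: dict, pretrained: dict,
                     prefix: str = "text_model") -> dict:
    """Return a new state dict containing the prompt-trained weights plus
    the text-encoder weights copied from the pretrained checkpoint."""
    merged = dict(prompt_trained)
    for key, value in pretrained.items():
        if key.startswith(prefix):
            merged[key] = value
    return merged

# Typical usage with PyTorch checkpoints (paths are placeholders):
#   import torch
#   tuned = torch.load("prompt_trained.pth", map_location="cpu")["state_dict"]
#   pre = torch.load("yolo_world_pretrained.pth", map_location="cpu")["state_dict"]
#   merged = merge_text_model(tuned, pre)
#   torch.save({"state_dict": merged}, "merged_for_online_inference.pth")
```

Whether the merged checkpoint behaves correctly depends on the two models sharing the same detector architecture apart from the text encoder, so verify a few predictions against the offline setup before trusting it.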

wondervictor commented 1 month ago

Hi @LiuChuanWei, if you fine-tuned the prompts, you'd better use SimpleYOLOWorldDetector (without the text model/encoder) for inference, provided you do not need to forward new categories. If you use YOLOWorldDetector (with the text encoder), you need to load the text model as in the configs under configs/pretrain, and the updated/new prompt embeddings can be concatenated with the embeddings from the text encoder for inference.