AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

How to set labels that contain multiple noun phrases, like big_car_front? #310

Open HuAndrew opened 5 months ago

HuAndrew commented 5 months ago

Gratitude

First, I would like to express my appreciation for the open-source work on YOLO-World. It has had a significant impact on the industry.

Description

I have a question about labeling when using YOLO-World for fine-tuning or pre-training. Specifically, I'm dealing with object detection tasks where the labels contain multiple noun phrases, such as "big_car_front" and "car_reg" (assuming "car_reg" might be a shorthand for "car_rear" or "car_registration").

Question

  1. How should I structure my labels when dealing with compound or multi-part object names?
  2. Does YOLO-World support custom labels, and if so, how can I integrate them into the training process?

Additional Information

Actual Behavior

In practice, using a tag such as "big_car_reg" directly for fine-tuning or pre-training may lose the semantic information carried by the individual nouns, which could lead to suboptimal performance.

Expected Behavior

I expect YOLO-World to handle custom labels and let me train the model to detect specific parts of objects using those labels.

wondervictor commented 5 months ago

Hi @HuAndrew, thanks for your interest in YOLO-World! Could you clarify whether the noun phrases are fixed types (like categories) or differ from case to case? For example, is there a limited set (say, 10 types of noun phrases), or an unlimited, open set of noun phrases?

For the limited version, you can use the normal fine-tuning setting to fine-tune YOLO-World for your application; you need to prepare your data in the COCO format.
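As a rough illustration of the limited case, here is a minimal sketch of a COCO-style annotation dictionary for a fixed set of custom categories. The label names come from the question above; the helper names (`label_to_phrase`, `build_coco_skeleton`, `add_box`), the image entry, and the box values are hypothetical. Replacing underscores with spaces, and the list-of-lists layout of `class_texts`, are assumptions motivated by the concern about losing noun semantics, not something confirmed in this thread.

```python
# Hypothetical label set taken from the question above.
RAW_LABELS = ["big_car_front", "car_reg"]


def label_to_phrase(label):
    """Turn an underscore-joined tag into a space-separated phrase (an assumption)."""
    return " ".join(part for part in label.split("_") if part)


def build_coco_skeleton(raw_labels):
    """Create an empty COCO-style dict with one category per label (ids start at 1)."""
    categories = [
        {"id": index + 1, "name": label_to_phrase(label)}
        for index, label in enumerate(raw_labels)
    ]
    return {"images": [], "annotations": [], "categories": categories}


def add_box(coco, image_id, category_id, bbox_xywh):
    """Append one box annotation; COCO boxes are [x, y, width, height] in pixels."""
    x, y, w, h = bbox_xywh
    coco["annotations"].append({
        "id": len(coco["annotations"]) + 1,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x, y, w, h],
        "area": w * h,
        "iscrowd": 0,
    })


coco = build_coco_skeleton(RAW_LABELS)
# Hypothetical image entry and box, for illustration only.
coco["images"].append({"id": 1, "file_name": "example.jpg", "width": 1280, "height": 720})
add_box(coco, image_id=1, category_id=1, bbox_xywh=(100, 200, 300, 150))

# One list of texts per category, in category order (assumed layout).
class_texts = [[category["name"]] for category in coco["categories"]]
```

The resulting dictionary can be written out with `json.dump` and used as the annotation file; how the class texts are passed to training depends on the config in use.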

For the unlimited/open version, you need to assign a text to each box annotation (replacing the category with a text). I'll add a new dataset class for this case if you need it.
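To make the open case more concrete, here is a minimal sketch of what "a text per box" could look like, assuming each annotation carries a free-form `text` field instead of a `category_id`. The field names, the sample records, and the `group_boxes_by_image` helper are all hypothetical; this is not the dataset class mentioned above, only an illustration of the idea.

```python
from collections import defaultdict

# Hypothetical annotations: each box has its own text instead of a category id.
annotations = [
    {"image_id": 1, "bbox": [10, 20, 50, 40], "text": "big car front"},
    {"image_id": 1, "bbox": [200, 220, 80, 30], "text": "car registration plate"},
    {"image_id": 1, "bbox": [400, 20, 60, 45], "text": "Big car front"},
    {"image_id": 2, "bbox": [5, 5, 100, 90], "text": "car rear"},
]


def group_boxes_by_image(records):
    """Collect, per image, the unique texts and a label index for every box.

    Each box label is the position of its text in that image's own text list,
    so the vocabulary can differ from image to image.
    """
    per_image = defaultdict(lambda: {"texts": [], "boxes": [], "labels": []})
    for record in records:
        entry = per_image[record["image_id"]]
        # Normalizing case and whitespace is a design choice, not a requirement.
        text = record["text"].strip().lower()
        if text not in entry["texts"]:
            entry["texts"].append(text)
        entry["boxes"].append(record["bbox"])
        entry["labels"].append(entry["texts"].index(text))
    return dict(per_image)


grouped = group_boxes_by_image(annotations)
```

Here `grouped[1]["texts"]` holds the two distinct phrases for image 1, and `grouped[1]["labels"]` points each of its three boxes at one of them.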

HuAndrew commented 5 months ago

Thank you very much for your prompt response and assistance.

For the unlimited/open version, we need to assign a text to each box annotation (replacing the category with a text).

We appreciate your help and support.

Best regards.

wondervictor commented 5 months ago

Hi @HuAndrew, you're welcome. I'll update a dataset class or you can try to use MixedGroundingDataset first.