AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

Fine tuning on a custom dataset #333

Open MrOCW opened 3 months ago

MrOCW commented 3 months ago

Hi, I'd like to fine-tune YOLO-World on a custom dataset. In this dataset I have images, their respective captions, and bounding boxes matching the captions, e.g.:

    image1 of multiple colored candies. Caption: "blue crystal candy". bbox = [x, y, w, h]
    image2 of multiple colored candies. Caption: "red sweet". bbox = [x, y, w, h]

How should I structure my dataset, number of classes, etc.? Is the number of classes simply the number of unique captions in my training dataset?

wondervictor commented 3 months ago

Hi @MrOCW,

  1. You need to organize your dataset in the COCO format, refer to: https://cocodataset.org/#format-data
  2. You can add the caption for each box annotation (see the sketch after this list).
  3. You need to use MixedGroundingDataset to load your dataset.
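
To make steps 1 and 2 concrete, here is a minimal sketch of a COCO-style annotation file for the candy example with a caption attached to each box annotation. The per-box "caption" key, the file names, and the image sizes are assumptions for illustration; please verify the exact fields MixedGroundingDataset reads in the repo.

    # Minimal sketch of a COCO-style annotation file with a per-box caption.
    # The "caption" key on each annotation, the file names, and the sizes are
    # assumptions for illustration; verify against MixedGroundingDataset.
    import json
    import os

    custom_ann = {
        "images": [
            {"id": 1, "file_name": "image1.jpg", "width": 640, "height": 480},
            {"id": 2, "file_name": "image2.jpg", "width": 640, "height": 480},
        ],
        "annotations": [
            {"id": 1, "image_id": 1, "category_id": 1, "iscrowd": 0,
             "bbox": [100, 120, 40, 40], "area": 40 * 40,  # COCO [x, y, w, h]
             "caption": "blue crystal candy"},
            {"id": 2, "image_id": 2, "category_id": 2, "iscrowd": 0,
             "bbox": [220, 60, 35, 35], "area": 35 * 35,
             "caption": "red sweet"},
        ],
        "categories": [
            {"id": 1, "name": "blue crystal candy"},
            {"id": 2, "name": "red sweet"},
        ],
    }

    os.makedirs("data/candies/annotations", exist_ok=True)
    with open("data/candies/annotations/train.json", "w") as f:
        json.dump(custom_ann, f)

Boxes follow the COCO convention of [x, y, w, h] in absolute pixels.
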
MrOCW commented 3 months ago

@wondervictor

  1. How about num_classes and num_training_classes?
  2. Based on:

    coco_train_dataset = dict(
        _delete_=True,
        type='MultiModalDataset',
        dataset=dict(
            type='YOLOv5CocoDataset',
            data_root='data/coco',
            ann_file='annotations/instances_train2017.json',
            data_prefix=dict(img='train2017/'),
            filter_cfg=dict(filter_empty_gt=False, min_size=32)),
        class_text_path='data/texts/coco_class_texts.json',
        pipeline=train_pipeline)

    What should be in class_text_path? Do I create a list of classes that matches the unique captions in my VLM data? (A sketch of such a file follows this list.)

  3. I am supposed to modify the configs in configs/finetune_coco, right?
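
Regarding class_text_path in item 2: as far as I can tell, the bundled text files such as data/texts/coco_class_texts.json are a JSON list with one list of text prompts per class, so a matching file for the candy captions might look like the sketch below. The file name candy_class_texts.json is hypothetical.

    # Sketch of a class-text JSON, assuming it mirrors the layout of
    # data/texts/coco_class_texts.json (one list of text prompts per class).
    # The file name "candy_class_texts.json" is hypothetical.
    import json
    import os

    class_texts = [
        ["blue crystal candy"],
        ["red sweet"],
    ]

    os.makedirs("data/texts", exist_ok=True)
    with open("data/texts/candy_class_texts.json", "w") as f:
        json.dump(class_texts, f)

If that layout is right, the number of entries in this file (two in this example) would be the class count the config should be consistent with.
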
Unicorn123455678 commented 3 months ago

Regarding item 3 (MixedGroundingDataset): how about num_classes and num_training_classes? What's the difference between them? I need the answer too.

demooooooo0303 commented 3 months ago

Hi @MrOCW,

  1. You need to organize your dataset into a coco-format, refer to: https://cocodataset.org/#format-data
  2. You can add the caption for each box annotation.
  3. You need to use MixedGroundingDataset to load your dataset.

What is MixedGroundingDataset? In the fine-tune config there is type='MultiModalDataset'.
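
For what it's worth, the pretraining configs in the repo seem to use a grounding-style dataset entry instead of MultiModalDataset, which may be what was meant. The snippet below is only a sketch adapted to the candy example: the type string 'YOLOv5MixedGroundingDataset' and every path are my assumptions, so double-check them against the classes in yolo_world/datasets and the pretrain configs.

    # Sketch of a custom dataset entry using the grounding-style loader.
    # The type name 'YOLOv5MixedGroundingDataset' and all paths here are
    # assumptions; check the registered dataset classes in yolo_world/datasets.
    custom_train_dataset = dict(
        _delete_=True,
        type='YOLOv5MixedGroundingDataset',   # assumed registry name
        data_root='data/candies/',            # hypothetical custom data root
        ann_file='annotations/train.json',    # COCO-style file with per-box captions
        data_prefix=dict(img='images/'),
        filter_cfg=dict(filter_empty_gt=False, min_size=32),
        pipeline=train_pipeline)
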