AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0
4.37k stars 423 forks

When training on a custom dataset, these two losses are always 0: loss_bbox: 0.0000 and loss_dfl: 0.0000 #337

Open tm924222 opened 4 months ago

tm924222 commented 4 months ago

_base_ = ('../../third_party/mmyolo/configs/yolov8/'
          'yolov8_l_syncbn_fast_8xb16-500e_coco.py')
custom_imports = dict(imports=['yolo_world'], allow_failed_imports=False)

# hyper-parameters
num_classes = 6
num_training_classes = 6
max_epochs = 80  # Maximum training epochs
close_mosaic_epochs = 30
save_epoch_intervals = 5
text_channels = 512
neck_embed_channels = [128, 256, _base_.last_stage_out_channels // 2]
neck_num_heads = [4, 8, _base_.last_stage_out_channels // 2 // 32]
base_lr = 0.001
weight_decay = 0.0005
train_batch_size_per_gpu = 8
load_from = '/home/huangguangxu/yolo/pretrained_models/yolo_world_v2_l_vlpan_bn_sgd_1e-3_40e_8gpus_finetune_coco_ep80-e1288152.pth'
text_model_name = '/home/huangguangxu/yolo/clip-vit-base-patch32'
# text_model_name = 'openai/clip-vit-base-patch32'
persistent_workers = False

# model settings
model = dict(type='YOLOWorldDetector',
             mm_neck=True,
             num_train_classes=num_training_classes,
             num_test_classes=num_classes,
             data_preprocessor=dict(type='YOLOWDetDataPreprocessor'),
             backbone=dict(_delete_=True,
                           type='MultiModalYOLOBackbone',
                           image_model={{_base_.model.backbone}},
                           text_model=dict(type='HuggingCLIPLanguageBackbone',
                                           model_name=text_model_name,
                                           frozen_modules=['all'])),
             neck=dict(type='YOLOWorldPAFPN',
                       guide_channels=text_channels,
                       embed_channels=neck_embed_channels,
                       num_heads=neck_num_heads,
                       block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv')),
             bbox_head=dict(type='YOLOWorldHead',
                            head_module=dict(
                                type='YOLOWorldHeadModule',
                                use_bn_head=True,
                                embed_dims=text_channels,
                                num_classes=num_training_classes)),
             train_cfg=dict(assigner=dict(num_classes=num_training_classes)))

# dataset settings
text_transform = [
    dict(type='RandomLoadText',
         num_neg_samples=(num_classes, num_classes),
         max_num_samples=num_training_classes,
         padding_to_max=True,
         padding_value=''),
    dict(type='mmdet.PackDetInputs',
         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
                    'flip_direction', 'texts'))
]
mosaic_affine_transform = [
    dict(type='MultiModalMosaic',
         img_scale=_base_.img_scale,
         pad_val=114.0,
         pre_transform=_base_.pre_transform),
    dict(
        type='YOLOv5RandomAffine',
        max_rotate_degree=0.0,
        max_shear_degree=0.0,
        max_aspect_ratio=100.,
        scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
        # img_scale is (width, height)
        border=(-_base_.img_scale[0] // 2, -_base_.img_scale[1] // 2),
        border_val=(114, 114, 114))
]
train_pipeline = [
    *_base_.pre_transform, *mosaic_affine_transform,
    dict(type='YOLOv5MultiModalMixUp',
         prob=_base_.mixup_prob,
         pre_transform=[*_base_.pre_transform, *mosaic_affine_transform]),
    *_base_.last_transform[:-1], *text_transform
]
train_pipeline_stage2 = [*_base_.train_pipeline_stage2[:-1], *text_transform]

coco_train_dataset = dict(
    _delete_=True,
    type='MultiModalDataset',
    dataset=dict(
        type='YOLOv5CocoDataset',
        metainfo=dict(classes=['Missing_hole','Mouse_bite','Open_circuit','Short','Spur','Spurious_copper']),
        data_root='/home/huangguangxu/yolo/data/pcb.v1i.coco',
        ann_file='/home/huangguangxu/yolo/data/pcb.v1i.coco/annotations/instances_train2017.json',
        data_prefix=dict(img='/home/huangguangxu/yolo/data/pcb.v1i.coco/train2017/'),
        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
    class_text_path='/home/huangguangxu/yolo/data/texts/pcb_class_texts.json',
    pipeline=train_pipeline)

train_dataloader = dict(persistent_workers=persistent_workers,
                        batch_size=train_batch_size_per_gpu,
                        collate_fn=dict(type='yolow_collate'),
                        dataset=coco_train_dataset)
test_pipeline = [
    *_base_.test_pipeline[:-1],
    dict(type='LoadText'),
    dict(type='mmdet.PackDetInputs',
         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                    'scale_factor', 'pad_param', 'texts'))
]
coco_val_dataset = dict(
    _delete_=True,
    type='MultiModalDataset',
    dataset=dict(type='YOLOv5CocoDataset',
                 data_root='/home/huangguangxu/yolo/data/pcb.v1i.coco',
                 ann_file='/home/huangguangxu/yolo/data/pcb.v1i.coco/annotations/instances_val2017.json',
                 data_prefix=dict(img='/home/huangguangxu/yolo/data/pcb.v1i.coco/valid2017/'),
                 filter_cfg=dict(filter_empty_gt=False, min_size=32)),
    class_text_path='/home/huangguangxu/yolo/data/texts/pcb_class_texts.json',
    pipeline=test_pipeline)
val_dataloader = dict(dataset=coco_val_dataset)
test_dataloader = val_dataloader

# training settings
default_hooks = dict(param_scheduler=dict(scheduler_type='linear',
                                          lr_factor=0.01,
                                          max_epochs=max_epochs),
                     checkpoint=dict(max_keep_ckpts=-1,
                                     save_best=None,
                                     interval=save_epoch_intervals))
custom_hooks = [
    dict(type='EMAHook',
         ema_type='ExpMomentumEMA',
         momentum=0.0001,
         update_buffers=True,
         strict_load=False,
         priority=49),
    dict(type='mmdet.PipelineSwitchHook',
         switch_epoch=max_epochs - close_mosaic_epochs,
         switch_pipeline=train_pipeline_stage2)
]
train_cfg = dict(max_epochs=max_epochs,
                 val_interval=5,
                 dynamic_intervals=[((max_epochs - close_mosaic_epochs),
                                     _base_.val_interval_stage2)])
optim_wrapper = dict(optimizer=dict(
    _delete_=True,
    type='SGD',
    lr=base_lr,
    momentum=0.937,
    nesterov=True,
    weight_decay=weight_decay,
    batch_size_per_gpu=train_batch_size_per_gpu),
                     paramwise_cfg=dict(
                         custom_keys={
                             'backbone.text_model': dict(lr_mult=0.01),
                             'logit_scale': dict(weight_decay=0.0)
                         }),
                     constructor='YOLOWv5OptimizerConstructor')

# evaluation settings
val_evaluator = dict(_delete_=True,
                     type='mmdet.CocoMetric',
                     proposal_nums=(100, 1, 10),
                     ann_file='/home/huangguangxu/yolo/data/pcb.v1i.coco/annotations/instances_val2017.json',
                     metric='bbox')


wondervictor commented 4 months ago

Hi @tm924222, could you provide me a sample annotation file?

tm924222 commented 4 months ago

OK.

tm924222 commented 4 months ago

"date_created":"2024-05-21T03:03:48+00:00"},"licenses":[{"id":1,"url":"","name":"Unknown"}],"categories":[{"id":0,"name":"1","supercategory":"none"},{"id":1,"name":"missing_hole","supercategory":"1"},{"id":2,"name":"mouse_bite","supercategory":"1"},{"id":3,"name":"open_circuit","supercategory":"1"},{"id":4,"name":"short","supercategory":"1"},{"id":5,"name":"spur","supercategory":"1"},{"id":6,"name":"spurious_copper","supercategory":"1"}],"images":[{"id":0,"license":1,"file_name":"05_open_circuit_03_jpg.rf.e13d9485402af4c1c00b7552ad6c9f01.jpg","height":640,"width":640,"date_captured":"2024-05-21T03:03:48+00:00"},{"id":1,"license":1,"file_name":"01_missing_hole_18_jpg.rf.e200e3f4e219e50d92374113738076e2.jpg","height":640,"width":640,"date_captured":"2024-05-21T03:03:48+00:00"},{"id":2,"license":1,"file_name":"11_missing_hole_10_jpg.rf.e6075768b759eef74f0d70219705da33.jpg","height":640,"width":640,"date_captured":"2024-05-21T03:03:48+00:00"},{"id":3,"license":1,"file_name":"08_missing_hole_09_jpg.rf.e1934e5d1f7f6018df3389b0794b9bf5.jpg","height":640,"width":640,"date_captured":"2024-05-21T03:03:48+00:00"},{"id":4,"license":1,"file_name":"11_missing_hole_09_jpg.rf.e0a8581b82eec73303a5d191aed8b129.jpg","height":640,"width":640,"date_captured":"2024-05-21T03:03:48+00:00"},{"id":5,"license":1,"file_name":"06_missing_hole_01_jpg.rf.e31c89a5744cc949516d0b221894836b.jpg","height":640,"width":640,"date_captured":"2024-05-21T03:03:48+00:00"},{"id":6,"license":1,"file_name":"05_missing_hole_02_jpg.rf.e04b6d11f195420bd2edc14b64ba7f99.jpg","height":640,"width":640,"date_captured":"2024-05-21T03:03:48+00:00"},{"id":7,"license":1,"file_name":"04_short_06_jpg.rf.e11f93bd670a55a875be0720cd002d6a.jpg","height":640,"width":640,"date_captured":"2024-05-21T03:03:48+00:00"},

Unicorn123455678 commented 3 months ago

I ran into this problem too, but in my case loss_cls starts out large and drops to 0 after one epoch, while loss_bbox and loss_dfl are 0 the whole time.

wondervictor commented 3 months ago

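@Unicorn123455678 @tm924222 This kind of problem usually means that no bounding-box annotations are reaching training: either mask-refine is being used, or there is something wrong with the annotations.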

wondervictor commented 3 months ago

You can try setting filter_empty_gt=True.
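One way to check whether missing box annotations are the cause is to count how many images in the COCO-format annotation file have no boxes at all. The sketch below is only an illustration using the standard library: the helper name count_images_without_boxes and the tiny inline sample are hypothetical, and in practice you would pass in the dict obtained from json.load on your own annotation file.

```python
from collections import Counter


def count_images_without_boxes(coco):
    """Return (images_without_boxes, total_images) for a COCO-style dict."""
    # Count how many annotations point at each image id.
    boxes_per_image = Counter(
        ann["image_id"] for ann in coco.get("annotations", []))
    images = coco.get("images", [])
    empty = sum(1 for img in images if boxes_per_image[img["id"]] == 0)
    return empty, len(images)


# Hypothetical miniature annotation file: image 1 has no boxes.
sample = {
    "images": [{"id": 0}, {"id": 1}],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 1, "bbox": [10, 10, 20, 20]},
    ],
}

empty, total = count_images_without_boxes(sample)
print(f"{empty} of {total} images have no box annotations")
```

If most images turn out to be empty, the annotation export itself is the first thing to look at.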

wondervictor commented 3 months ago

@tm924222 I checked, and the categories in your dataset do not match the categories set in your config at all. The config has:

metainfo=dict(classes=['Missing_hole','Mouse_bite','Open_circuit','Short','Spur','Spurious_copper']),

while the categories in your dataset are:

[{"id":0,"name":"1","supercategory":"none"},{"id":1,"name":"missing_hole","supercategory":"1"},{"id":2,"name":"mouse_bite","supercategory":"1"},{"id":3,"name":"open_circuit","supercategory":"1"},{"id":4,"name":"short","supercategory":"1"},{"id":5,"name":"spur","supercategory":"1"},{"id":6,"name":"spurious_copper","supercategory":"1"}],

The two must match exactly for training to work, because CocoDataset uses the class names to look up and match the categories.

@Unicorn123455678, please check for a similar problem.
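The mismatch described above can be checked mechanically before training. The following is a minimal sketch, assuming that the comparison is an exact, case-sensitive string match as the explanation implies; compare_classes is a hypothetical helper, and the two lists are copied from the config and annotation file posted in this thread (with the supercategory field omitted).

```python
def compare_classes(config_classes, coco_categories):
    """Compare config class names with COCO category names, case-sensitively.

    Returns (names only in the config, names only in the annotation file).
    """
    ordered = sorted(coco_categories, key=lambda c: c["id"])
    ann_names = [c["name"] for c in ordered]
    missing_in_ann = [n for n in config_classes if n not in ann_names]
    missing_in_cfg = [n for n in ann_names if n not in config_classes]
    return missing_in_ann, missing_in_cfg


config_classes = ['Missing_hole', 'Mouse_bite', 'Open_circuit',
                  'Short', 'Spur', 'Spurious_copper']
categories = [
    {"id": 0, "name": "1"},
    {"id": 1, "name": "missing_hole"},
    {"id": 2, "name": "mouse_bite"},
    {"id": 3, "name": "open_circuit"},
    {"id": 4, "name": "short"},
    {"id": 5, "name": "spur"},
    {"id": 6, "name": "spurious_copper"},
]

missing_in_ann, missing_in_cfg = compare_classes(config_classes, categories)
print("In config but not in annotations:", missing_in_ann)
print("In annotations but not in config:", missing_in_cfg)
```

With the values from this thread, every config name is reported as missing from the annotations because of the capitalised first letter, and the placeholder category "1" is reported as missing from the config.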

wondervictor commented 3 months ago

@tm924222 The letter case has to match, and your first category "1" also has to be included; otherwise the indices will be shifted.

tm924222 commented 3 months ago

Thank you for the guidance, it all makes sense now. I'll go and fix it.

wondervictor commented 3 months ago

Hi @tm924222, how about the loss now?

tm924222 commented 3 months ago

I may not have fixed it correctly; it's still 0.

tm924222 commented 3 months ago

metainfo=dict(classes=['1','Missing_hole','Mouse_bite','Open_circuit','Short','Spur','Spurious_copper']),

Unicorn123455678 commented 3 months ago

That wasn't my problem. Mine was that I had not passed the metainfo argument in my dataset definition; once I passed it, training started converging normally.

Unicorn123455678 commented 3 months ago

However, I followed the steps in https://github.com/AILab-CVC/YOLO-World/blob/master/docs/finetuning.md, and that document doesn't mention that this argument needs to be passed.

tm924222 commented 3 months ago

@Unicorn123455678 Are you using a custom dataset? Could I see how you pass the metainfo argument in?

Unicorn123455678 commented 3 months ago

We pass it in the same way. You need to make sure that your class list is exactly the same as the categories in the COCO-format JSON you generated.

wondervictor commented 3 months ago

@tm924222 In the metainfo you posted, the letter case probably doesn't match.

wondervictor commented 3 months ago

@Unicorn123455678 Is your loss still 0?

Unicorn123455678 commented 3 months ago

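I have one more question. The categories in my converted COCO-format JSON are consistent with the metainfo argument I pass in; both use abbreviations formed from the initials of the Chinese class names. My class_text_path, however, gives a more detailed semantic description for each category. The model is converging normally and the loss looks normal. Will the mismatch between class_text_path and the COCO-format JSON affect open-vocabulary detection ability?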

Unicorn123455678 commented 3 months ago

The loss is normal now and the model is converging in the right direction. Thank you very much for taking the time to reply.

wondervictor commented 3 months ago

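@Unicorn123455678 On whether a mismatch between class_text_path and the COCO-format JSON affects open-vocabulary detection: as long as the two are aligned, it's fine. The meanings only need to be close, because this part is aligned by index. Still, keeping them consistent is recommended.

Great. If you run into further problems, feel free to discuss them here. You can also try different fine-tuning schemes depending on the needs of your downstream task; see https://github.com/AILab-CVC/YOLO-World?tab=readme-ov-file#fine-tuning-yolo-world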
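Since the class texts are matched to the classes by index, a quick sanity check is to pair the two lists position by position and confirm the lengths agree. This is a minimal sketch under that assumption; check_text_alignment is a hypothetical helper, and the abbreviated class names and descriptions are made-up placeholders, not anyone's real data.

```python
import json


def check_text_alignment(classes, class_texts):
    """Pair each class name with the text entry at the same index."""
    if len(classes) != len(class_texts):
        raise ValueError(
            f"{len(classes)} classes but {len(class_texts)} text entries")
    for i, entry in enumerate(class_texts):
        # Each class is expected to have its own list of one or more texts.
        if not isinstance(entry, list) or not entry:
            raise ValueError(f"entry {i} must be a non-empty list of strings")
    return list(zip(classes, class_texts))


# Hypothetical abbreviated class names and longer text descriptions.
classes = ["mh", "mb"]
class_texts = json.loads(
    '[["missing hole on a circuit board"], ["mouse bite defect"]]')

for name, texts in check_text_alignment(classes, class_texts):
    print(name, "->", texts)
```

Printing the pairs makes it easy to spot a description that has slipped to the wrong position.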

tm924222 commented 3 months ago

It's working now, the losses are no longer 0. Thank you for your guidance.

wondervictor commented 3 months ago

@tm924222 I suggest that after training for a while you run validation and testing, and inspect the results visually.

tm924222 commented 3 months ago

During validation I get this error: data['category_id'] = self.cat_ids[label] IndexError: list index out of range

wondervictor commented 3 months ago

Make sure that at test time num_classes equals your number of classes, then check that the text JSON and the classes are set correctly.

tm924222 commented 3 months ago

The number of classes is set to 7: num_classes = 7, num_training_classes = 7

The text JSON is set to [["1"],["missing_hole"], ["mouse_bite"], ["open_circuit"], ["short"], ["spur"],["spurious_copper"]]

chenjiafu-George commented 3 months ago

Hi @tm924222, I ran into this problem before too. Check the syntax of your class text file: it should look like [["missing_hole"], ["mouse_bite"], ["open_circuit"], ["short"], ["spur"],["spurious_copper"]], with one pair of square brackets per class. Also add the metainfo to both the training-set and the validation-set configs, like this:

classes = ("bicycle", "boat", "bottle", "bus", "car", "cat")

coco_val_dataset = dict(
    _delete_=True,
    type='MultiModalDataset',
    dataset=dict(type='YOLOv5LVISV1Dataset',
                 metainfo=dict(classes=classes),
                 data_root=r'E:\george\Yolo_world_datasets\ExDark',
                 test_mode=True,
                 ann_file=r'annotations/instances_val2017.json',
                 data_prefix=dict(img=r'images/valid'),
                 batch_shapes_cfg=None),
    class_text_path=r'E:\george\Yolo_world_datasets\ExDark/classes.json',
    pipeline=test_pipeline)

You also need to change the number of classes from 7 to 6.

tm924222 commented 3 months ago

Thank you for the guidance, the problem is solved. I had configured metainfo=dict(classes=classes) only in the training part and not in the validation part. Now that I've added it there too, the IndexError: list index out of range no longer appears.
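A plausible reading of why a missing metainfo on the validation side ends in this IndexError, consistent with the traceback line data['category_id'] = self.cat_ids[label], is that cat_ids is built by looking up the dataset's class names in the annotation file. Names that are not found contribute no id, so the list ends up shorter than the label indices the model predicts. The sketch below imitates that lookup in plain Python; it is a simplified model and not the mmdet implementation, and lookup_cat_ids and the fallback name list are hypothetical.

```python
def lookup_cat_ids(class_names, coco_categories):
    """Return ids of annotation categories whose name is in class_names."""
    wanted = set(class_names)
    return [c["id"] for c in coco_categories if c["name"] in wanted]


names_in_annotations = ["1", "missing_hole", "mouse_bite", "open_circuit",
                        "short", "spur", "spurious_copper"]
categories = [{"id": i, "name": n} for i, n in enumerate(names_in_annotations)]

# Assumed fallback class names that do not occur in the annotation file,
# standing in for whatever is used when metainfo is not passed.
fallback_names = ["person", "bicycle", "car"]
cat_ids = lookup_cat_ids(fallback_names, categories)

predicted_label = 3  # a label index the detector might output
try:
    category_id = cat_ids[predicted_label]
except IndexError as err:
    print("IndexError:", err)

# With the matching names, every label index maps to a category id.
matched_ids = lookup_cat_ids(names_in_annotations, categories)
print(matched_ids[predicted_label])
```

Under this reading, passing the same metainfo to the validation dataset as to the training dataset is what makes the lookup produce one id per class.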

chenjiafu-George commented 3 months ago

Congratulations, and you're welcome! If you have a new question, feel free to open a new issue or leave another comment.

LRuiRui517 commented 1 month ago

@tm924222 Regarding the setup with num_classes = 7 and the seven-entry text JSON: has this problem been resolved?

LRuiRui517 commented 1 month ago

@tm924222 Regarding the validation error data['category_id'] = self.cat_ids[label] IndexError: list index out of range: has it been resolved?

Nathan-Li123 commented 1 month ago

Do the category ids in the COCO-format JSON file have to start from 0? The COCO dataset itself seems to start from 1.

gggxq commented 3 weeks ago

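@tm924222 Hello. Why is the code in the '../YOLO-World/third_party/mmyolo/configs/yolov8/yolov8_l_syncbn_fast_8xb16-500e_coco.py' file I downloaded different from what you posted? Did you modify it yourself? The config file I downloaded has no metainfo argument (for the dataset), so when I train on my own dataset, loss_bbox and loss_dfl are 0. I hope you can help.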