AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0
4.39k stars 427 forks

How do I freeze the pretrained weights and fine-tune only part of the model? #294

Open zhongzee opened 4 months ago

zhongzee commented 4 months ago

yolo_world_v2_l_clip_large_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_800ft_lvis_minival.py

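Part of the config is shown below. I want to load this pretrained checkpoint and run dist_train.sh.

1) How should I freeze all of the weights loaded via load_from and modify part of the backbone structure, for example by adding adapter layers, to fine-tune from this checkpoint?
2) The load_from checkpoint includes the weights of the YOLOv8 backbone, the neck, and the head, is that right?

    load_from = '/mnt/afs/huangtao3/wzz/YOLO-World/weights/YOLO-World/yolo_world_v2_l_clip_large_o365v1_goldg_pretrain_800ft-9df82e55.pth'
    text_model_name = '/mnt/afs/huangtao3/wzz/YOLO-World/weights/clip-vit-large-patch14-336'
    img_scale = (800, 800)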

    # model settings
    model = dict(
        type='YOLOWorldDetector',
        mm_neck=True,
        num_train_classes=num_training_classes,
        num_test_classes=num_classes,
        data_preprocessor=dict(type='YOLOWDetDataPreprocessor'),
        backbone=dict(
            _delete_=True,
            type='MultiModalYOLOBackbone',
            image_model={{_base_.model.backbone}},
            text_model=dict(
                type='HuggingCLIPLanguageBackbone',
                model_name=text_model_name,
                frozen_modules=['all'])),
        neck=dict(
            type='YOLOWorldPAFPN',
            guide_channels=text_channels,
            embed_channels=neck_embed_channels,
            num_heads=neck_num_heads,
            block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv')),
        bbox_head=dict(
            type='YOLOWorldHead',
            head_module=dict(
                type='YOLOWorldHeadModule',
                use_bn_head=True,
                embed_dims=text_channels,
                num_classes=num_training_classes)),
        train_cfg=dict(assigner=dict(num_classes=num_training_classes)))
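On question 2, one way to check which components a checkpoint actually contains is to group its parameter names by their leading prefix. The following is only a minimal sketch: `count_by_prefix` is a hypothetical helper, and the key names in `example_keys` are made up for illustration, not read from the real checkpoint.

```python
from collections import Counter


def count_by_prefix(keys, depth=1):
    """Count parameter names by their first `depth` dot-separated components."""
    counts = Counter()
    for key in keys:
        prefix = ".".join(key.split(".")[:depth])
        counts[prefix] += 1
    return dict(counts)


# Hypothetical key names for illustration only; the real names come from
# the checkpoint file itself.
example_keys = [
    "backbone.image_model.stem.conv.weight",
    "backbone.text_model.model.embeddings.weight",
    "neck.reduce_layers.0.weight",
    "bbox_head.head_module.cls_preds.0.weight",
]

print(count_by_prefix(example_keys))
print(count_by_prefix(example_keys, depth=2))
```

With PyTorch installed, the keys would probably come from something like `ckpt = torch.load(path, map_location='cpu')` followed by `ckpt.get('state_dict', ckpt).keys()`, assuming the checkpoint stores its weights under a `state_dict` entry. If prefixes for the backbone, neck, and head all show up, the checkpoint covers all three.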

tomgotjack commented 4 months ago

@Wuzhongze Hi, for this you can refer to the official MMYOLO documentation: https://github.com/open-mmlab/mmyolo/blob/main/docs/zh_cn/common_usage/freeze_layers.md

I ran a simple training with part of the weights frozen. The code is as follows:

    model = dict(
        type='YOLOWorldDetector',
        mm_neck=True,
        num_train_classes=num_training_classes,
        num_test_classes=num_classes,
        data_preprocessor=dict(type='YOLOWDetDataPreprocessor'),
        backbone=dict(
            _delete_=True,
            type='MultiModalYOLOBackbone',
            frozen_stages=4,  # added: freeze the backbone stages
            image_model={{_base_.model.backbone}},
            text_model=dict(
                type='HuggingCLIPLanguageBackbone',
                model_name=text_model_name,
                frozen_modules=['all'])),
        neck=dict(
            type='YOLOWorldPAFPN',
            guide_channels=text_channels,
            embed_channels=neck_embed_channels,
            num_heads=neck_num_heads,
            freeze_all=True,  # added: freeze the whole neck
            block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv')),
        bbox_head=dict(
            type='YOLOWorldHead',
            head_module=dict(
                type='YOLOWorldHeadModule',
                use_bn_head=True,
                embed_dims=text_channels,
                num_classes=num_training_classes)),
        train_cfg=dict(assigner=dict(num_classes=num_training_classes)))

Following the documentation, I simply added frozen_stages=4 to the backbone and freeze_all=True to the neck.
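To confirm that settings like these took effect, it may help to count trainable versus frozen parameters per top-level module once the model is built. This is a rough sketch, not part of YOLO-World or MMYOLO: `summarize_trainable` is a hypothetical helper, and `_FakeParam` is a stand-in so the example runs without PyTorch. It relies only on each parameter exposing `requires_grad` and `numel()`, as `torch.nn.Parameter` does.

```python
from collections import defaultdict


def summarize_trainable(named_params):
    """Return {top_level_module: (trainable_elements, frozen_elements)}."""
    summary = defaultdict(lambda: [0, 0])
    for name, param in named_params:
        top = name.split(".")[0]
        idx = 0 if param.requires_grad else 1
        summary[top][idx] += param.numel()
    return {module: tuple(counts) for module, counts in summary.items()}


class _FakeParam:
    """Stand-in for torch.nn.Parameter so the sketch runs without PyTorch."""

    def __init__(self, n, requires_grad):
        self._n = n
        self.requires_grad = requires_grad

    def numel(self):
        return self._n


# Hypothetical parameter names and sizes, for illustration only.
demo = [
    ("backbone.image_model.stem.conv.weight", _FakeParam(100, False)),
    ("neck.top_down_layers.0.weight", _FakeParam(50, False)),
    ("bbox_head.head_module.cls_preds.0.weight", _FakeParam(10, True)),
]

print(summarize_trainable(demo))
```

On a real model the call would be `summarize_trainable(model.named_parameters())`. A frozen module should report zero trainable elements, while the head, which the config above leaves untouched, should still be trainable.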

zhongzee commented 4 months ago

Thank you very much!!!