AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

Reproducing the pretrained model #190

Open qjq-111 opened 7 months ago

qjq-111 commented 7 months ago

The evaluation metrics of my reproduced YOLO-Worldv2-L pretrained model are slightly lower than those of the model provided by the authors.

The corresponding row in the model zoo: [image]

I used the config file yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py with mixed-precision training enabled, and only changed train_batch_size_per_gpu from 16 to 10 (due to GPU memory limits).
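For clarity, the diff against the released config is a single line (a minimal sketch; how mixed precision was enabled, e.g. via the --amp flag of tools/train.py in mmyolo-based repos, is an assumption about the exact invocation):

```python
# Sketch of the only edit made to
# yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py;
# everything else is left as released.
train_batch_size_per_gpu = 10  # originally 16; reduced due to GPU memory limits
```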

Evaluation results of my reproduced model: [image]

Evaluation results of the model provided on Hugging Face: [image]

Is this expected behavior? Is it caused by the change to the batch size, or is there randomness in training?

wondervictor commented 7 months ago

Hi @qjq-111, training is stable and we have run the YOLO-World-L pre-training several times. However, we use 4×8 GPUs with 16 samples per GPU. Scaling the batch size may lead to performance variation (higher or lower), and we attribute this variation to the weight decay, which is adjusted according to the batch size and the number of GPUs.
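To put the change in concrete terms, the total-batch arithmetic looks like this (an illustrative sketch; the reproduction's GPU count is an assumption, since the issue only shows the system configuration as a screenshot):

```python
# Authors' pre-training setup: 4 nodes x 8 GPUs, 16 samples per GPU.
total_batch_authors = 4 * 8 * 16  # = 512

# Assumed reproduction setup: one 8-GPU node with the reduced per-GPU batch.
total_batch_repro = 8 * 10        # = 80

# Any hyperparameter scaled linearly with the total batch size (such as the
# weight decay discussed below) shrinks by roughly 80 / 512, i.e. ~0.16x.
print(total_batch_authors, total_batch_repro)
```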

wondervictor commented 7 months ago

Could you provide more details about the training?

qjq-111 commented 7 months ago

> Could you provide more details about the training?

Thank you for your reply.

When I set train_batch_size_per_gpu to 10, how should I adjust the weight decay accordingly?

My system configuration: [image]

The last epoch: [image]

wondervictor commented 7 months ago

You may need to check the scaled weight decay in your experiments; the model training is sensitive to the value of the weight decay. The weight decay is automatically scaled according to the total batch size, and this scaling is not suitable for the AdamW optimizer. Empirically, scaled_weight_decay=0.2 is suitable for YOLO-World; you should see INFO - Scaled weight_decay to 0.2 in the training log.

The weight decay scaling can be found here:

https://github.com/AILab-CVC/YOLO-World/blob/24c7121cf83c5808efef91c91b76277980feb99e/yolo_world/engine/optimizers/yolow_v5_optim_constructor.py#L166
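For reference, the linked constructor implements linear scaling along these lines (a simplified paraphrase, not the verbatim source; the assumption is that it follows the same linear-scaling rule as mmyolo's YOLOv5 optimizer constructor, and the base weight_decay and base_total_batch_size figures below are assumptions chosen so the scaled value comes out to 0.2):

```python
# Simplified paraphrase of the weight-decay scaling in
# yolo_world/engine/optimizers/yolow_v5_optim_constructor.py.

def scaled_weight_decay(weight_decay: float,
                        batch_size_per_gpu: int,
                        num_gpus: int,
                        base_total_batch_size: int) -> float:
    """Linearly scale weight decay by the ratio of the actual total
    batch size to the reference total batch size."""
    total_batch_size = num_gpus * batch_size_per_gpu
    # Gradient accumulation keeps the effective batch at least at the base.
    accumulate = max(round(base_total_batch_size / total_batch_size), 1)
    scale_factor = total_batch_size * accumulate / base_total_batch_size
    return weight_decay * scale_factor

# Illustrative numbers only (assumed): with a base weight_decay of 0.025 and
# base_total_batch_size of 64, the authors' 32 x 16 = 512 total batch gives
# 0.025 * 8 = 0.2, matching the "Scaled weight_decay to 0.2" log line.
print(scaled_weight_decay(0.025, 16, 32, 64))  # -> 0.2
```

Under the same assumed rule, an 8-GPU run at batch 10 (total 80) would scale 0.025 only to about 0.031, so one option is to raise the base weight_decay in the optimizer config until the scaled value lands at 0.2 again.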

qjq-111 commented 7 months ago

@wondervictor Thanks! I'll try it again.

wondervictor commented 7 months ago

Hi @qjq-111, I've provided the training log, so you can compare the evaluation results after a few epochs instead of training for the full 100 epochs.