qjq-111 opened this issue 7 months ago
Hi @qjq-111, training is stable and we have run the pre-training of YOLO-World-L several times. However, we use 4x8 GPUs with 16 samples per GPU per batch. Scaling the batch size might lead to performance variation (higher or lower), and we attribute this variation to the weight decay, which is automatically adjusted according to the batch size and number of GPUs.
Could you provide more details about the training?
Thank you for your reply.
When I set train_batch_size_per_gpu to 10, how should I adjust the weight decay accordingly?
My system configuration:
The last epoch:
Maybe you need to check the scaled weight decay in your experiments; the model training is sensitive to the value of the weight decay. The weight decay is automatically scaled according to the total batch size, and this scaling is not well suited to the AdamW optimizer. Empirically, scaled_weight_decay=0.2 is suitable for YOLO-World, i.e., the training log should show `INFO - Scaled weight_decay to 0.2`.
The weight decay scaling can be found here:
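To make the rule above concrete, here is a minimal sketch of a YOLOv5-style linear weight-decay scaling rule consistent with the log line quoted above. The base weight decay of 0.025, the reference total batch size of 64, and the 32-GPU setup in the second example are assumptions, not values confirmed in this thread:

```python
def scaled_weight_decay(base_weight_decay: float,
                        batch_size_per_gpu: int,
                        num_gpus: int,
                        base_total_batch_size: int = 64) -> float:
    """Sketch of linear weight-decay scaling by total batch size
    (not the actual library code linked above)."""
    total_batch_size = batch_size_per_gpu * num_gpus
    # Gradient accumulation keeps the effective batch near the reference batch.
    accumulate = max(round(base_total_batch_size / total_batch_size), 1)
    return base_weight_decay * total_batch_size * accumulate / base_total_batch_size


# Original recipe: 4x8 GPUs, 16 samples per GPU, assumed base weight_decay 0.025.
print(scaled_weight_decay(0.025, 16, 32))          # -> 0.2, matching the log above

# Same (assumed) 32 GPUs but 10 samples per GPU: the auto-scaled value drops.
print(scaled_weight_decay(0.025, 10, 32))          # -> 0.125

# To keep the effective value at 0.2, raise the base weight_decay to compensate.
print(scaled_weight_decay(0.2 * 64 / 320, 10, 32))  # base 0.04 -> scaled 0.2
```

Under these assumptions, reducing the per-GPU batch without touching the base weight decay lowers the effective weight decay below the recommended 0.2, which is one plausible source of the gap reported below.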
@wondervictor Thanks~ I'll try it again
Hi @qjq-111, I've provided the training log, and you can compare the evaluation results after a few epochs instead of training for 100 epochs.
Reproduced YOLO-Worldv2-L pretrained model scores slightly lower than the authors' released model
The corresponding row in the model zoo:
Config used: yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py with mixed-precision training enabled; the only change was reducing train_batch_size_per_gpu from 16 to 10 (due to GPU memory limits).
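For reference, the change amounts to a single edit in that config. The snippet below is a hedged sketch assuming MMYOLO-style config conventions; apart from train_batch_size_per_gpu and the 2e-3 learning rate in the config name, the variable names and the base weight_decay value are assumptions:

```python
# Sketch of the edit inside
# yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py

train_batch_size_per_gpu = 10  # was 16; reduced due to GPU memory limits

# The same value is typically forwarded to the optimizer config, which is what
# triggers the automatic weight-decay scaling discussed in the replies above.
optim_wrapper = dict(
    optimizer=dict(type='AdamW',
                   lr=2e-3,
                   weight_decay=0.025,  # assumed base value before scaling
                   batch_size_per_gpu=train_batch_size_per_gpu))
```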
Evaluation results of the reproduction:
Evaluation results of the model provided on Hugging Face:
Could you advise whether this is expected? Is it caused by the batch_size change, or is there inherent randomness in training?