AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0
4.67k stars 453 forks source link

如何分布式训练? #298

Open zhongzee opened 6 months ago

zhongzee commented 6 months ago

chmod +x tools/dist_train.sh

sample command for pre-training, use AMP for mixed-precision training

./tools/dist_train.sh configs/pretrain/yolo_world_l_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py 8 --amp 这个命令只适合单服务器的,如和改成多服务器的形式呢?nodes和node_rank要怎么设置呢?

wondervictor commented 6 months ago

nnodes 设置为机器的数量, node_rank 设置为每台机机器的rank,详情请参考:https://pytorch.org/tutorials/intermediate/ddp_series_multinode.html