Open wenxian-ok opened 3 weeks ago
Training command:
CUDA_VISIBLE_DEVICES=0,1,2 CUDA_LAUNCH_BLOCKING=1 NCCL_P2P_LEVEL=NVL nohup python -m torch.distributed.launch \
    --nproc_per_node 3 --master_port 22222 \
    main_train.py --config_path ./config/base.yaml \
    > test.log 2>&1 &
With this command the model fails to start. What could be causing this?