Closed zhouwei5113 closed 2 years ago
Error message:
File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2846, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: CUDA error: device-side assert triggered

Training command:
python train.py --folder data_dir --model_name ViT-B/32 --batch_size 1024 --gpus 4 --strategy ddp --num_workers 16

How can I fix this? (Training on a single GPU works without problems.)
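
A device-side assert raised inside `cross_entropy` is often caused by a class-index target that falls outside `[0, num_classes)`, although nothing in this report confirms that this is the cause here. The sketch below is one possible way to check for it. The helper name `find_invalid_targets` is hypothetical, and it assumes the targets are integer class indices (not class probabilities) and that `ignore_index` keeps PyTorch's default of -100.

```python
def find_invalid_targets(targets, num_classes, ignore_index=-100):
    """Return (position, value) pairs for labels that cross_entropy cannot index.

    `targets` is any iterable of integer class indices, for example
    `target.tolist()` for a 1-D label tensor. Hypothetical helper, not
    part of PyTorch or of the training script in this issue.
    """
    bad = []
    for position, value in enumerate(targets):
        value = int(value)
        if value == ignore_index:
            continue  # labels equal to ignore_index are skipped by the loss
        if not 0 <= value < num_classes:
            bad.append((position, value))
    return bad


if __name__ == "__main__":
    # With 4 classes, the labels 4 and -1 are out of range.
    print(find_invalid_targets([0, 3, 4, -100, -1], num_classes=4))
```

Calling it right before the loss on each rank, e.g. `find_invalid_targets(target.tolist(), input.shape[1])`, would show whether any rank sees labels that do not match the number of logits. Independently of this check, running with the environment variable `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous, so the traceback points at the operation that actually failed.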