Zasder3 / train-CLIP

A PyTorch Lightning solution to training OpenAI's CLIP from scratch.
MIT License

Multi-GPU training from scratch fails with "CUDA error: device-side assert triggered" #30

Closed. zhouwei5113 closed this issue 2 years ago.

zhouwei5113 commented 2 years ago

Error message:

    File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2846, in cross_entropy
        return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
    RuntimeError: CUDA error: device-side assert triggered

Training command: python train.py --folder data_dir --model_name ViT-B/32 --batch_size 1024 --gpus 4 --strategy ddp --num_workers 16

How can this be fixed? (Training on a single GPU works fine.)
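A common way `cross_entropy` ends in a device-side assert is a target index outside `[0, C)`, where `C` is the number of logit columns; on CUDA the bounds check fires inside the kernel, so only the generic assert is reported. The sketch below is stdlib-only and does not use this repository's code. It assumes (not confirmed for this issue) that the multi-GPU path builds CLIP-style contrastive labels whose range does not match the logits each process actually holds, for example labels offset by rank (`rank * local_n + i`) paired with local-only `(local_n, local_n)` logits. The helper `find_invalid_targets` and the rank/batch values are hypothetical, chosen for illustration.

```python
from typing import List, Sequence, Tuple


def find_invalid_targets(
    logits_shape: Tuple[int, int],
    targets: Sequence[int],
    ignore_index: int = -100,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: list (position, target) pairs that would violate
    cross_entropy's requirement that each class index lies in [0, C)."""
    n, c = logits_shape
    if len(targets) != n:
        raise ValueError(f"expected {n} targets, got {len(targets)}")
    # Targets equal to ignore_index are skipped by cross_entropy, so skip them here too.
    return [(i, t) for i, t in enumerate(targets)
            if t != ignore_index and not 0 <= t < c]


if __name__ == "__main__":
    # Hypothetical setup: 4 processes, 4 samples per process, this process is rank 1.
    rank, local_n, world_size = 1, 4, 4
    # Labels offset by rank only make sense against logits that span the global batch.
    local_labels = [rank * local_n + i for i in range(local_n)]  # [4, 5, 6, 7]
    # Local-vs-local logits: every target is out of range for 4 columns.
    print(find_invalid_targets((local_n, local_n), local_labels))
    # Local-vs-global logits: the same labels are valid for 16 columns.
    print(find_invalid_targets((local_n, local_n * world_size), local_labels))
```

Running the script prints `[(0, 4), (1, 5), (2, 6), (3, 7)]` and then `[]`. To locate the real failing call, setting `CUDA_LAUNCH_BLOCKING=1` makes CUDA errors surface at the offending operation, and computing the same loss on CPU typically raises a readable error such as "Target 4 is out of bounds" instead of a device-side assert.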