Open JohnTang93 opened 9 months ago
Single-node multi-GPU training works fine, but multi-node multi-GPU training fails with:
Skipping backward and optimizer step for nan or inf in forwarding metrics/loss!
Tried replacing --fp16 with --bf16:
Single-node multi-GPU training works fine, but multi-node multi-GPU training fails with:
Skipping backward and optimizer step for nan or inf in forwarding metrics/loss!