THUDM / VisualGLM-6B

Chinese and English multimodal conversational language model | 多模态中英双语对话语言模型
Apache License 2.0
4.05k stars 414 forks

[deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536!!! #320

Open PHI6kai opened 8 months ago

PHI6kai commented 8 months ago

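Whether I train on a large training set or a small one, it always reports overflow and then everything becomes 0. Is this a host memory (RAM) problem or a GPU memory problem? I'm using an A100-80G. @Sleepychord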

1049451037 commented 8 months ago

This is DeepSpeed's loss scaling mechanism, which guards against overflow.
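For readers unfamiliar with the mechanism, here is a minimal sketch of dynamic loss scaling in plain Python. It is not DeepSpeed's implementation: the class `DynamicLossScaler`, the helper `simulate`, and all parameter names and defaults are hypothetical and chosen for illustration only, with the initial scale set to 2^16 = 65536 to match the value in the log message. The idea is that the loss is multiplied by a large scale before the backward pass. If any gradient comes out non-finite, the step is skipped and the scale is reduced. After a run of clean steps, the scale is raised again.

```python
import math


class DynamicLossScaler:
    """Toy dynamic loss scaler (illustrative only, not DeepSpeed's class)."""

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0,
                 scale_window=1000, min_scale=1.0):
        self.scale = init_scale           # current loss scale
        self.scale_factor = scale_factor  # shrink/grow factor
        self.scale_window = scale_window  # clean steps needed before growing
        self.min_scale = min_scale        # lower bound on the scale
        self.good_steps = 0               # consecutive steps without overflow

    def update(self, grads):
        """Return True if the optimizer step should be applied, False if skipped."""
        overflow = any(not math.isfinite(g) for g in grads)
        if overflow:
            # Skip this step and retry later with a smaller scale.
            self.scale = max(self.scale / self.scale_factor, self.min_scale)
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps % self.scale_window == 0:
            # Training has been stable for a while: try a larger scale again.
            self.scale *= self.scale_factor
        return True


def simulate(true_grad, fp16_max=65504.0, steps=5):
    """Emulate fp16 overflow for a single gradient value over a few steps."""
    scaler = DynamicLossScaler()
    history = []
    for _ in range(steps):
        attempted = scaler.scale
        scaled = true_grad * attempted
        if abs(scaled) > fp16_max:
            scaled = math.inf  # the value no longer fits in fp16
        applied = scaler.update([scaled])
        history.append((attempted, applied))
    return history


if __name__ == "__main__":
    for attempted, applied in simulate(true_grad=4.0):
        status = "step applied" if applied else "OVERFLOW, skipping step"
        print(f"attempted loss scale {attempted:.0f}: {status}")
```

In this toy run the first three steps are skipped while the scale drops from 65536 to 8192, after which steps are applied. A handful of skipped steps at the start of training is what this scheme is expected to produce. If every step is skipped and the scale keeps shrinking toward its minimum, the non-finite values are more likely coming from the loss or gradients themselves than from the scaler.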

bandianxuediao commented 3 months ago

> This is DeepSpeed's loss scaling mechanism, which guards against overflow.

So does that mean this behavior is normal?