deepseek-ai / DeepSeek-Coder

DeepSeek Coder: Let the Code Write Itself
https://coder.deepseek.com/
MIT License
5.99k stars 431 forks

Why does a "no slot" error (GPU not found) appear after loading a training run once? #167

Open ZhiyuYUE opened 2 weeks ago

ZhiyuYUE commented 2 weeks ago

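Also, during the first training run, GPU 0 fails to finish loading the training set and hangs at 94%. After I terminate that run and start training again, the following error appears:

raise ValueError(f"No slot '{slot}' specified on host '{hostname}'")
ValueError: No slot '4' specified on host 'localhost'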
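For anyone reading the traceback, here is a minimal sketch of the kind of check that could produce this message: the launcher compares the GPU indices requested for a host against the indices it believes that host has, and raises when one is missing. The function name `check_slots` and the `available` mapping are hypothetical and written only for illustration. They are not copied from the launcher, and the 4-GPU scenario is an assumption, not something confirmed for this machine.

```python
def check_slots(available, hostname, requested):
    """Validate requested GPU indices against the indices known for a host.

    `available` is a hypothetical mapping: hostname -> list of GPU indices.
    """
    if hostname not in available:
        raise ValueError(f"Hostname '{hostname}' not found")
    for slot in requested:
        # A requested index that the host does not list triggers the error.
        if slot not in available[hostname]:
            raise ValueError(f"No slot '{slot}' specified on host '{hostname}'")
    return sorted(requested)


# Assumed scenario: only GPUs 0-3 are visible, but the launch asks for GPU 4.
available = {"localhost": [0, 1, 2, 3]}
try:
    check_slots(available, "localhost", [4])
except ValueError as err:
    message = str(err)
    print(message)
```

If this is what is happening, it may help to compare the GPU indices passed to the launch command with the GPUs actually visible on the second run.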