Open AlphaNext opened 2 months ago
Hi all, we may be hitting the same problem. Could we connect on WeChat?
Maybe the learning rate is too large; the default lr is 0.001.
# configs/sft.yaml
# Between 1E-3 and 5E-4 For Lora and 1E-5 For SFT
@AlphaNext where do you see that the default is 0.001? In sft.yaml it's lr: 0.00001
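To settle which value is actually in use, it may help to check the config file directly against the ranges in the quoted comment. Below is a minimal sketch, not part of the repo: `read_lr`, `lr_in_range`, and the mode labels `"lora"` and `"sft"` are hypothetical names made up for illustration, and the regex is a stand-in for a real YAML parser. The only things taken from this thread are the `lr:` key and the ranges from the `configs/sft.yaml` comment.

```python
import re

# Ranges copied from the comment in configs/sft.yaml quoted in this thread.
# The mode labels "lora" and "sft" are names chosen for this sketch only.
RECOMMENDED_LR = {
    "lora": (5e-4, 1e-3),
    "sft": (1e-5, 1e-5),
}


def read_lr(yaml_text):
    """Return the first uncommented `lr:` value in the config text, or None."""
    match = re.search(r"^\s*lr:\s*([0-9.eE+-]+)", yaml_text, flags=re.MULTILINE)
    return float(match.group(1)) if match else None


def lr_in_range(lr, mode):
    """Check lr against the recommended range, with a small float tolerance."""
    low, high = RECOMMENDED_LR[mode]
    return low * (1 - 1e-9) <= lr <= high * (1 + 1e-9)


if __name__ == "__main__":
    sample = "lr: 0.00001\n"  # the value reported for sft.yaml above
    lr = read_lr(sample)
    print(lr, lr_in_range(lr, "sft"))
```

If the value read from your own config falls outside the range for your training mode (for example 0.001 with full SFT), that would be the first thing to rule out before looking elsewhere for the NaN.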
System Info
CUDA 11.8 / torch 2.4
Information
Reproduction
I only changed the dataset path. The NaN log:
Expected behavior
The NaN loss should be resolved.