ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型
[BUG/Help] When chatglm2-6b is loaded with torch_dtype=torch.float32 and fine-tuned with LoRA via the peft module, the loss is around 10-20 for the first ~20 steps, then becomes NaN from roughly step 20 onward. I have checked the inputs and labels and found nothing wrong. Does anyone know how to fix this? #519
Open
Doufanfan opened 1 year ago
Is there an existing issue for this?
Current Behavior
Model loading code:

```python
model = AutoModel.from_pretrained(
    pre_model_path, trust_remote_code=True, torch_dtype=torch.float32
).cuda()
model.supports_gradient_checkpointing = True
model.gradient_checkpointing_enable()
model.enable_input_require_grads()
model.config.use_cache = False
```
LoRA configuration code:

```python
peft_config = LoraConfig(
    task_type=task_type,
    inference_mode=False,
    r=r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
)
```
Log output
Expected Behavior
Hoping someone knows how to get the loss back to normal.
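Since the loss only turns NaN after about 20 steps, pinning down the exact step where it first diverges is a useful first diagnostic. A minimal stdlib-only sketch of such a guard, applied to a hypothetical loss trace shaped like the one described (values are illustrative, not from the actual run):

```python
import math

def first_nan_step(losses):
    """Return the index of the first NaN loss, or None if all values are finite."""
    for step, loss in enumerate(losses):
        if math.isnan(loss):
            return step
    return None

# Simulated trace resembling the report: loss in the 10-20 range for
# the first ~20 steps, then NaN afterwards (hypothetical values).
losses = [15.0 - 0.1 * s for s in range(20)] + [float("nan")] * 5
print(first_nan_step(losses))  # -> 20
```

In a real training loop the same check (e.g. `math.isnan(loss.item())` on the step loss) can trigger a breakpoint or dump the offending batch, which helps distinguish a data problem from a numerically unstable step.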
Steps To Reproduce
See "Current Behavior" above.
Environment
Anything else?
None.