THUDM / ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

[BUG/Help] Loading chatglm2-6b with torch_dtype=torch.float32 and fine-tuning it with LoRA via the peft module, the loss sits around 10-20 for the first ~20 steps and then turns into NaN after roughly step 20; I have checked the inputs and labels and they look fine. Does anyone know how to fix this? #519

Open Doufanfan opened 1 year ago

Doufanfan commented 1 year ago

Is there an existing issue for this?

Current Behavior

Model loading code

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(pre_model_path, trust_remote_code=True,
                                  torch_dtype=torch.float32).cuda()
model.supports_gradient_checkpointing = True
model.gradient_checkpointing_enable()
model.enable_input_require_grads()
model.config.use_cache = False
```

LoRA configuration code

```python
from peft import LoraConfig

peft_config = LoraConfig(
    task_type=task_type,
    inference_mode=False,
    r=r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
)
```
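For context, the config above still has to be attached to the base model before training. A minimal sketch using peft's standard `get_peft_model` API; the concrete `r` / `lora_alpha` / `lora_dropout` values and the hub id are illustrative, not taken from this issue, and `target_modules=["query_key_value"]` assumes ChatGLM2's fused QKV projection name:

```python
import torch
from transformers import AutoModel
from peft import LoraConfig, TaskType, get_peft_model

# Hub id used for illustration; the issue loads from a local pre_model_path.
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True,
                                  torch_dtype=torch.float32).cuda()

# Illustrative hyperparameters; the issue does not state the actual values used.
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],  # ChatGLM2's fused attention projection
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # should list only the LoRA weights as trainable
```

Printing the trainable parameters is a quick sanity check that only the LoRA adapters, not the full fp32 model, are being updated.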

Training log

(screenshot of the training log, showing the loss turning into NaN after roughly 20 steps)

Expected Behavior

Hoping someone knows how to get the loss back to normal.

Steps To Reproduce

See "Current Behavior" above.

Environment

- OS: Ubuntu 20.04
- Python: 3.11
- Transformers: 4.13.0
- PyTorch: 2.0.1
- peft: 0.3.0.dev0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

None at the moment.

Doufanfan commented 1 year ago

Solved: the learning rate was set too high, which caused the training loss to blow up 💥.
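For anyone hitting the same NaN loss, a hedged sketch of the kind of adjustment described here: drop the learning rate to something conservative for LoRA and optionally clip gradients. The exact values, the `train_loader`, and the NaN check are illustrative; the issue does not state which learning rate finally worked.

```python
import torch

# `model` is the LoRA-wrapped model from above; `train_loader` is assumed to
# yield dicts with input_ids and labels (not shown in the issue).
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-4,  # a much smaller lr than whatever produced the NaNs; tune as needed
)

model.train()
for step, batch in enumerate(train_loader):
    batch = {k: v.cuda() for k, v in batch.items()}
    loss = model(**batch).loss
    if not torch.isfinite(loss):
        print(f"step {step}: non-finite loss, stopping")
        break
    loss.backward()
    # Gradient clipping is an extra safeguard against loss spikes, not something
    # the original poster mentions.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    optimizer.zero_grad()
```

Checking `torch.isfinite(loss)` before the backward pass makes it easy to catch the exact step where the loss diverges instead of silently corrupting the adapter weights.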