THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型

When fine-tuning with p-tuning/train.sh, is it possible to freeze all parameters of the prefix_encoder layer and fine-tune the parameters of the model's other Blocks instead? #1444

Open huilong-chen opened 6 months ago

huilong-chen commented 6 months ago

Is there an existing issue for this?

Current Behavior

I added the following code starting at line 850 of p-tuning/modeling_chatglm.py:

```python
if self.pre_seq_len is not None:
    # Freeze every parameter in the model
    for param in self.parameters():
        param.requires_grad = False
    self.prefix_tokens = torch.arange(self.pre_seq_len).long()
    self.prefix_encoder = PrefixEncoder(config)
    self.dropout = torch.nn.Dropout(0.1)

    # Freeze the prefix encoder as well
    for k, v in self.prefix_encoder.named_parameters():
        v.requires_grad = False
    # Unfreeze the first transformer block instead
    for k, v in self.layers[0].named_parameters():
        v.requires_grad = True
```

Continuing fine-tuning with this change causes the gradients to explode and the loss to become NaN. (The LR was changed to 1e-4, the value used for full fine-tuning.)
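For reference, a quick way to confirm that the freezing above took effect is to enumerate the trainable parameters before launching training. A minimal sketch, assuming `model` is the ChatGLMModel instance loaded by the p-tuning script (the variable name is an assumption, not from the original code):

```python
# Sanity check (assumes `model` is the loaded ChatGLMModel instance):
# list every parameter that will still receive gradients after the patch above.
trainable = [(name, p.numel()) for name, p in model.named_parameters() if p.requires_grad]
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {sum(n for _, n in trainable):,} / {total:,}")
for name, n in trainable:
    print(name, n)
```

If anything other than the intended `layers[0]` parameters shows up as trainable, the gradient behavior during fine-tuning will not match the setup described above.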

Expected Behavior

No response

Steps To Reproduce

Add the following code starting at line 850 of p-tuning/modeling_chatglm.py:

```python
for k, v in self.prefix_encoder.named_parameters():
    v.requires_grad = False
for k, v in self.layers[0].named_parameters():
    v.requires_grad = True
```

Environment

- OS: 
- Python:
- Transformers: 
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response