Is there an existing issue for this?
Current Behavior
I have added the following code starting at line 850 of p-tuning/modeling_chatglm.py:
```python
if self.pre_seq_len is not None:
    # Freeze every parameter in the model.
    for param in self.parameters():
        param.requires_grad = False
    self.prefix_tokens = torch.arange(self.pre_seq_len).long()
    self.prefix_encoder = PrefixEncoder(config)
    self.dropout = torch.nn.Dropout(0.1)
    # Keep the prefix encoder frozen as well...
    for k, v in self.prefix_encoder.named_parameters():
        v.requires_grad = False
    # ...and unfreeze only the first transformer layer.
    for k, v in self.layers[0].named_parameters():
        v.requires_grad = True
```
Resuming fine-tuning with this change causes the gradients to explode and the loss to become NaN. (The learning rate was set to 1e-4, the same value used for full-parameter fine-tuning.)
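Not part of the original report, but a minimal sanity-check sketch for narrowing this down: it assumes the patched model has been loaded into a variable (here called `model`, a hypothetical name) and lists what is left trainable, which should confirm that only the first transformer layer survived the freezing logic above.

```python
import torch

def report_trainable(model: torch.nn.Module) -> None:
    """List the parameters still trainable after the freezing logic above."""
    total = sum(p.numel() for p in model.parameters())
    tuned = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"trainable: {tuned:,} / {total:,} params ({100 * tuned / total:.2f}%)")
    for name, param in model.named_parameters():
        if param.requires_grad:
            # With the patch above, only names under layers.0 should appear.
            print(name)
```

If `report_trainable(model)` confirms the freezing is correct and the loss still diverges at 1e-4, clipping with `torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)` between `loss.backward()` and `optimizer.step()` is the standard PyTorch guard against exploding gradients; whether it is appropriate here is an assumption, not something this report verifies.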
Expected Behavior
No response
Steps To Reproduce
Add the following code starting at line 850 of p-tuning/modeling_chatglm.py:
```python
for k, v in self.prefix_encoder.named_parameters():
    v.requires_grad = False
for k, v in self.layers[0].named_parameters():
    v.requires_grad = True
```
Environment
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
No response