Open aoboxia opened 3 months ago
With DeepSpeed ZeRO-3 enabled, the gradient estimation step in estimate_gradient raises an error.
[rank1]: Traceback (most recent call last):
[rank1]:   File "finetune_lora_ga.py", line 747, in <module>
[rank1]:     train()
[rank1]:   File "finetune_lora_ga.py", line 649, in train
[rank1]:     named_grads = estimate_gradient(model, temp_set, 4)
[rank1]:   File "finetune_lora_ga.py", line 272, in estimate_gradient
[rank1]:     outputs = model(**batch)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/transformers/models/qwen2/modeling_qwen2.py", line 1104, in forward
[rank1]:     outputs = self.model(
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/transformers/models/qwen2/modeling_qwen2.py", line 878, in forward
[rank1]:     inputs_embeds = self.embed_tokens(input_ids)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/sparse.py", line 163, in forward
[rank1]:     return F.embedding(
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 2264, in embedding
[rank1]:     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
[rank1]: RuntimeError: 'weight' must be 2-D
Yes. In theory DeepSpeed is supported, but in practice there are too many DeepSpeed configuration files to adapt to easily, so this was not done earlier.
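For anyone hitting the same traceback: the "'weight' must be 2-D" error is consistent with ZeRO-3 having partitioned the embedding weight, so that each rank holds only an empty placeholder tensor when the model is called directly. One possible workaround, offered as an untested sketch and not as the repository's method, is to gather the full parameters around the forward pass in estimate_gradient. The helper names `is_zero3_partitioned` and `gather_if_partitioned` below are hypothetical; `deepspeed.zero.GatheredParameters` is the DeepSpeed context manager for temporarily materializing partitioned parameters. How this interacts with the backward pass in this script has not been verified.

```python
import contextlib


def is_zero3_partitioned(param):
    # Assumption: DeepSpeed ZeRO-3 tags each partitioned parameter with a
    # `ds_id` attribute; parameters without it are treated as ordinary ones.
    return hasattr(param, "ds_id")


def gather_if_partitioned(model):
    """Return a context manager inside which full weights are available.

    Hypothetical helper: falls back to a no-op context when no parameter
    looks ZeRO-3 partitioned, so it is safe to use without DeepSpeed.
    """
    params = [p for p in model.parameters() if is_zero3_partitioned(p)]
    if not params:
        return contextlib.nullcontext()

    # Imported lazily so the helper also works where DeepSpeed is absent.
    import deepspeed

    # modifier_rank=None means the gathered weights are only read, not modified.
    return deepspeed.zero.GatheredParameters(params, modifier_rank=None)


# Intended use inside estimate_gradient (sketch only):
#
#     with gather_if_partitioned(model):
#         outputs = model(**batch)
```

Note that gathering every parameter materializes the whole model on each rank, which gives up the memory savings of ZeRO-3 for the duration of the estimation step. If the model does not fit, running the gradient estimation before ZeRO-3 partitioning is applied may be the simpler option.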