Outsider565 / LoRA-GA

Is training with DeepSpeed ZeRO-3 currently unsupported? #3

Open aoboxia opened 3 months ago

aoboxia commented 3 months ago

With DeepSpeed ZeRO-3 enabled, the gradient estimation step inside estimate_gradient raises an error.

[rank1]: Traceback (most recent call last):
[rank1]:   File "finetune_lora_ga.py", line 747, in <module>
[rank1]:     train()
[rank1]:   File "finetune_lora_ga.py", line 649, in train
[rank1]:     named_grads = estimate_gradient(model, temp_set, 4)
[rank1]:   File "finetune_lora_ga.py", line 272, in estimate_gradient
[rank1]:     outputs = model(**batch)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/transformers/models/qwen2/modeling_qwen2.py", line 1104, in forward
[rank1]:     outputs = self.model(
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/transformers/models/qwen2/modeling_qwen2.py", line 878, in forward
[rank1]:     inputs_embeds = self.embed_tokens(input_ids)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/sparse.py", line 163, in forward
[rank1]:     return F.embedding(
[rank1]:   File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 2264, in embedding
[rank1]:     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
[rank1]: RuntimeError: 'weight' must be 2-D

Outsider565 commented 3 months ago

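Yes, it is currently unsupported. In theory LoRA-GA can work with DeepSpeed, but in practice there are so many possible DeepSpeed configuration files that adapting to them is difficult, so we have not implemented it.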
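
For context on the traceback above, one plausible reading (an assumption, not something confirmed in this thread) is that under ZeRO-3 each rank holds only a shard of every parameter, and the module's weight is left as an empty placeholder until DeepSpeed gathers it. Calling the model directly inside estimate_gradient would then hand F.embedding a weight that is not 2-D. The sketch below imitates that state with plain PyTorch only; it does not use DeepSpeed, and the helper name forward_error is hypothetical.

```python
import torch
import torch.nn as nn


def forward_error(module, input_ids):
    """Run module(input_ids); return the error message, or None on success."""
    try:
        module(input_ids)
        return None
    except RuntimeError as err:
        return str(err)


embed = nn.Embedding(10, 4)
input_ids = torch.tensor([[1, 2, 3]])

# Keep a copy of the full weight, then swap in an empty 1-D placeholder.
# Assumption: this imitates what a ZeRO-3 partitioned parameter looks like
# on a rank before it has been gathered.
full_weight = embed.weight.data.clone()
embed.weight.data = torch.empty(0)

# The embedding lookup now fails because the weight is no longer 2-D.
partitioned_error = forward_error(embed, input_ids)
print("placeholder weight:", partitioned_error)

# Restoring the full tensor stands in for gathering the parameter.
embed.weight.data = full_weight
gathered_error = forward_error(embed, input_ids)
print("full weight:", gathered_error)
```

In a real ZeRO-3 run, the gathering step is what DeepSpeed's deepspeed.zero.GatheredParameters context manager does for a given list of parameters. Whether wrapping the forward and backward passes of estimate_gradient in it is enough to make LoRA-GA work is untested here, and gathering every parameter at once may not fit in memory for large models.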