deepseek-ai / DeepSeek-MoE

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Finetune with deepspeed: type mismatch #35

Open YeZiyi1998 opened 3 months ago

YeZiyi1998 commented 3 months ago

I encountered an issue while fine-tuning with the officially released code using DeepSpeed. Here is the detailed error message:

File "/lib/python3.11/site-packages/deepspeed/runtime/zero/linear.py", line 57, in forward
output = input.matmul(weight.t())
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16

It appears that the matmul operation expects its two operands to have the same dtype, but in my case one tensor is float32 and the other is bfloat16.
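For reference, here is a minimal standalone sketch of the same mismatch (the shapes and variable names are hypothetical, not taken from the repo): a float32 input hitting a weight that DeepSpeed holds in bfloat16 raises exactly this RuntimeError.

    import torch

    # Hypothetical repro: mat1 is float32, mat2 is bfloat16, as in the traceback above.
    hidden_states = torch.randn(4, 8, dtype=torch.float32)   # e.g. gate input cast to fp32
    weight = torch.randn(16, 8, dtype=torch.bfloat16)        # e.g. parameter kept in bf16 by DeepSpeed

    try:
        hidden_states.matmul(weight.t())
    except RuntimeError as e:
        print(e)  # expected mat1 and mat2 to have the same dtype ...

    # Keeping both operands in one dtype avoids the error:
    out = hidden_states.to(weight.dtype).matmul(weight.t())
    print(out.dtype)  # torch.bfloat16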

I am not sure if this is a bug in the DeepSpeed library or an issue with my usage. I would appreciate any assistance in resolving this issue.

lihaoling commented 2 months ago

same question

JensenDong commented 2 months ago

same + 1

yiyepiaoling0715 commented 1 month ago

I encountered the same problem, and here is how I solved it: modify lines 425 and 428 in the modeling_deepseek.py file and remove torch.float32, as in the following code:

    logits = F.linear(
        hidden_states, self.weight, None
    )
    if self.scoring_func == "softmax":
        scores = logits.softmax(dim=-1)
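To make the fix concrete, below is a self-contained sketch of the gating block after the change. MoEGateSketch is a hypothetical stand-in, not the repo's class, and the "previously" comments show the presumed fp32 casts at lines 425 and 428; verify them against your own copy of modeling_deepseek.py. With the casts removed, routing runs in the model dtype, so the bf16 input meets the bf16 weight that DeepSpeed manages and the matmul dtypes agree.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoEGateSketch(nn.Module):
        """Hypothetical stand-in for the gating block, with the fp32 casts removed."""

        def __init__(self, gating_dim: int, n_experts: int, scoring_func: str = "softmax"):
            super().__init__()
            self.scoring_func = scoring_func
            self.weight = nn.Parameter(torch.empty(n_experts, gating_dim))
            nn.init.kaiming_uniform_(self.weight)

        def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
            # Previously (presumed): F.linear(hidden_states.type(torch.float32), self.weight, None)
            logits = F.linear(hidden_states, self.weight, None)
            if self.scoring_func == "softmax":
                # Previously (presumed): logits.softmax(dim=-1, dtype=torch.float32)
                scores = logits.softmax(dim=-1)
            else:
                raise NotImplementedError(f"unsupported scoring_func: {self.scoring_func}")
            return scores

    # Usage sketch: with the casts removed, bf16 input meets bf16 weight.
    gate = MoEGateSketch(gating_dim=8, n_experts=16).to(torch.bfloat16)
    scores = gate(torch.randn(4, 8, dtype=torch.bfloat16))
    print(scores.dtype)  # torch.bfloat16

One trade-off to be aware of: removing the casts means the routing softmax runs in bf16 rather than fp32. If fp32 routing precision matters for your run, casting both operands (e.g. the weight as well as the input) should also make the dtypes agree, though that variant is untested here.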
