PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning & machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

paddle.nn.functional.linear and torch.nn.functional.linear differ under bfloat16 dtype #58561

Open wtmlon opened 12 months ago

wtmlon commented 12 months ago

Describe the Bug

Description

- Step 1: in the paddle virtual environment, run the paddle script below, which reads the given hs_0.npy (fp32) as x and w.npy (fp32) as W, performs the (bf16) matmul and F.linear, and saves the results as sdpa.npy (fp32) / sdpa1.npy (fp32):
```python
import numpy as np
import paddle
import paddle.nn.functional as F

# Load the fp32 inputs and cast to bf16.
x_np = np.load('hs_0.npy')
x = paddle.to_tensor(x_np, dtype='bfloat16')
w_np = np.load('w.npy')
w = paddle.to_tensor(w_np, dtype='bfloat16')

out = F.linear(x, w.T, bias=None)
out1 = paddle.matmul(x, w, transpose_y=True)
print(out)
print(paddle.all((out - out1) == 0))
# Cast back to fp32 before saving for comparison.
np.save('sdpa.npy', out.astype('float32').numpy())
np.save('sdpa1.npy', out1.astype('float32').numpy())
```
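
A note on the w.T above (my reading of the two APIs, not stated in the issue): paddle.nn.functional.linear expects its weight as [in_features, out_features] and computes x @ weight, while torch.nn.functional.linear expects [out_features, in_features] and computes x @ weight.T. Since w.npy is stored in the torch layout, the paddle script has to transpose it. A quick fp32 sanity check of that equivalence, with made-up shapes:
```python
import paddle
import paddle.nn.functional as F

# Hypothetical shapes for illustration; w uses the torch-style [out, in] layout.
x = paddle.rand([4, 8], dtype='float32')
w = paddle.rand([16, 8], dtype='float32')

ref = paddle.matmul(x, w, transpose_y=True)  # x @ w.T
out = F.linear(x, w.T)                       # paddle wants [in, out], so transpose
print(paddle.allclose(ref, out))             # expect True in fp32
```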

- Step 2: in the torch virtual environment, run t_bf16_matmul.py, which reads the same hs_0.npy (fp32) as x and w.npy (fp32) as W, performs the (bf16) matmul and F.linear, and saves the results as t_sdpa.npy (fp32) / t_sdpa1.npy (fp32). The script is as follows:
```python
import torch
import torch.nn.functional as F
import numpy as np

# Load the fp32 inputs and cast to bf16 on GPU.
x_np = np.load('hs_0.npy')
x = torch.tensor(x_np, dtype=torch.bfloat16).to("cuda:0")
w_np = np.load('w.npy')
w = torch.tensor(w_np, dtype=torch.bfloat16).to("cuda:0")

out1 = torch.matmul(x, w.transpose(0, 1))
out = F.linear(x, w, bias=None)
print(out)
print(torch.all((out1 - out) == 0))
# Cast back to fp32 on CPU before saving for comparison.
np.save('t_sdpa.npy', out.cpu().detach().to(torch.float32).numpy())
np.save('t_sdpa1.npy', out1.cpu().detach().to(torch.float32).numpy())
```

Finally, run the check.a.py script to diff the results from the two sides:

```
> python check.a.py
linear_diff:  0.00390625
matmul_diff:  0.0
paddle_linear_matmul_diff:  0.00390625
torch_linear_matmul_diff:  0.0
```
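
check.a.py itself is not attached to the issue; a minimal sketch of what it presumably does, judging from the printed keys (the max-abs-diff metric and the file pairing are my assumptions):
```python
import numpy as np

# Load the fp32 dumps written by the two scripts above.
sdpa = np.load('sdpa.npy')        # paddle F.linear
sdpa1 = np.load('sdpa1.npy')      # paddle matmul
t_sdpa = np.load('t_sdpa.npy')    # torch F.linear
t_sdpa1 = np.load('t_sdpa1.npy')  # torch matmul

def max_abs_diff(a, b):
    return np.abs(a - b).max()

print('linear_diff: ', max_abs_diff(sdpa, t_sdpa))
print('matmul_diff: ', max_abs_diff(sdpa1, t_sdpa1))
print('paddle_linear_matmul_diff: ', max_abs_diff(sdpa, sdpa1))
print('torch_linear_matmul_diff: ', max_abs_diff(t_sdpa, t_sdpa1))
```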

As shown, on GPU the matmul results from the two frameworks match exactly, but the two F.linear results have a diff.
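
Two observations. First, 0.00390625 is exactly 2^-8, a single bf16 ulp for values of magnitude in [0.5, 1) (bf16 keeps 7 explicit mantissa bits), so the discrepancy amounts to one rounding step, consistent with F.linear and matmul taking different kernel or accumulation paths in bf16. Second, the diff should be reproducible without the .npy dumps; a hedged, self-contained sketch with random data (shapes are made up):
```python
import numpy as np
import paddle
import paddle.nn.functional as F

paddle.device.set_device('gpu:0')  # the issue's diff was observed on GPU

np.random.seed(0)
x_np = np.random.randn(8, 1024).astype('float32')
w_np = np.random.randn(512, 1024).astype('float32')  # torch-style [out, in]

x = paddle.to_tensor(x_np, dtype='bfloat16')
w = paddle.to_tensor(w_np, dtype='bfloat16')

out = F.linear(x, w.T, bias=None)             # suspected divergent path
out1 = paddle.matmul(x, w, transpose_y=True)  # path that matches torch
diff = (out.astype('float32') - out1.astype('float32')).abs().max()
print('paddle F.linear vs matmul max abs diff:', float(diff))
```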

Additional Supplementary Information

No response

DesmonDay commented 11 months ago

Moving the torch computation to the GPU makes the two sides match.
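
For anyone re-running the comparison, a short sketch of pinning both frameworks to the GPU and verifying tensor placement (device strings are the usual defaults, not quoted from the issue):
```python
import paddle
import torch

# Paddle: select the GPU globally before creating tensors.
paddle.device.set_device('gpu:0')
px = paddle.rand([4, 8], dtype='float32')
print(px.place)    # expect Place(gpu:0)

# Torch: tensors live where you put them; move and confirm explicitly.
tx = torch.rand(4, 8).to('cuda:0')
print(tx.device)   # expect device(type='cuda', index=0)
```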