artyom-beilis / pytorch_dlprim

DLPrimitives/OpenCL out of tree backend for pytorch
http://blog.dlprimitives.org/
MIT License

Strange error in test: Diff too big #72

Open sukamenev opened 3 months ago

sukamenev commented 3 months ago

This is on the OpenCL CPU device. After updating the OpenCL runtime I see a different error, similar to the error in the other test script:

Mean 1d
Accessing device #0:AMD EPYC 7542 32-Core Processor                 on Intel(R) CPU Runtime for OpenCL(TM) Applications
torch.Size([1, 3, 4])
torch.Size([1, 3, 4])
         y 0.000000
        x0 0.000000
Mean 2d
torch.Size([2, 1, 1])
torch.Size([2, 1, 1])
         y 0.000000
        x0 0.000000
Mean 1d squeeze
torch.Size([3, 4])
torch.Size([3, 4])
         y 0.000000
        x0 0.000000
Mean 2d squeeze
torch.Size([3])
torch.Size([3])
         y 0.000000
        x0 0.000000
Mean all squeeze
torch.Size([])
torch.Size([])
         y 0.000000
        x0 0.000000
Sum 1d
torch.Size([1, 3, 4])
torch.Size([1, 3, 4])
         y 0.000000
        x0 0.000000
Sum 2d
torch.Size([2, 1, 1])
torch.Size([2, 1, 1])
         y 0.000000
        x0 0.000000
Sum 1d squeeze
torch.Size([3, 4])
torch.Size([3, 4])
         y 0.000000
        x0 0.000000
Sum 2d squeeze
torch.Size([3])
torch.Size([3])
         y 0.000000
        x0 0.000000
LogSoftmax
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
LogSoftmax
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
         y 0.000000
        x0 0.000000
Softmax
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
NLLLoss
torch.Size([])
torch.Size([])
tensor(0.0413, grad_fn=<NllLossBackward0>)
tensor(0.0418)
         y 0.000469
        x0 0.000000
AAPool2d
torch.Size([4, 8, 1, 1])
torch.Size([4, 8, 1, 1])
         y 0.000000
        x0 0.000000
Abs
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Abs_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Hardtanh
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Hardtanh_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Sigmoid
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
Sigmoid_
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
Hardsigmoid
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Hardsigmoid_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
ReLU
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
ReLU_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
LReLu
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
LReLU_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Tanh
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Tanh_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
SiLU
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
SiLU_
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
GELU
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
GELU tanh
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
BCE Loss
torch.Size([])
torch.Size([])
        x0 0.000001
        x1 0.000000
         y 0.000000
BCE Loss no reduction
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
        x0 0.000001
         y 0.000000
        x1 0.000000
MSE Loss
torch.Size([])
torch.Size([])
         y 0.000000
        x0 0.000000
        x1 0.000000
MSE Loss no reduction
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
         y 0.000000
        x0 0.000000
        x1 0.000000
Min
Ok
Max
Ok
Dot
Ok
Clamp 1
Ok
Clamp 2
Ok
Clamp 3
Ok
Linear 2d
    p_bias 0.000000
         y 0.000000
        x0 0.000000
  p_weight 0.000000
Linear 3d
    p_bias 0.000000
         y 0.000000
        x0 0.000000
  p_weight 0.000000
Conv
Traceback (most recent call last):
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 282, in <module>
    test_all(r.device)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 254, in test_all
    test_fwd_bwd_op([([2,6,10,20],-1)],torch.nn.Conv2d(6,8,[3,5],stride=[1,2],padding=[1,2],dilation=1,groups=2),device)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 74, in test_fwd_bwd_op
    y_cpu.backward(dy_cpu,retain_graph=True)
  File "/home/inetstar/Kamenev/programming/ZenDnn/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/home/inetstar/Kamenev/programming/ZenDnn/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: could not create a primitive descriptor iterator
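
For reference, a minimal CPU-only sketch (an assumption, not part of the original test run) that exercises the same Conv2d configuration and input shape as the failing test_fwd_bwd_op call. Note that the traceback above shows the RuntimeError is raised inside y_cpu.backward(), i.e. on the CPU reference path rather than on the OpenCL device:

import torch

# Same layer and input shape as in the failing test:
# torch.nn.Conv2d(6,8,[3,5],stride=[1,2],padding=[1,2],dilation=1,groups=2) on [2,6,10,20]
conv = torch.nn.Conv2d(6, 8, [3, 5], stride=[1, 2], padding=[1, 2], dilation=1, groups=2)
x = torch.randn(2, 6, 10, 20, requires_grad=True)
y = conv(x)
y.backward(torch.randn_like(y))  # the test fails inside this CPU backward call
print(x.grad.shape)
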
sukamenev commented 3 months ago

On AMD OpenCL (AMDAPPSDK-3.0) there is another error:

python tests/test_op.py --device privateuseone:2
Mean 1d
Accessing device #2:AMD EPYC 7542 32-Core Processor on AMD Accelerated Parallel Processing
torch.Size([1, 3, 4])
torch.Size([1, 3, 4])
tensor([[[-0.2863, -0.1444,  1.4827, -0.2142],
         [ 0.9526, -1.2787,  0.7404, -0.3989],
         [ 0.8163,  0.2142,  0.2852,  0.8597]]], grad_fn=<MeanBackward1>)
tensor([[[1.4019, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]]])
         y 1.688240
        x0 0.000000
Traceback (most recent call last):
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 282, in <module>
    test_all(r.device)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 158, in test_all
    test_fwd_bwd([([2,3,4],-1)],lambda x:torch.mean(x,dim=0,keepdim=True),device)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/test_op.py", line 153, in test_fwd_bwd
    raise Exception("Diff too big")
Exception: Diff too big

max_diff = 1.9810690879821777
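
To isolate this case, here is a minimal sketch (an assumption, not part of the original report) of the "Mean 1d" check that tests/test_op.py runs, comparing forward and backward of torch.mean(x, dim=0, keepdim=True) on the OpenCL device against the CPU reference. It assumes the pytorch_dlprim backend is already loaded the same way the test script loads it, so that privateuseone:2 is a valid device:

import torch
# assumes the pytorch_dlprim backend is already registered, as in tests/test_op.py
dev = "privateuseone:2"

x_cpu = torch.randn(2, 3, 4, requires_grad=True)
x_dev = x_cpu.detach().to(dev).requires_grad_(True)

y_cpu = torch.mean(x_cpu, dim=0, keepdim=True)
y_dev = torch.mean(x_dev, dim=0, keepdim=True)

dy = torch.randn_like(y_cpu)
y_cpu.backward(dy)
y_dev.backward(dy.to(dev))

print("y  max diff:", (y_cpu - y_dev.cpu()).abs().max().item())
print("dx max diff:", (x_cpu.grad - x_dev.grad.cpu()).abs().max().item())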

sukamenev commented 3 months ago

On AMD OpenCL (from amdgpu-pro) there is also an error at the end of the test:

Mean 1d
Accessing device #3:Fiji on AMD Accelerated Parallel Processing
torch.Size([1, 3, 4])
torch.Size([1, 3, 4])
         y 0.000000
        x0 0.000000
Mean 2d
torch.Size([2, 1, 1])
torch.Size([2, 1, 1])
        x0 0.000000
         y 0.000000
Mean 1d squeeze
torch.Size([3, 4])
torch.Size([3, 4])
         y 0.000000
        x0 0.000000
Mean 2d squeeze
torch.Size([3])
torch.Size([3])
         y 0.000000
        x0 0.000000
Mean all squeeze
torch.Size([])
torch.Size([])
         y 0.000000
        x0 0.000000
Sum 1d
torch.Size([1, 3, 4])
torch.Size([1, 3, 4])
         y 0.000000
        x0 0.000000
Sum 2d
torch.Size([2, 1, 1])
torch.Size([2, 1, 1])
         y 0.000000
        x0 0.000000
Sum 1d squeeze
torch.Size([3, 4])
torch.Size([3, 4])
         y 0.000000
        x0 0.000000
Sum 2d squeeze
torch.Size([3])
torch.Size([3])
         y 0.000000
        x0 0.000000
LogSoftmax
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
LogSoftmax
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
        x0 0.000000
         y 0.000000
Softmax
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
NLLLoss
torch.Size([])
torch.Size([])
         y 0.000000
        x0 0.000000
AAPool2d
torch.Size([4, 8, 1, 1])
torch.Size([4, 8, 1, 1])
         y 0.000000
        x0 0.000000
Abs
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Abs_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Hardtanh
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Hardtanh_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Sigmoid
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Sigmoid_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Hardsigmoid
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Hardsigmoid_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
ReLU
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
ReLU_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
LReLu
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
LReLU_
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
Tanh
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
Tanh_
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
SiLU
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
SiLU_
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
GELU
torch.Size([4, 3])
torch.Size([4, 3])
         y 0.000000
        x0 0.000000
GELU tanh
torch.Size([4, 3])
torch.Size([4, 3])
        x0 0.000000
         y 0.000000
BCE Loss
torch.Size([])
torch.Size([])
        x0 0.000058
         y 0.000000
        x1 0.000000
BCE Loss no reduction
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
        x0 0.000008
         y 0.000000
        x1 0.000000
MSE Loss
torch.Size([])
torch.Size([])
         y 0.000000
        x0 0.000000
        x1 0.000000
MSE Loss no reduction
torch.Size([4, 3, 5])
torch.Size([4, 3, 5])
         y 0.000000
        x0 0.000000
        x1 0.000000
Min
Ok
Max
Ok
Dot
Ok
Clamp 1
Ok
Clamp 2
Ok
Clamp 3
Ok
Linear 2d
  p_weight 0.000000
    p_bias 0.000000
         y 0.000000
        x0 0.000000
Linear 3d
  p_weight 0.000002
    p_bias 0.000000
         y 0.000000
        x0 0.000000
Conv
Traceback (most recent call last):
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim/tests/test_op.py", line 282, in <module>
    test_all(r.device)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim/tests/test_op.py", line 254, in test_all
    test_fwd_bwd_op([([2,6,10,20],-1)],torch.nn.Conv2d(6,8,[3,5],stride=[1,2],padding=[1,2],dilation=1,groups=2),device)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim/tests/test_op.py", line 74, in test_fwd_bwd_op
    y_cpu.backward(dy_cpu,retain_graph=True)
  File "/home/inetstar/Kamenev/programming/ZenDnn/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/home/inetstar/Kamenev/programming/ZenDnn/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: could not create a primitive descriptor iterator
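
The Conv failure here looks identical to the one on the Intel CPU runtime above: in both tracebacks the RuntimeError ("could not create a primitive descriptor iterator") is raised inside y_cpu.backward(), i.e. on the CPU reference path, which looks like a oneDNN/MKL-DNN error rather than something coming from the OpenCL backend. A diagnostic sketch (an assumption, not a confirmed fix) that re-runs the same backward purely on CPU with the mkldnn/oneDNN backend disabled, to check whether the CPU reference path alone reproduces it:

import torch

# Same Conv2d configuration as the failing test, run on CPU only with the
# mkldnn/oneDNN backend disabled (diagnostic assumption, not a confirmed fix).
conv = torch.nn.Conv2d(6, 8, [3, 5], stride=[1, 2], padding=[1, 2], dilation=1, groups=2)
x = torch.randn(2, 6, 10, 20, requires_grad=True)

with torch.backends.mkldnn.flags(enabled=False):
    y = conv(x)
    y.backward(torch.randn_like(y))

print("CPU backward without mkldnn OK, grad shape:", x.grad.shape)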