artyom-beilis / pytorch_dlprim

DLPrimitives/OpenCL out of tree backend for pytorch
http://blog.dlprimitives.org/
MIT License

Error in OpenCL on CPU: python tests/validate_network.py --device privateuseone:0 #71

Closed sukamenev closed 3 months ago

sukamenev commented 3 months ago
python tests/validate_network.py --device privateuseone:0
Testing  resnet18
Accessing device #0:AMD EPYC 7542 32-Core Processor                 on Intel(R) CPU Runtime for OpenCL(TM) Applications
Traceback (most recent call last):
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim/tests/validate_network.py", line 280, in <module>
    main(r)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim/tests/validate_network.py", line 221, in main
    train_on_images(m,batch,args.device,args.eval,iter_size = args.iter_size,opt_steps = args.opt,fwd=args.fwd)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim/tests/validate_network.py", line 105, in train_on_images
    ref = step(model,data,labels,opt_steps,iter_size,fwd=fwd,test=test)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim/tests/validate_network.py", line 85, in step
    loss.backward()
  File "/home/inetstar/Kamenev/programming/ZenDnn/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/home/inetstar/Kamenev/programming/ZenDnn/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: could not create a primitive descriptor iterator

Maybe the reason is that this is an AMD CPU running on the Intel OpenCL runtime?

sukamenev commented 3 months ago

Same error on the AMD OpenCL runtime:

python tests/validate_network.py --device privateuseone:2
Testing  resnet18
Accessing device #2:AMD EPYC 7542 32-Core Processor on AMD Accelerated Parallel Processing
Traceback (most recent call last):
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/validate_network.py", line 280, in <module>
    main(r)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/validate_network.py", line 221, in main
    train_on_images(m,batch,args.device,args.eval,iter_size = args.iter_size,opt_steps = args.opt,fwd=args.fwd)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/validate_network.py", line 105, in train_on_images
    ref = step(model,data,labels,opt_steps,iter_size,fwd=fwd,test=test)
  File "/home/inetstar/Kamenev/programming/ZenDnn/pytorch_dlprim_orig/tests/validate_network.py", line 85, in step
    loss.backward()
  File "/home/inetstar/Kamenev/programming/ZenDnn/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/home/inetstar/Kamenev/programming/ZenDnn/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: could not create a primitive descriptor iterator

artyom-beilis commented 3 months ago

CPU isn't really supported or tested.
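
Both logs above show the selected OpenCL device in the "Accessing device #N:..." banner, and in both cases it is the EPYC CPU. A minimal sketch for checking that banner before running the test is below. The helper names `parse_device_banner` and `looks_like_cpu` are hypothetical and not part of pytorch_dlprim, and the CPU check is a crude string heuristic based only on the two banners in this thread, not a query of the real OpenCL device type.

```python
import re

# Matches the banner format seen in the logs in this thread:
#   "Accessing device #<index>:<device name> on <platform name>"
BANNER = re.compile(r"^Accessing device #(\d+):(.*?)\s+on\s+(.+?)\s*$")


def parse_device_banner(line):
    """Split a banner line into index, device name and platform name.

    Returns None if the line does not look like a device banner.
    """
    m = BANNER.match(line.strip())
    if m is None:
        return None
    index, device, platform = m.groups()
    return {"index": int(index), "device": device.strip(), "platform": platform}


def looks_like_cpu(info):
    """Heuristic only: guess from the names whether the device is a CPU."""
    text = (info["device"] + " " + info["platform"]).lower()
    return "cpu" in text or "processor" in text


if __name__ == "__main__":
    banner = ("Accessing device #2:AMD EPYC 7542 32-Core Processor "
              "on AMD Accelerated Parallel Processing")
    info = parse_device_banner(banner)
    if info is not None and looks_like_cpu(info):
        print("privateuseone:%d looks like a CPU device" % info["index"])
```

If the banner looks like a CPU, trying another `privateuseone:N` index that maps to a GPU (if the machine has one) is the safer option, since CPU devices are not supported or tested.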