artyom-beilis / pytorch_dlprim

DLPrimitives/OpenCL out of tree backend for pytorch
http://blog.dlprimitives.org/
MIT License

Runtime Error #26

Closed GreatestTrain closed 1 year ago

GreatestTrain commented 1 year ago

Hello.

I am trying to use this backend outside of the tests provided in this repository.

However, I get an error when I try to create a tensor on the "privateuseone:1" device:

import torch
torch.ops.load_library("/home/rml/Desktop/pytorch_dlprim/build/libpt_ocl.so")
torch.tensor((5.,1.,2.), dtype=torch.float32).to("privateuseone:1")

[Out]: RuntimeError: Invalid Device #1

python mnist.py --device ocl:1 runs just fine.

Checking the build folder, I noticed that libpt_ocl.so is linked against a shared object located in the build folder:

[~/Desktop/pytorch_dlprim/build]$ ldd libpt_ocl.so 
        linux-vdso.so.1 (0x00007ffc2c794000)
        libtorch.so => /home/rml/pytorch/lib/python3.10/site-packages/torch/lib/libtorch.so (0x00007f7b4e222000)
        libc10.so => /home/rml/pytorch/lib/python3.10/site-packages/torch/lib/libc10.so (0x00007f7b4e16a000)
        libOpenCL.so.1 => /usr/lib/libOpenCL.so.1 (0x00007f7b4e0f0000)
        libdlprim_core.so => /home/rml/Desktop/pytorch_dlprim/build/dlprimitives/libdlprim_core.so (0x00007f7b4e036000)
        libtorch_cpu.so => /home/rml/pytorch/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so (0x00007f7b33a00000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f7b33600000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f7b4e014000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f7b33419000)
        /usr/lib64/ld-linux-x86-64.so.2 (0x00007f7b4e2d0000)
        libm.so.6 => /usr/lib/libm.so.6 (0x00007f7b33918000)
        libgomp-a34b3233.so.1 => /home/rml/pytorch/lib/python3.10/site-packages/torch/lib/libgomp-a34b3233.so.1 (0x00007f7b33000000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f7b4e00f000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f7b4e008000)
        libsqlite3.so.0 => /usr/lib/libsqlite3.so.0 (0x00007f7b332cc000)
        librt.so.1 => /usr/lib/librt.so.1 (0x00007f7b4e003000)

(built with -DCMAKE_PREFIX_PATH=~/.conda/envs/pytorch/lib/python3.10/site-packages/torch/share/cmake/Torch)

make install DESTDIR=~/.conda/envs/pytorch/ only copies the .so and .hpp files from dlprim, but no libpt_ocl file.

Adding the following lines to mnist.py after line 148 prints the expected tensor:

148: = Net().to_device()
149: tensor = torch.tensor((5,1,2), dtype=torch.float32).to("privateuseone:1")
150: print(tensor)
151: exit()

clinfo --list:

Platform #0: Clover
 `-- Device #0: AMD Radeon RX 5600 XT (navi10, LLVM 15.0.7, DRM 3.49, 6.1.7-zen1-1-zen)
Platform #1: AMD Accelerated Parallel Processing
 `-- Device #0: gfx1010:xnack-
Platform #2: rusticl

artyom-beilis commented 1 year ago

Can you clarify a few things?

Does this work?

import torch
torch.ops.load_library("/home/rml/Desktop/pytorch_dlprim/build/libpt_ocl.so")
t = torch.tensor((5.,1.,2.), dtype=torch.float32).to("privateuseone:1")
print(t)

Because it worked for me. What is the workaround you suggested?

Regarding installation: I indeed haven't added a proper installation procedure. I'd rather do something like import torch_opencl, but I haven't gotten to it yet.
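For illustration only, here is a minimal sketch of what such a wrapper module might do. The torch_opencl package does not exist in this thread, so the function names find_backend_library and load_backend are hypothetical. The only PyTorch call used is torch.ops.load_library, the same call as in the snippets above.

```python
from pathlib import Path

# File name taken from the build output shown in this thread.
LIBRARY_NAME = "libpt_ocl.so"


def find_backend_library(search_dir):
    """Return the path to libpt_ocl.so inside search_dir (hypothetical helper)."""
    candidate = Path(search_dir) / LIBRARY_NAME
    if not candidate.is_file():
        raise FileNotFoundError(f"{LIBRARY_NAME} not found in {search_dir}")
    return candidate


def load_backend(search_dir):
    """Load the backend library so that 'privateuseone:N' devices can be used."""
    import torch  # imported lazily so the path lookup works without PyTorch

    library = find_backend_library(search_dir)
    torch.ops.load_library(str(library))
    return library
```

With a wrapper like this, a script would call load_backend("/home/rml/Desktop/pytorch_dlprim/build") once instead of hard-coding the full path to the .so file in every script.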

GreatestTrain commented 1 year ago

Never mind, this seems to be a Jupyter-specific bug.

It works as normal in the python and ipython shells.

(pytorch) notebooks[master*] % python test.py     
Accessing device #1:gfx1010:xnack- on AMD Accelerated Parallel Processing
tensor([5.0000e+00, 1.0000e+00, 2.0000e+00], device='privateuseone:1')

(pytorch) notebooks[master*] % ipython test.py 
Accessing device #1:gfx1010:xnack- on AMD Accelerated Parallel Processing
tensor([5.0000e+00, 1.0000e+00, 2.0000e+00], device='privateuseone:1')
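The "Accessing device #1" line is consistent with the backend numbering devices in one flat list across all platforms, in the order clinfo --list prints them. That is an assumption drawn from this output, not something confirmed in the thread. Under that assumption, the index can be worked out from the clinfo text with a small sketch like this (flat_device_list is a hypothetical helper):

```python
import re


def flat_device_list(clinfo_list_output):
    """Return (platform, device) pairs in the order `clinfo --list` prints them."""
    devices = []
    platform = None
    for line in clinfo_list_output.splitlines():
        platform_match = re.match(r"\s*Platform #\d+:\s*(.*)", line)
        if platform_match:
            platform = platform_match.group(1).strip()
            continue
        device_match = re.search(r"Device #\d+:\s*(.*)", line)
        if device_match and platform is not None:
            devices.append((platform, device_match.group(1).strip()))
    return devices


# The clinfo --list output posted earlier in this issue.
sample = """Platform #0: Clover
 `-- Device #0: AMD Radeon RX 5600 XT (navi10, LLVM 15.0.7, DRM 3.49, 6.1.7-zen1-1-zen)
Platform #1: AMD Accelerated Parallel Processing
 `-- Device #0: gfx1010:xnack-
Platform #2: rusticl
"""

for index, (platform, device) in enumerate(flat_device_list(sample)):
    print(f"{index}: {device} on {platform}")
```

For the listing in this issue, flat index 1 is gfx1010:xnack- on AMD Accelerated Parallel Processing, which matches the device reported above. The rusticl platform lists no devices, so it contributes no index.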

It is just Jupyter that does not seem to work.
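One way to narrow this down is to run the same short report in the Jupyter kernel and in the plain python shell and compare the results. This is only a sketch: the cause was not identified in this thread, and the environment variables listed are guesses at what might differ between the two. environment_report is a hypothetical helper that uses only the standard library.

```python
import os
import sys


def environment_report(variables=("LD_LIBRARY_PATH", "OCL_ICD_VENDORS", "CONDA_PREFIX")):
    """Collect the interpreter location and a few environment variables."""
    report = {"executable": sys.executable, "prefix": sys.prefix}
    for name in variables:
        # os.environ.get returns None when the variable is not set.
        report[name] = os.environ.get(name)
    return report


for key, value in environment_report().items():
    print(f"{key} = {value}")
```

If the executable or prefix differs between the notebook and the shell, the notebook kernel is running a different Python environment than the one that works.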

Anyway, thanks for your time.