artyom-beilis / pytorch_dlprim

DLPrimitives/OpenCL out of tree backend for pytorch
http://blog.dlprimitives.org/
MIT License

Undefined symbols #76

Open skn123 opened 2 months ago

skn123 commented 2 months ago

Built the library ... Pytorch 1.13.1

/mnt/d/srcs/pytorch_dlprim$ python3 mnist.py --device ocl:0
Traceback (most recent call last):
  File "/mnt/d/srcs/pytorch_dlprim/mnist.py", line 164, in <module>
    main()
  File "/mnt/d/srcs/pytorch_dlprim/mnist.py", line 121, in main
    torch.ops.load_library("/mnt/d/build_ninja/dlprim/libpt_ocl.so")
  File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 573, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /mnt/d/build_ninja/dlprim/libpt_ocl.so: undefined symbol: _ZNK3c105Error4whatEv

...............

ldd /mnt/d/build_ninja/dlprim/libpt_ocl.so
  linux-vdso.so.1 (0x00007ffe880fb000)
  libc10.so => /usr/local/lib/libc10.so (0x00007f5a3c439000)
  libOpenCL.so.1 => /lib/x86_64-linux-gnu/libOpenCL.so.1 (0x00007f5a3c406000)
  libdlprim_core.so => /mnt/d/build_ninja/dlprim/dlprimitives/libdlprim_core.so (0x00007f5a3c348000)
  libtorch.so => /usr/local/lib/libtorch.so (0x00007f5a3c343000)
  libtorch_cpu.so => /usr/local/lib/libtorch_cpu.so (0x00007f5a330f4000)
  libsqlite3.so.0 => /lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f5a32fa5000)
  libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f5a32d79000)
  libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5a32c92000)
  libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f5a32c72000)
  libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5a32a49000)
  /lib64/ld-linux-x86-64.so.2 (0x00007f5a3c5db000)
  libnuma.so.1 => /lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f5a32a3c000)
  libopenblas.so.0 => /lib/x86_64-linux-gnu/libopenblas.so.0 (0x00007f5a305e8000)
  libmpi_cxx.so.40 => /lib/x86_64-linux-gnu/libmpi_cxx.so.40 (0x00007f5a305ce000)
  libmpi.so.40 => /lib/x86_64-linux-gnu/libmpi.so.40 (0x00007f5a30497000)
  libgfortran.so.5 => /lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007f5a301bc000)
  libopen-pal.so.40 => /lib/x86_64-linux-gnu/libopen-pal.so.40 (0x00007f5a30109000)
  libopen-rte.so.40 => /lib/x86_64-linux-gnu/libopen-rte.so.40 (0x00007f5a3004c000)
  libhwloc.so.15 => /lib/x86_64-linux-gnu/libhwloc.so.15 (0x00007f5a2fff0000)
  libquadmath.so.0 => /lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f5a2ffa8000)
  libevent_core-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_core-2.1.so.7 (0x00007f5a2ff73000)
  libevent_pthreads-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x00007f5a2ff6e000)
  libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f5a2ff52000)
  libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x00007f5a2ff28000)

skn123 commented 2 months ago

If I install the default modules from pip, then I get this error:

/mnt/d/srcs/pytorch_dlprim$ python3 mnist.py --device ocl:0
Traceback (most recent call last):
  File "/mnt/d/srcs/pytorch_dlprim/mnist.py", line 164, in <module>
    main()
  File "/mnt/d/srcs/pytorch_dlprim/mnist.py", line 121, in main
    torch.ops.load_library("/mnt/d/build_ninja/dlprim/libpt_ocl.so")
  File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 573, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /mnt/d/build_ninja/dlprim/libpt_ocl.so: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev

Update: Uninstalling all previous versions and installing the whl from pytorch helped. But that still does not answer why it fails with a custom build.

artyom-beilis commented 2 months ago

From your other comment I understand you managed to run it?

skn123 commented 2 months ago

Yes and in another comment I showed that pushing it to Intel GPU is fine (except that it is slow). I am still wondering what would be the "correct CMake" parameters to create a custom build of torch.
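
A note on reading the two errors above: the second missing symbol ends in `B5cxx11`, which is how the Itanium C++ name mangling spells libstdc++'s `cxx11` ABI tag. One possible (unconfirmed) explanation is that `libpt_ocl.so` and the installed torch libraries were compiled with different `_GLIBCXX_USE_CXX11_ABI` settings; the first symbol has no such tag, so it may instead point to a plain header/library version mismatch. The sketch below only pulls the symbol out of the `OSError` text and checks for the tag. The helper names `missing_symbol` and `has_cxx11_abi_tag` are hypothetical and not part of pytorch_dlprim or PyTorch.

```python
import re

# Itanium C++ ABI encodes an ABI tag as "B" + <length><name>, so libstdc++'s
# "cxx11" tag shows up as "B5cxx11" inside a mangled symbol.
CXX11_ABI_TAG = "B5cxx11"


def missing_symbol(error_message):
    """Return the mangled name from an 'undefined symbol' loader error, or None."""
    match = re.search(r"undefined symbol: (\S+)", error_message)
    return match.group(1) if match else None


def has_cxx11_abi_tag(symbol):
    """Heuristic substring check; it does not fully parse the mangled name."""
    return CXX11_ABI_TAG in symbol


# The two error messages quoted earlier in this thread.
errors = [
    "/mnt/d/build_ninja/dlprim/libpt_ocl.so: undefined symbol: _ZNK3c105Error4whatEv",
    "/mnt/d/build_ninja/dlprim/libpt_ocl.so: undefined symbol: "
    "_ZNK5torch8autograd4Node4nameB5cxx11Ev",
]

for message in errors:
    symbol = missing_symbol(message)
    print(symbol, "cxx11 ABI tag:", has_cxx11_abi_tag(symbol))
```

If the tag is present, comparing the value of `torch.compiled_with_cxx11_abi()` in the Python environment against the flags used to compile the extension would be a reasonable next check. It does not by itself settle which CMake parameters a custom torch build needs.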