artyom-beilis / pytorch_dlprim

DLPrimitives/OpenCL out of tree backend for pytorch
http://blog.dlprimitives.org/
MIT License

Cannot load libdlprim_core.so because of undefined symbols #8

Closed: kakulo closed this issue 1 year ago

kakulo commented 2 years ago

Hi Beilis,

I have completed the installation following your instructions, but I get an error when loading libpt_ocl.so with torch.ops.load_library. Loading libdlprim.so and libdlprim_core.so works without issues. Could you please give me some hints? Thank you!

>>> torch.ops.load_library("/home/guol678/newlibs/dlprim/lib/libdlprim.so")
>>> torch.ops.load_library("/home/guol678/newlibs/dlprim/lib/libdlprim_core.so")

>>> torch.ops.load_library("/home/guol678/torchOCL/pytorch_dlprim/build/libpt_ocl.so")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/guol678/newlibs/miniconda/envs/dlprim/lib/python3.10/site-packages/torch/_ops.py", line 220, in load_library
    ctypes.CDLL(path)
  File "/home/guol678/newlibs/miniconda/envs/dlprim/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/guol678/torchOCL/pytorch_dlprim/build/libpt_ocl.so: undefined symbol: _ZN6dlprim4core18activation_forwardERNS_6TensorES2_NS_19StandardActivationsERKNS_16ExecutionContextE

-Lenny

artyom-beilis commented 2 years ago
  1. You don't need to load libdlprim.so or libdlprim_core.so yourself; libpt_ocl.so is linked against them. Just make sure LD_LIBRARY_PATH points to a location that contains these libraries, and libpt_ocl.so will load them automatically.
  2. It seems that the version of dlprim_core that libpt_ocl.so was compiled against differs from the one actually loaded. Maybe you built dlprim but did not install it?
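
A quick way to test the second point is to ask a given copy of libdlprim_core.so whether it exports the symbol named in the error. The following is a minimal sketch using ctypes from the Python standard library; the helper name exports_symbol is hypothetical and is not part of dlprim or PyTorch.

```python
import ctypes

# Mangled name copied from the OSError in the report above.
MISSING_SYMBOL = (
    "_ZN6dlprim4core18activation_forwardERNS_6TensorES2_"
    "NS_19StandardActivationsERKNS_16ExecutionContextE"
)


def exports_symbol(library_path: str, symbol: str) -> bool:
    """Return True if `symbol` can be resolved through the given shared library.

    Hypothetical helper for illustration. The lookup goes through dlsym, so a
    symbol that lives in one of the library's own dependencies is also found.
    """
    # Raises OSError if the library itself cannot be loaded.
    lib = ctypes.CDLL(library_path)
    try:
        getattr(lib, symbol)
    except AttributeError:
        # ctypes raises AttributeError when dlsym cannot find the name.
        return False
    return True


# Example call (path taken from the report; adjust to your installation):
# exports_symbol("/home/guol678/newlibs/dlprim/lib/libdlprim_core.so", MISSING_SYMBOL)
```

If the call returns False for the installed library, that copy was presumably built from a different dlprim revision than the headers libpt_ocl.so was compiled against, and rebuilding and reinstalling dlprim would be the next thing to try.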

kakulo commented 2 years ago

Thanks for your quick response.

  1. You don't need to load libdlprim.so or libdlprim_core.so yourself; libpt_ocl.so is linked against them. Just make sure LD_LIBRARY_PATH points to a location that contains these libraries, and libpt_ocl.so will load them automatically.

I see. I just wanted to check whether I was loading the wrong libraries.

  2. It seems that the version of dlprim_core that libpt_ocl.so was compiled against differs from the one actually loaded. Maybe you built dlprim but did not install it?

I tried again to reinstall pytorch_dlprim with the correct libraries but still get the same error.

This is the cmake command I used to compile pytorch_dlprim:

cmake .. \
  -DCMAKE_PREFIX_PATH=/home/guol678/newlibs/miniconda/envs/dlprim/lib/python3.10/site-packages/torch/share/cmake/Torch/ \
  -DDLPRIM_INC=/home/guol678/newlibs/dlprim/include/ \
  -DDLPRIM_LIB=/home/guol678/newlibs/dlprim/lib/ \
  -DCMAKE_C_COMPILER=/home/guol678/newlibs/gcc-9.4/bin/gcc \
  -DCMAKE_CXX_COMPILER=/home/guol678/newlibs/gcc-9.4/bin/g++ \
  -DCMAKE_INCLUDE_PATH=/home/guol678/newlibs/pocl/include/ \
  -DCMAKE_LIBRARY_PATH=/home/guol678/newlibs/pocl/lib/

And I tried torch.ops.load_library again,

>>> torch.ops.load_library("/home/guol678/newlibs/miniconda/envs/dlprim/lib/libsqlite3.so")
>>> torch.ops.load_library("/home/guol678/newlibs/dlprim/lib/libdlprim_core.so")
>>> torch.ops.load_library("/home/guol678/newlibs/miniconda/envs/dlprim/lib/libsqlite3.so")
>>> torch.ops.load_library("/home/guol678/torchOCL/pytorch_dlprim/build/libpt_ocl.so")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/guol678/newlibs/miniconda/envs/dlprim/lib/python3.10/site-packages/torch/_ops.py", line 220, in load_library
    ctypes.CDLL(path)
  File "/home/guol678/newlibs/miniconda/envs/dlprim/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/guol678/torchOCL/pytorch_dlprim/build/libpt_ocl.so: undefined symbol: _ZN6dlprim4core18activation_forwardERNS_6TensorES2_NS_19StandardActivationsERKNS_16ExecutionContextE
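
When the same error persists after a rebuild, it can help to see which file the dynamic loader would pick for each dependency of libpt_ocl.so under the current LD_LIBRARY_PATH. The sketch below shells out to the standard Linux ldd tool and parses its output; the function names parse_ldd and resolved_dependencies are hypothetical, and the example assumes ldd is available on the system.

```python
import subprocess


def parse_ldd(output: str) -> dict:
    """Map each dependency name in `ldd` output to the path it resolved to.

    Lines without "=>" (for example the vdso entry) are skipped. Unresolved
    dependencies map to the literal string "not found", as printed by ldd.
    """
    deps = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue
        name, _, target = line.partition("=>")
        # Drop the trailing load address, e.g. " (0x00007f...)".
        path = target.strip().split(" (")[0].strip()
        deps[name.strip()] = path
    return deps


def resolved_dependencies(library_path: str) -> dict:
    """Run `ldd` on a shared library and return its resolved dependencies."""
    result = subprocess.run(
        ["ldd", library_path], capture_output=True, text=True, check=True
    )
    return parse_ldd(result.stdout)


# Example call (path taken from the report above):
# deps = resolved_dependencies("/home/guol678/torchOCL/pytorch_dlprim/build/libpt_ocl.so")
# print(deps.get("libdlprim_core.so"))
```

If the path reported for libdlprim_core.so is not the one passed as DLPRIM_LIB at build time, the loader is picking up a different copy than the one libpt_ocl.so was linked against.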

artyom-beilis commented 1 year ago

There are some issues related to the build ABI. I have moved to PyTorch nightly, which gives much better out-of-tree support.

Look at the true_out_of_tree_support branch: it is still experimental and some nets fail, but the build process is much simpler.