artyom-beilis / pytorch_dlprim

DLPrimitives/OpenCL out of tree backend for pytorch
http://blog.dlprimitives.org/
MIT License

Test fails due to undefined symbols #9

Closed dagelf closed 1 year ago

dagelf commented 1 year ago

I am documenting my journey trying to get this to run on Ubuntu 22 with AMD Radeon (TM) RX 480 Graphics (polaris10, LLVM 13.0.1, DRM 3.46, 5.15.0-47-generic)

python mnist.py --device opencl:0
Traceback (most recent call last):
  File "/home/coenraad/pytorch_dlprim/mnist.py", line 152, in <module>
    main()
  File "/home/coenraad/pytorch_dlprim/mnist.py", line 117, in main
    torch.ops.load_library("build/libpt_ocl.so")
  File "/home/coenraad/.local/lib/python3.10/site-packages/torch/_ops.py", line 255, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/coenraad/pytorch_dlprim/build/libpt_ocl.so: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev
artyom-beilis commented 1 year ago

Are you using the custom pytorch build from this repo? https://github.com/artyom-beilis/pytorch

dagelf commented 1 year ago

Yes, but without Anaconda. I am following that process now and am stuck here: https://github.com/pytorch/pytorch/issues/69894 (but on your repo)

The first build was with Python 3.10.4, just using CMake.

Anaconda's default is Python 3.9.12. I'll either try an older version of Python or follow the instructions I used previously. I was busy documenting those steps here when we had a power cut.

dagelf commented 1 year ago
$ git clone --depth=1 https://github.com/artyom-beilis/pytorch && cd pytorch
$ conda create --name pytorch-py36
$ conda activate pytorch-py36
$ conda install python=3.6.13
$ conda install astunparse numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses
$ export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
$ USE_ROCM=0 python setup.py install

Ends with...

FAILED: third_party/ideep/mkl-dnn/src/common/CMakeFiles/dnnl_common.dir/primitive_cache.cpp.o 
/usr/bin/ccache /usr/bin/c++ -DDNNL_ENABLE_CONCURRENT_EXEC -DDNNL_ENABLE_CPU_ISA_HINTS -DDNNL_ENABLE_ITT_TASKS -DDNNL_ENABLE_MAX_CPU_ISA -DDNNL_X64=1 -DIDEEP_USE_MKL -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -I/home/coenraad/pytorch/cmake/../third_party/benchmark/include -I/home/coenraad/pytorch/build/caffe2/contrib/aten -I/home/coenraad/pytorch/third_party/onnx -I/home/coenraad/pytorch/build/third_party/onnx -I/home/coenraad/pytorch/third_party/foxi -I/home/coenraad/pytorch/build/third_party/foxi -I/home/coenraad/pytorch/third_party/ideep/mkl-dnn/include -I/home/coenraad/pytorch/build/third_party/ideep/mkl-dnn/include -I/home/coenraad/pytorch/third_party/ideep/mkl-dnn/src -isystem /home/coenraad/pytorch/build/third_party/gloo -isystem /home/coenraad/pytorch/cmake/../third_party/gloo -isystem /home/coenraad/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/coenraad/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/coenraad/pytorch/third_party/protobuf/src -isystem /home/coenraad/anaconda3/envs/pytorch-py6/include -isystem /home/coenraad/pytorch/third_party/gemmlowp -isystem /home/coenraad/pytorch/third_party/neon2sse -isystem /home/coenraad/pytorch/third_party/XNNPACK/include -isystem /home/coenraad/pytorch/third_party -isystem /home/coenraad/pytorch/cmake/../third_party/eigen -isystem /home/coenraad/anaconda3/envs/pytorch-py6/include/python3.6m -isystem /home/coenraad/anaconda3/envs/pytorch-py6/lib/python3.6/site-packages/numpy/core/include -isystem /home/coenraad/pytorch/cmake/../third_party/pybind11/include -isystem /usr/lib/x86_64-linux-gnu/openmpi/include -isystem /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -fopenmp -fvisibility-inlines-hidden  -Wall -Wno-unknown-pragmas -fvisibility=internal -msse4 -fPIC -Wformat -Wformat-security 
-fstack-protector-strong -std=c++11  -Wmissing-field-initializers  -Wno-strict-overflow  -O3 -DNDEBUG -DNDEBUG -D_FORTIFY_SOURCE=2 -fPIC -DCAFFE2_USE_GLOO -DHAVE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std=gnu++14 -MD -MT third_party/ideep/mkl-dnn/src/common/CMakeFiles/dnnl_common.dir/primitive_cache.cpp.o -MF third_party/ideep/mkl-dnn/src/common/CMakeFiles/dnnl_common.dir/primitive_cache.cpp.o.d -o third_party/ideep/mkl-dnn/src/common/CMakeFiles/dnnl_common.dir/primitive_cache.cpp.o -c /home/coenraad/pytorch/third_party/ideep/mkl-dnn/src/common/primitive_cache.cpp
/home/coenraad/pytorch/third_party/ideep/mkl-dnn/src/common/primitive_cache.cpp: In member function ‘virtual void dnnl::impl::lru_primitive_cache_t::update_entry(const key_t&, const dnnl::impl::primitive_desc_t*)’:
/home/coenraad/pytorch/third_party/ideep/mkl-dnn/src/common/primitive_cache.cpp:155:60: error: no match for ‘operator!=’ (operand types are ‘const std::thread::id’ and ‘const std::thread::id’)

I had the same issue with Python 3.10.4, which I manually patched:

-     if (it == cache_mapper_.end() || it->first.thread_id() != key.thread_id())
+     if (it == cache_mapper_.end() || !(it->first.thread_id() == key.thread_id()))

As well as

FAILED: third_party/breakpad/CMakeFiles/breakpad.dir/src/client/linux/handler/exception_handler.cc.o 
/usr/bin/ccache /usr/bin/c++ -DHAVE_A_OUT_H -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -I/home/coenraad/pytorch/cmake/../third_party/benchmark/include -I/home/coenraad/pytorch/build/caffe2/contrib/aten -I/home/coenraad/pytorch/third_party/onnx -I/home/coenraad/pytorch/build/third_party/onnx -I/home/coenraad/pytorch/third_party/foxi -I/home/coenraad/pytorch/build/third_party/foxi -I/home/coenraad/pytorch/third_party/breakpad/src -I/home/coenraad/pytorch/third_party/breakpad/src/third_party/linux/include -isystem /home/coenraad/pytorch/build/third_party/gloo -isystem /home/coenraad/pytorch/cmake/../third_party/gloo -isystem /home/coenraad/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/coenraad/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/coenraad/pytorch/third_party/protobuf/src -isystem /home/coenraad/anaconda3/envs/pytorch-py6/include -isystem /home/coenraad/pytorch/third_party/gemmlowp -isystem /home/coenraad/pytorch/third_party/neon2sse -isystem /home/coenraad/pytorch/third_party/XNNPACK/include -isystem /home/coenraad/pytorch/third_party -isystem /home/coenraad/pytorch/cmake/../third_party/eigen -isystem /home/coenraad/anaconda3/envs/pytorch-py6/include/python3.6m -isystem /home/coenraad/anaconda3/envs/pytorch-py6/lib/python3.6/site-packages/numpy/core/include -isystem /home/coenraad/pytorch/cmake/../third_party/pybind11/include -isystem /usr/lib/x86_64-linux-gnu/openmpi/include -isystem /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi -isystem /home/coenraad/pytorch/third_party/ideep/mkl-dnn/include -isystem /home/coenraad/pytorch/third_party/ideep/include -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -O3 -DNDEBUG -DNDEBUG -fPIC -DCAFFE2_USE_GLOO 
-DHAVE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD -std=gnu++14 -MD -MT third_party/breakpad/CMakeFiles/breakpad.dir/src/client/linux/handler/exception_handler.cc.o -MF third_party/breakpad/CMakeFiles/breakpad.dir/src/client/linux/handler/exception_handler.cc.o.d -o third_party/breakpad/CMakeFiles/breakpad.dir/src/client/linux/handler/exception_handler.cc.o -c /home/coenraad/pytorch/third_party/breakpad/src/client/linux/handler/exception_handler.cc
/home/coenraad/pytorch/third_party/breakpad/src/client/linux/handler/exception_handler.cc: In function ‘void google_breakpad::{anonymous}::InstallAlternateStackLocked()’:
/home/coenraad/pytorch/third_party/breakpad/src/client/linux/handler/exception_handler.cc:141:49: error: no matching function for call to ‘max(int, long int)’

Patched with

-  static const unsigned kSigStackSize = std::max(16384, SIGSTKSZ);
+  static const unsigned kSigStackSize = std::max((long)16384, SIGSTKSZ);
dagelf commented 1 year ago
[867/2871] Generating ../../../include/sleef.h
Generating sleef.h: mkrename cinz_ 2 4 __m128d __m128 __m128i __m128i __SSE2__
Generating sleef.h: mkrename cinz_ 2 4 __m128d __m128 __m128i __m128i __SSE2__ sse2
Generating sleef.h: mkrename cinz_ 2 4 __m128d __m128 __m128i __m128i __SSE2__ sse4
Generating sleef.h: mkrename cinz_ 4 8 __m256d __m256 __m128i struct\ {\ __m128i\ x,\ y;\ } __AVX__
Generating sleef.h: mkrename cinz_ 4 8 __m256d __m256 __m128i struct\ {\ __m128i\ x,\ y;\ } __AVX__ avx
Generating sleef.h: mkrename finz_ 4 8 __m256d __m256 __m128i struct\ {\ __m128i\ x,\ y;\ } __AVX__ fma4
Generating sleef.h: mkrename finz_ 4 8 __m256d __m256 __m128i __m256i __AVX__ avx2
Generating sleef.h: mkrename finz_ 2 4 __m128d __m128 __m128i __m128i __SSE2__ avx2128
Generating sleef.h: mkrename finz_ 8 16 __m512d __m512 __m256i __m512i __AVX512F__
Generating sleef.h: mkrename finz_ 8 16 __m512d __m512 __m256i __m512i __AVX512F__ avx512f
Generating sleef.h: mkrename cinz_ 8 16 __m512d __m512 __m256i __m512i __AVX512F__ avx512fnofma
Generating sleef.h: mkrename cinz_ 1 1 double float int32_t int32_t __STDC__ purec
Generating sleef.h: mkrename finz_ 1 1 double float int32_t int32_t FP_FAST_FMA purecfma
[1335/2871] Building C object caffe2/CMakeFiles/torch_cpu.dir/__/third_party/miniz-2.0.8/miniz.c.o
/home/coenraad/pytorch/third_party/miniz-2.0.8/miniz.c:3108:9: note: ‘#pragma message: Using fopen, ftello, fseeko, stat() etc. path for file I/O - this path may not support large files.’
 3108 | #pragma message("Using fopen, ftello, fseeko, stat() etc. path for file I/O - this path may not support large files.")
      |         ^~~~~~~
[1896/2871] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/ir_emitter.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/ir_emitter.cpp.o 
/usr/bin/ccache /usr/bin/c++ -DADD_BREAKPAD_SIGNAL_HANDLER -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DUSE_C10D_GLOO -DUSE_C10D_MPI -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/home/coenraad/pytorch/build/aten/src -I/home/coenraad/pytorch/aten/src -I/home/coenraad/pytorch/build -I/home/coenraad/pytorch -I/home/coenraad/pytorch/cmake/../third_party/benchmark/include -I/home/coenraad/pytorch/build/caffe2/contrib/aten -I/home/coenraad/pytorch/third_party/onnx -I/home/coenraad/pytorch/build/third_party/onnx -I/home/coenraad/pytorch/third_party/foxi -I/home/coenraad/pytorch/build/third_party/foxi -I/home/coenraad/pytorch/torch/csrc/api -I/home/coenraad/pytorch/torch/csrc/api/include -I/home/coenraad/pytorch/caffe2/aten/src/TH -I/home/coenraad/pytorch/build/caffe2/aten/src/TH -I/home/coenraad/pytorch/build/caffe2/aten/src -I/home/coenraad/pytorch/caffe2/../third_party -I/home/coenraad/pytorch/caffe2/../third_party/breakpad/src -I/home/coenraad/pytorch/build/caffe2/../aten/src -I/home/coenraad/pytorch/build/caffe2/../aten/src/ATen -I/home/coenraad/pytorch/torch/csrc -I/home/coenraad/pytorch/third_party/miniz-2.0.8 -I/home/coenraad/pytorch/third_party/kineto/libkineto/include -I/home/coenraad/pytorch/third_party/kineto/libkineto/src -I/home/coenraad/pytorch/torch/csrc/distributed -I/home/coenraad/pytorch/aten/src/TH -I/home/coenraad/pytorch/aten/../third_party/catch/single_include -I/home/coenraad/pytorch/aten/src/ATen/.. -I/home/coenraad/pytorch/build/caffe2/aten/src/ATen -I/home/coenraad/pytorch/caffe2/core/nomnigraph/include -I/home/coenraad/pytorch/third_party/FXdiv/include -I/home/coenraad/pytorch/c10/.. 
-I/home/coenraad/pytorch/build/third_party/ideep/mkl-dnn/include -I/home/coenraad/pytorch/third_party/ideep/mkl-dnn/src/../include -I/home/coenraad/pytorch/third_party/pthreadpool/include -I/home/coenraad/pytorch/third_party/cpuinfo/include -I/home/coenraad/pytorch/third_party/QNNPACK/include -I/home/coenraad/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/include -I/home/coenraad/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/src -I/home/coenraad/pytorch/third_party/cpuinfo/deps/clog/include -I/home/coenraad/pytorch/third_party/NNPACK/include -I/home/coenraad/pytorch/third_party/fbgemm/include -I/home/coenraad/pytorch/third_party/fbgemm -I/home/coenraad/pytorch/third_party/fbgemm/third_party/asmjit/src -I/home/coenraad/pytorch/third_party/FP16/include -I/home/coenraad/pytorch/third_party/tensorpipe -I/home/coenraad/pytorch/build/third_party/tensorpipe -I/home/coenraad/pytorch/third_party/tensorpipe/third_party/libnop/include -I/home/coenraad/pytorch/third_party/fmt/include -isystem /home/coenraad/pytorch/build/third_party/gloo -isystem /home/coenraad/pytorch/cmake/../third_party/gloo -isystem /home/coenraad/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/coenraad/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/coenraad/pytorch/third_party/protobuf/src -isystem /home/coenraad/anaconda3/envs/pytorch-py6/include -isystem /home/coenraad/pytorch/third_party/gemmlowp -isystem /home/coenraad/pytorch/third_party/neon2sse -isystem /home/coenraad/pytorch/third_party/XNNPACK/include -isystem /home/coenraad/pytorch/third_party -isystem /home/coenraad/pytorch/cmake/../third_party/eigen -isystem /home/coenraad/anaconda3/envs/pytorch-py6/include/python3.6m -isystem /home/coenraad/anaconda3/envs/pytorch-py6/lib/python3.6/site-packages/numpy/core/include -isystem /home/coenraad/pytorch/cmake/../third_party/pybind11/include -isystem /usr/lib/x86_64-linux-gnu/openmpi/include -isystem 
/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi -isystem /home/coenraad/pytorch/third_party/ideep/mkl-dnn/include -isystem /home/coenraad/pytorch/third_party/ideep/include -isystem /home/coenraad/pytorch/build/include -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -fPIC -DCAFFE2_USE_GLOO -DHAVE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wno-write-strings -Wno-unknown-pragmas -Wno-missing-braces -Wno-maybe-uninitialized -fvisibility=hidden -O2 -fopenmp -DCAFFE2_BUILD_MAIN_LIB -pthread -DASMJIT_STATIC -std=gnu++14 -MD -MT caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/ir_emitter.cpp.o -MF caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/ir_emitter.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/ir_emitter.cpp.o -c /home/coenraad/pytorch/torch/csrc/jit/frontend/ir_emitter.cpp
/home/coenraad/pytorch/torch/csrc/jit/frontend/ir_emitter.cpp: In lambda function:
/home/coenraad/pytorch/torch/csrc/jit/frontend/ir_emitter.cpp:1681:76: error: ‘this’ pointer is null [-Werror=nonnull]
 1681 |               << " elements, which were unified to " << candidate->repr_str();
      |                                                         ~~~~~~~~~~~~~~~~~~~^~
cc1plus: some warnings being treated as errors

The fix is to test whether the pointer is null before printing. For now it can also just be commented out (with a ; added), since it appears to be only error-message printing code.

artyom-beilis commented 1 year ago

Recently PyTorch made it simpler to create custom backend: https://dev-discuss.pytorch.org/t/private-use-opencl-device/731/2

So once I update the backend I should be able to work with vanilla pytorch without need to build one from scratch.

I hope I'll get to it this week. Other than that, try to build pytorch. I use Ubuntu 18.04 and Python 3.6... The current code requires custom modifications in torch.

dagelf commented 1 year ago

Build succeeded. Same error... but this is on Clover. My GPU is in a grey area with ROCm support... but I will try with ROCm and Ubuntu 18, and will do a deep dive into PyTorch following your lead. All I can say is THANK YOU. It's very exciting that being able to use my own hardware, at long last, is in sight.

$ python mnist.py --device opencl:1
Traceback (most recent call last):
  File "mnist.py", line 152, in <module>
    main()
  File "mnist.py", line 117, in main
    torch.ops.load_library("build/libpt_ocl.so")
  File "/home/coenraad/anaconda3/envs/pytorch-py6/lib/python3.6/site-packages/torch/_ops.py", line 110, in load_library
    ctypes.CDLL(path)
  File "/home/coenraad/anaconda3/envs/pytorch-py6/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/coenraad/pytorch_dlprim/build/libpt_ocl.so: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev
(pytorch-py36) ~/pytorch_dlprim$ 
artyom-beilis commented 1 year ago

Check this: python -c "import torch; print(torch._C._GLIBCXX_USE_CXX11_ABI)"

I'm looking into this: https://discuss.pytorch.org/t/how-to-fix-c-abi-problem-for-torchvision/126045/2 Maybe some flag needs to be added to the pytorch backend build to make it compatible with the torch build.

artyom-beilis commented 1 year ago

My GPU is in a grey area with ROCm support.

As long as it's on the primary PCI-E slot (not via the chipset), it should work.

I actually tested dlprimitives on an rx560, which is from the same family, with fairly good results.

The ROCm driver works too, but I don't think their frameworks support it any more - i.e. you have problems running the latest torch/tensorflow on it. Even the Clover/Mesa drivers worked.

Now I have a gtx 960 and an rx 6600 in my PC. Unfortunately I have no 3rd PCI-E slot for the rx560 - but it worked well in the past.

dagelf commented 1 year ago

So what's missing from the library seems to be some variation of torch::autograd::Node::name[abi:cxx11]() const when running torch.ops.load_library("pytorch_dlprim/build/libpt_ocl.so")

(demangled _ZNK5torch8autograd4Node4nameB5cxx11Ev with https://github.com/juchem/demangle)

Correction... maybe the pytorch build is missing something.

~/pytorch$ grep torch::autograd::Node * -R 
aten/src/ATen/core/Tensor.cpp:const std::shared_ptr<torch::autograd::Node>& TensorBase::grad_fn() const {
aten/src/ATen/core/VariableHooksInterface.h:  virtual const std::shared_ptr<torch::autograd::Node>& grad_fn(const TensorBase&) const = 0;
aten/src/ATen/core/TensorBase.h:  const std::shared_ptr<torch::autograd::Node>& grad_fn() const;
build/lib.linux-x86_64-3.6/torch/include/ATen/core/VariableHooksInterface.h:  virtual const std::shared_ptr<torch::autograd::Node>& grad_fn(const TensorBase&) const = 0;
build/lib.linux-x86_64-3.6/torch/include/ATen/core/TensorBase.h:  const std::shared_ptr<torch::autograd::Node>& grad_fn() const;
build/lib.linux-x86_64-3.6/torch/include/c10d/reducer.hpp:  std::vector<std::shared_ptr<torch::autograd::Node>>
build/lib.linux-x86_64-3.6/torch/include/c10d/reducer.hpp:  std::unordered_map<torch::autograd::Node*, size_t> gradAccToVariableMap_;
build/lib.linux-x86_64-3.6/torch/include/c10d/reducer.hpp:  std::vector<std::pair<uintptr_t, std::shared_ptr<torch::autograd::Node>>>
tools/autograd/gen_autograd_functions.py:    These contain the auto-generated subclasses of torch::autograd::Node
grep: tools/autograd/__pycache__/gen_autograd_functions.cpython-36.pyc: binary file matches
tools/autograd/gen_autograd.py:#  gen_autograd_functions.py: generates subclasses of torch::autograd::Node
torch/csrc/autograd/variable.cpp:  const std::shared_ptr<torch::autograd::Node>& grad_fn(const at::TensorBase&) const override;
torch/csrc/autograd/variable.cpp:  std::shared_ptr<torch::autograd::Node> singleton_shared_ptr;
torch/csrc/autograd/variable.cpp:const std::shared_ptr<torch::autograd::Node>& VariableHooks::grad_fn(const at::TensorBase& self) const {
torch/csrc/distributed/autograd/engine/dist_engine.cpp:using torch::autograd::Node;
torch/csrc/distributed/autograd/engine/dist_engine.cpp:using torch::autograd::NodeTask;
torch/csrc/distributed/autograd/engine/dist_engine.h:      const std::shared_ptr<torch::autograd::Node>& graphRoot,
torch/csrc/distributed/autograd/engine/dist_engine.h:      torch::autograd::NodeTask&& node_task,
torch/csrc/distributed/autograd/engine/dist_engine.h:      const std::shared_ptr<torch::autograd::Node>& graphRoot,
torch/csrc/distributed/autograd/functions/recvrpc_backward.h:class TORCH_API RecvRpcBackward : public torch::autograd::Node {
torch/csrc/distributed/autograd/functions/sendrpc_backward.h:struct TORCH_API SendRpcBackward : public torch::autograd::Node {
torch/csrc/distributed/c10d/reducer.cpp:  std::unordered_set<torch::autograd::Node*> seen;
torch/csrc/distributed/c10d/reducer.cpp:  std::vector<torch::autograd::Node*> queue;
torch/csrc/distributed/c10d/reducer.hpp:  std::vector<std::shared_ptr<torch::autograd::Node>>
torch/csrc/distributed/c10d/reducer.hpp:  std::unordered_map<torch::autograd::Node*, size_t> gradAccToVariableMap_;
torch/csrc/distributed/c10d/reducer.hpp:  std::vector<std::pair<uintptr_t, std::shared_ptr<torch::autograd::Node>>>
torch/include/ATen/core/VariableHooksInterface.h:  virtual const std::shared_ptr<torch::autograd::Node>& grad_fn(const TensorBase&) const = 0;
torch/include/ATen/core/TensorBase.h:  const std::shared_ptr<torch::autograd::Node>& grad_fn() const;
torch/include/torch/csrc/distributed/autograd/engine/dist_engine.h:      const std::shared_ptr<torch::autograd::Node>& graphRoot,
torch/include/torch/csrc/distributed/autograd/engine/dist_engine.h:      torch::autograd::NodeTask&& node_task,
torch/include/torch/csrc/distributed/autograd/engine/dist_engine.h:      const std::shared_ptr<torch::autograd::Node>& graphRoot,
torch/include/torch/csrc/distributed/autograd/functions/recvrpc_backward.h:class TORCH_API RecvRpcBackward : public torch::autograd::Node {
torch/include/torch/csrc/distributed/autograd/functions/sendrpc_backward.h:struct TORCH_API SendRpcBackward : public torch::autograd::Node {
torch/include/c10d/reducer.hpp:  std::vector<std::shared_ptr<torch::autograd::Node>>
torch/include/c10d/reducer.hpp:  std::unordered_map<torch::autograd::Node*, size_t> gradAccToVariableMap_;
torch/include/c10d/reducer.hpp:  std::vector<std::pair<uintptr_t, std::shared_ptr<torch::autograd::Node>>>
$ grep ops.load_library * -R
benchmarks/operator_benchmark/pt/qembedding_bag_lookups_test.py:torch.ops.load_library("//caffe2/torch/fb/sparsenn:sparsenn_operators")
benchmarks/operator_benchmark/pt/clip_ranges_test.py:torch.ops.load_library("//caffe2/torch/fb/sparsenn:sparsenn_operators")
build/lib.linux-x86_64-3.6/torch/_ops.py:        call ``torch.ops.load_library('path/to/libcustom.so')`` to load the
build/lib.linux-x86_64-3.6/torch/_classes.py:        torch.ops.load_library(path)
build/lib.linux-x86_64-3.6/torch/utils/cpp_extension.py:        torch.ops.load_library(filepath)
test/test_fx.py:            torch.ops.load_library(str(lib_file_path))
test/custom_operator/test_custom_ops.py:        ops.load_library(self.library_path)
test/custom_operator/test_custom_classes.py:        ops.load_library(get_custom_class_library_path())
test/custom_operator/model.py:    torch.ops.load_library(get_custom_op_library_path())
test/cpp/jit/CMakeLists.txt:# These are intended to be used with torch.ops.load_library() in our Python test suite.
test/custom_backend/backend.py:    torch.ops.load_library(library_path)
test/custom_backend/test_custom_backend.py:        torch.ops.load_library(self.library_path)
test/jit/test_torchbind.py:        torch.ops.load_library(str(lib_file_path))
test/jit/test_backends.py:        torch.ops.load_library(str(lib_file_path))
test/jit/test_backends.py:        torch.ops.load_library(str(lib_file_path))
test/jit/test_backend_nnapi.py:        torch.ops.load_library(str(lib_path))
test/test_bundled_images.py:torch.ops.load_library("//caffe2/torch/fb/operators:decode_bundled_image")
third_party/fbgemm/fbgemm_gpu/test/quantize_ops_test.py:    torch.ops.load_library("fbgemm_gpu_py.so")
third_party/fbgemm/fbgemm_gpu/test/quantize_ops_test.py:    torch.ops.load_library("//deeplearning/fbgemm/fbgemm_gpu:sparse_ops")
third_party/fbgemm/fbgemm_gpu/test/quantize_ops_test.py:    torch.ops.load_library("//deeplearning/fbgemm/fbgemm_gpu:sparse_ops_cpu")
third_party/fbgemm/fbgemm_gpu/test/merge_pooled_embeddings_test.py:    torch.ops.load_library("fbgemm_gpu_py.so")
third_party/fbgemm/fbgemm_gpu/test/merge_pooled_embeddings_test.py:    torch.ops.load_library("//deeplearning/fbgemm/fbgemm_gpu:merge_pooled_embeddings")
third_party/fbgemm/fbgemm_gpu/test/merge_pooled_embeddings_test.py:    torch.ops.load_library("//deeplearning/fbgemm/fbgemm_gpu:merge_pooled_embeddings_cpu")
third_party/fbgemm/fbgemm_gpu/test/sparse_ops_test.py:    torch.ops.load_library("fbgemm_gpu_py.so")
third_party/fbgemm/fbgemm_gpu/test/sparse_ops_test.py:    torch.ops.load_library("//deeplearning/fbgemm/fbgemm_gpu:sparse_ops")
third_party/fbgemm/fbgemm_gpu/test/sparse_ops_test.py:    torch.ops.load_library("//deeplearning/fbgemm/fbgemm_gpu:sparse_ops_cpu")
third_party/fbgemm/fbgemm_gpu/bench/merge_embeddings_benchmark.py:    torch.ops.load_library("fbgemm_gpu_py.so")
third_party/fbgemm/fbgemm_gpu/bench/merge_embeddings_benchmark.py:    torch.ops.load_library("//deeplearning/fbgemm/fbgemm_gpu:merge_pooled_embeddings")
third_party/fbgemm/fbgemm_gpu/bench/merge_embeddings_benchmark.py:    torch.ops.load_library("//deeplearning/fbgemm/fbgemm_gpu:merge_pooled_embeddings_cpu")
third_party/fbgemm/fbgemm_gpu/codegen/split_embedding_codegen_lookup_invoker.template:torch.ops.load_library("//deeplearning/fbgemm/fbgemm_gpu/codegen:embedding_ops")
third_party/fbgemm/fbgemm_gpu/codegen/split_embedding_codegen_lookup_invoker.template:torch.ops.load_library("//deeplearning/fbgemm/fbgemm_gpu/codegen:embedding_ops_cpu")
third_party/fbgemm/fbgemm_gpu/codegen/split_embedding_codegen_lookup_invoker.template:torch.ops.load_library("//deeplearning/fbgemm/fbgemm_gpu:cumem_utils")
third_party/fbgemm/fbgemm_gpu/codegen/split_embedding_codegen_lookup_invoker.template:torch.ops.load_library("//deeplearning/fbgemm/fbgemm_gpu:sparse_ops")
third_party/fbgemm/fbgemm_gpu/codegen/split_embedding_codegen_lookup_invoker.template:torch.ops.load_library("//deeplearning/fbgemm/fbgemm_gpu:sparse_ops_cpu")
third_party/fbgemm/fbgemm_gpu/codegen/split_embedding_codegen_lookup_invoker.template:torch.ops.load_library(
third_party/fbgemm/fbgemm_gpu/codegen/split_embedding_codegen_lookup_invoker.template:torch.ops.load_library("fbgemm_gpu_py.so")
third_party/fbgemm/fbgemm_gpu/fbgemm_gpu/split_embedding_inference_converter.py:torch.ops.load_library("//caffe2/torch/fb/sparsenn:sparsenn_operators")
torch/csrc/deploy/example/examples.py:    torch.ops.load_library("my_so.so")
torch/_ops.py:        call ``torch.ops.load_library('path/to/libcustom.so')`` to load the
torch/_classes.py:        torch.ops.load_library(path)
torch/utils/cpp_extension.py:        torch.ops.load_library(filepath)
danielzgtg commented 1 year ago

I encountered this problem too. I was building off of master with the one commit cherry-picked.

The problem was that I had a residual PyTorch installation from a previous install. I had to run pip uninstall torch twice before it removed the old header files. Then I ran python3 setup.py install in PyTorch, and cmake and make started working.

artyom-beilis commented 1 year ago

The problem was that I had a residual PyTorch installation from a previous install...

Can you please give more details and your steps?

Thanks!

artyom-beilis commented 1 year ago

Ok I see there was a change in pytorch:

Before:

  # When we build libtorch with the old GCC ABI, dependent libraries must too.
  if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU")
    set(TORCH_CXX_FLAGS "-D_GLIBCXX_USE_CXX11_ABI=1")
  endif()

After:

  # When we build libtorch with the old libstdc++ ABI, dependent libraries must too.
  if(CMAKE_SYSTEM_NAME STREQUAL "Linux")
    set(TORCH_CXX_FLAGS "-D_GLIBCXX_USE_CXX11_ABI=0")
  endif()

At some point CXX11_ABI was for some reason replaced with older ABI...

artyom-beilis commented 1 year ago

And in my custom build CXX11_ABI was set to 1...

I need to align with the pytorch build and build dlprim with the same ABI :-(
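One way to keep the two aligned (a sketch, not the repo's actual CMake): let the backend inherit torch's exported compile flags, since TorchConfig.cmake puts the matching -D_GLIBCXX_USE_CXX11_ABI value into TORCH_CXX_FLAGS:

```cmake
# Sketch: inherit torch's ABI flag instead of hardcoding it.
find_package(Torch REQUIRED)
# TORCH_CXX_FLAGS contains -D_GLIBCXX_USE_CXX11_ABI=<0|1> from the torch build
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
```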

artyom-beilis commented 1 year ago

Started working on better out-of-tree backend support - at least you won't need to build pytorch, which makes things easier.

See the true_out_of_tree_support branch.