Both PyTorch and Caffe2 contain the ATen library. If we don't link them carefully, symbols from one framework's copy of ATen can be picked up by the other at runtime.

For example, here is a gdb backtrace:

Thread 1 "python" hit Catchpoint 1 (exception thrown), 0x00007fffeca15c1d in __cxa_throw () from /lib64/libstdc++.so.6

(gdb) bt

0 0x00007fffeca15c1d in __cxa_throw () from /lib64/libstdc++.so.6

1 0x00007ffeffe81a31 in at::runtime_error(char const*, ...) () from /home/lufang/gitrepos/onnx-pytorch/pytorch/torch/lib/libATen.so.1

2 0x00007ffeffe815e2 in at::UndefinedTensor::storage() () from /home/lufang/gitrepos/onnx-pytorch/pytorch/torch/lib/libATen.so.1

3 0x00007fff851894e3 in std::_Tuple_impl<0ul, at::Tensor, at::Tensor>::~_Tuple_impl() () from /home/lufang/programs/caffe2/lib/libcaffe2_gpu.so

4 0x00007fff1081d838 in std::tuple<at::Tensor, at::Tensor>::~tuple (this=0x7fffffffc720, __in_chrg=) at /usr/include/c++/4.8.2/tuple:523

5 torch::autograd::VariableType::max_pool2d (this=0x1ee4830, self=..., kernel_size=..., stride=..., padding=..., dilation=..., ceil_mode=false) at torch/csrc/autograd/generated/VariableType.cpp:6602

6 0x00007fff108b895f in at::max_pool2d (ceil_mode=false, dilation=..., padding=..., stride=..., kernel_size=..., self=...) at /home/lufang/gitrepos/onnx-pytorch/pytorch/torch/lib/tmp_install/include/ATen/Functions.h:2038

7 torch::autograd::dispatch_max_pool2d (ceil_mode=false, dilation=..., padding=..., stride=..., kernel_size=..., self=...) at torch/csrc/autograd/generated/python_nn_functions_dispatch.h:140

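Apparently, some ATen-related function implementations in Caffe2 are exported, and PyTorch accidentally resolves and invokes them. In frame 3, the _Tuple_impl destructor reached from PyTorch's VariableType::max_pool2d runs the copy in libcaffe2_gpu.so rather than PyTorch's own code.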
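To check which shared objects in a process could be supplying ATen symbols, one option is to list the files mapped into the interpreter. The sketch below is Linux-only and assumes /proc/self/maps is readable. The helper names matching_libraries and loaded_libraries are hypothetical and are not part of PyTorch or Caffe2.

```python
def matching_libraries(maps_lines, substrings):
    """Return sorted, de-duplicated file paths from /proc/<pid>/maps lines
    whose base name contains any of the given substrings."""
    found = set()
    for line in maps_lines:
        # Fields: address, perms, offset, dev, inode, [pathname].
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1]
        if any(s in name for s in substrings):
            found.add(path)
    return sorted(found)


def loaded_libraries(substrings):
    """Linux-only: inspect the memory map of the current process."""
    with open("/proc/self/maps") as maps:
        return matching_libraries(maps, substrings)


if __name__ == "__main__":
    # In a real session, import both PyTorch and Caffe2 before this call.
    for path in loaded_libraries(["ATen", "caffe2"]):
        print(path)
```

If both a libATen path and a libcaffe2 path show up after importing both frameworks, the process holds two sources of ATen code. Which copy a given call binds to then depends on how each library was loaded.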

One temporary solution (suggested by @dzhulgakov) is to remove the "ctypes.RTLD_GLOBAL" flag from extension_loader.py in Caffe2. I'm not sure whether this will break other parts of Caffe2.
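One common way for a Python loader to apply such a flag is to temporarily OR RTLD_GLOBAL into the flags the interpreter passes to dlopen() while importing an extension module. The sketch below assumes that mechanism. It is not the actual contents of extension_loader.py, and the name dlopen_flags is hypothetical. With RTLD_GLOBAL, the library's symbols join the global scope, so the dynamic linker may bind references in libraries loaded later (such as PyTorch's) to those definitions. Without it (RTLD_LOCAL, the Linux default), they are not made available to later-loaded libraries.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def dlopen_flags(extra_flags):
    """Temporarily OR extra_flags into the flags used by dlopen() on import.

    Unix-only: sys.getdlopenflags()/sys.setdlopenflags() are unavailable on
    Windows. Extension modules imported inside the block are loaded with the
    combined flags; the previous flags are restored afterwards, even on error.
    """
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(old_flags | extra_flags)
    try:
        yield
    finally:
        sys.setdlopenflags(old_flags)


if __name__ == "__main__":
    print("default flags:", sys.getdlopenflags())
    with dlopen_flags(os.RTLD_GLOBAL):
        # Hypothetical: an `import some_caffe2_extension` here would make its
        # symbols globally visible to libraries loaded afterwards.
        print("inside guard:", sys.getdlopenflags())
    print("restored:", sys.getdlopenflags())
```

Dropping the flag amounts to importing without this guard, so Caffe2's symbols stay local to its own libraries. The remaining question is whether other Caffe2 extensions rely on that global visibility.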

I'm creating this issue to track the problem until it is fixed.

cc: @soumith @ezyang @zdevito @dzhulgakov @bddppq
