/usr/lib64/libc.so.6 cause a mxnet segmentation.

TriLoo commented 4 years ago

Description

when using cpp-package to do inference, a dmlc::Error arose.

Error Message

the output of bt in gdb:

#0  0x00007fe0287f65f7 in raise () from /usr/lib64/libc.so.6
#1  0x00007fe0287f7ce8 in abort () from /usr/lib64/libc.so.6
#2  0x00007fe0290fb9d5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#3  0x00007fe0290f9946 in ?? () from /usr/lib64/libstdc++.so.6
#4  0x00007fe0290f8909 in ?? () from /usr/lib64/libstdc++.so.6
#5  0x00007fe0290f9574 in __gxx_personality_v0 () from /usr/lib64/libstdc++.so.6
#6  0x00007fe028b92903 in ?? () from /usr/lib64/libgcc_s.so.1
#7  0x00007fe028b92c9b in _Unwind_RaiseException () from /usr/lib64/libgcc_s.so.1
#8  0x00007fe0290f9b86 in __cxa_throw () from /usr/lib64/libstdc++.so.6
#9  0x00007fe035a5c148 in void mshadow::DeleteStream<mshadow::gpu>(mshadow::Stream<mshadow::gpu>*) () from /search/odin/songminghui/githubs/incubator-mxnet/lib/libmxnet.so
#10 0x00007fe035a5d163 in mshadow::Stream<mshadow::gpu>* mshadow::NewStream<mshadow::gpu>(bool, bool, int) () from /search/odin/songminghui/githubs/incubator-mxnet/lib/libmxnet.so
#11 0x00007fe035a73bbf in void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&) ()
   from /search/odin/songminghui/githubs/incubator-mxnet/lib/libmxnet.so
#12 0x00007fe035a73e0e in std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&) () from /search/odin/songminghui/githubs/incubator-mxnet/lib/libmxnet.so
#13 0x00007fe035a6f0ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)>, std::shared_ptr<dmlc::ManualEvent> > > >::_M_run() () from /search/odin/songminghui/githubs/incubator-mxnet/lib/libmxnet.so
#14 0x00007fe04279649f in execute_native_thread_routine () from /search/odin/songminghui/githubs/incubator-mxnet/lib/libmxnet.so
#15 0x00007fe0293aadc5 in start_thread () from /usr/lib64/libpthread.so.0
#16 0x00007fe0288b7ced in clone () from /usr/lib64/libc.so.6

To Reproduce

Steps to reproduce

(Paste the commands you ran that produced the error.)

download the mxnet, commit head: de510582438ad5fad576eba1b85c845b0ba9989c
build it with the cmake, and mkl, mkldnn enabled
run an own cpp inference code, then this error arise

What have you tried to solve it?

Change the gcc/g++ to same as that used by mxnet, i.e. gcc 7.3.1, not work

Environment

centos 7.2.1511
gcc/g++: 7.3.1
cuda: 10.0

----------Python Info----------
Version      : 3.6.8
Compiler     : GCC 7.3.0
Build        : ('default', 'Dec 30 2018 01:22:34')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.1.1
Directory    : /search/odin/songminghui/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 2.0.0
Directory    : /search/odin/songminghui/githubs/incubator-mxnet/python/mxnet
Num GPUs     : 8
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-3.10.0-327.el7.x86_64-x86_64-with-centos-7.2.1511-Core
system       : Linux
node         : nmyjs_176_61
release      : 3.10.0-327.el7.x86_64
version      : #1 SMP Thu Nov 19 22:10:57 UTC 2015
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    2
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               2195.104
BogoMIPS:              4398.47
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0-11,24-35
NUMA node1 CPU(s):     12-23,36-47
----------Network Test----------
Setting timeout: 10
Error open MXNet: https://github.com/apache/incubator-mxnet, <urlopen error timed out>, DNS finished in 1.2967002391815186 sec.
Error open GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, <urlopen error timed out>, DNS finished in 4.1484832763671875e-05 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 1.1775 sec, LOAD: 3.5363 sec.
Timing for D2L: http://d2l.ai, DNS: 0.4321 sec, LOAD: 0.8394 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.3948 sec, LOAD: 1.0763 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.4420 sec, LOAD: 2.9380 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.1293 sec, LOAD: 13.5263 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.47435927391052246 sec.

leezu commented 4 years ago

Did you call MXNotifyShutdown();? cf https://github.com/apache/incubator-mxnet/commit/afb750570e71545b7b58d7374f586b21156d36d8