cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0

`shmget` failed causing SegFault #256

Closed: chhzh123 closed this issue 4 years ago

chhzh123 commented 4 years ago

This bug may appear on some machines when Vivado HLS is used as the backend. Any program that sets `target.config(compile="vivado_hls")` may fail to compile.

The trace from gdb is shown below.

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
__memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:455
455     ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(gdb) backtrace
#0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:455
#1  0x00007fff78394a2b in TVM::runtime::GenSharedMem(TVM::runtime::TVMArgs&, std::vector<int, std::allocator<int> >&, std::vector<unsigned long, std::allocator<unsigned long> >&) () from /home/chz/heterocl/tvm/lib/libhcl.so
#2  0x00007fff78375b85 in TVM::runtime::SimModuleNode::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<TVM::runtime::ModuleNode> const&)::{lambda(TVM::runtime::TVMArgs, TVM::runtime::TVMRetValue*)#1}::operator()(TVM::runtime::TVMArgs, TVM::runtime::TVMRetValue*) const () from /home/chz/heterocl/tvm/lib/libhcl.so
#3  0x00007fff78376764 in std::_Function_handler<void (TVM::runtime::TVMArgs, TVM::runtime::TVMRetValue*), TVM::runtime::SimModuleNode::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<TVM::runtime::ModuleNode> const&)::{lambda(TVM::runtime::TVMArgs, TVM::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, TVM::runtime::TVMArgs&&, TVM::runtime::TVMRetValue*&&) ()
   from /home/chz/heterocl/tvm/lib/libhcl.so
#4  0x00007fff785ea202 in TVMFuncCall () from /home/chz/heterocl/tvm/lib/libhcl.so

The crash is caused by copying data into memory that was never allocated. As shown below, `shmget` fails to allocate shared memory and returns -1. HeteroCL does not check this return value, so the subsequent copy writes to an invalid address and crashes.

https://github.com/cornell-zhang/heterocl/blob/580a3625a3ade2608be9b69272e92e0034f7fc35/tvm/src/codegen/build_util.cc#L207-L213
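The failure mode described above can be avoided by checking the results of `shmget` and `shmat` before copying anything. The following is a minimal sketch, not the HeteroCL code: `SharedSegment`, `CreateSharedSegment`, and `ReleaseSharedSegment` are hypothetical names, and the `0666` permission bits are an assumption. Including `strerror(errno)` in the message would also show why the allocation fails on the affected machines (for example `EACCES` or `EINVAL`).

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>

#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper type (not part of HeteroCL): a System V shared memory
// segment id together with the address it is attached at.
struct SharedSegment {
  int id;
  void* addr;
};

// Create and attach a segment, throwing instead of handing back an invalid
// pointer that a later memcpy would dereference.
SharedSegment CreateSharedSegment(key_t key, std::size_t size) {
  // Permission bits 0666 are an assumption made for this sketch.
  int id = shmget(key, size, IPC_CREAT | 0666);
  if (id == -1) {
    throw std::runtime_error(std::string("shmget failed: ") +
                             std::strerror(errno));
  }
  void* addr = shmat(id, nullptr, 0);
  if (addr == reinterpret_cast<void*>(-1)) {
    int saved_errno = errno;
    shmctl(id, IPC_RMID, nullptr);  // do not leak the segment on failure
    throw std::runtime_error(std::string("shmat failed: ") +
                             std::strerror(saved_errno));
  }
  return {id, addr};
}

// Detach the segment and mark it for removal.
void ReleaseSharedSegment(const SharedSegment& seg) {
  shmdt(seg.addr);
  shmctl(seg.id, IPC_RMID, nullptr);
}
```

With a check like this, a failed allocation surfaces as an exception carrying the `errno` text rather than as a segfault inside `memmove`.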

However, after I changed the path passed to `ftok` to a different folder, the program ran without a segfault. I do not know the reason for this.

chhzh123 commented 4 years ago

I replaced "/" with the current path in the `ftok` call, and this issue is fixed in #253.
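For illustration, deriving the key from the current directory instead of "/" might look like the sketch below. This is not the change made in #253: `MakeIpcKey` is a hypothetical helper, and the project id `1` in the usage note is an arbitrary assumption. `ftok` builds the key from the identity of the given file plus the low 8 bits of the project id, so every process that passes "/" with the same id gets the same key, while a per-project directory gives a different one. Whether such a key collision is what made `shmget` fail here is not confirmed in this issue.

```cpp
#include <sys/ipc.h>
#include <sys/types.h>

#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of HeteroCL): build a System V IPC key from
// an existing path and a project id, and report failure explicitly.
key_t MakeIpcKey(const std::string& path, int proj_id) {
  key_t key = ftok(path.c_str(), proj_id);
  if (key == static_cast<key_t>(-1)) {
    // ftok fails when the path does not exist or cannot be accessed.
    throw std::runtime_error("ftok failed for '" + path +
                             "': " + std::strerror(errno));
  }
  return key;
}
```

Calling `MakeIpcKey(".", 1)` resolves "." against the current working directory, so two processes started in different directories get different keys, and the result can be passed directly to `shmget`.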