Open stevenrbrandt opened 5 years ago
If I disable ODR violation detection, I get this from physl --help:
...
==100==Processing thread 54.
==100==Stack at 0x7ffddf26c000-0x7ffddfa6c000 (SP = 0x7ffddfa6a7c8).
==100==TLS at 0x7f6697fe7b40-0x7f6697fe8c40.
Tracer caught signal 11: addr=0x62700000f000 pc=0x7f66a5cff868 sp=0x7f6678534c40
==54==LeakSanitizer has encountered a fatal error.
==54==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==54==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
I think the reported ODR violation is benign in our context, but I will have a look. I have no idea why it crashes, however. What does it report if you set the recommended options (LSAN_OPTIONS=verbosity=1:log_threads=1)?
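For reference, a minimal sketch of how those options could be set before rerunning the failing command. The option string is copied from the LeakSanitizer HINT line above; the guard around physl is only an assumption so that the snippet does nothing on a machine where physl is not installed.

```shell
# Ask LeakSanitizer for more detail, as its HINT line suggests.
export LSAN_OPTIONS=verbosity=1:log_threads=1

# Rerun the failing command only if physl is on PATH
# (assumed to be the case inside the sanitizer image).
if command -v physl >/dev/null 2>&1; then
    physl --help
fi
```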
I had to set those options to see the signal 11. It just prints that message regardless.
@stevenrbrandt thanks! The output however is not too useful :/ Is there anything reported otherwise that might shed some light on what's going on?
@hkaiser I made a new version of the sanitizer image with the llvm-symbolizer. Alas, it told me nothing more. addr2line for the address reported ??:0.
A slightly larger physl program,
block(
define(fib,n
if(n < 2,n,
fib(n-1)+fib(n-2)
)),
cout(fib(3)))
produced a more coherent error message:
==158==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7f6d18e3df00; bottom 0x7f6d072dc000; size: 0x000011b61f00 (297148160)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
=================================================================
==158==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f6d072e6628 at pc 0x7f6d4719cf41 bp 0x7f6d072e64f0 sp 0x7f6d072e64e8
WRITE of size 8 at 0x7f6d072e6628 thread T10
#0 0x7f6d4719cf40 in boost::spirit::qi::detail::fail_function<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::spirit::context<boost::fusion::cons<boost::spirit::unused_type&, boost::fusion::nil_>, boost::fusion::vector<> >, boost::spirit::unused_type>::fail_function(__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&, __gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, boost::spirit::context<boost::fusion::cons<boost::spirit::unused_type&, boost::fusion::nil_>, boost::fusion::vector<> >&, boost::spirit::unused_type const&) /usr/include/boost/spirit/home/qi/detail/fail_function.hpp:28:13
#1 0x7f6d4719aa43 in boost::spirit::qi::detail::fail_function<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::spirit::context<boost::fusion::cons<boost::spirit::unused_type&, boost::fusion::nil_>, boost::fusion::vector<> >, boost::spirit::unused_type> boost::spirit::qi::sequence<boost::fusion::cons<boost::spirit::qi::literal_string<char const (&) [3], true>, boost::fusion::cons<boost::spirit::qi::kleene<boost::spirit::qi::difference<boost::spirit::qi::char_class<boost::spirit::tag::char_code<boost::spirit::tag::char_, boost::spirit::char_encoding::standard> >, boost::spirit::qi::literal_string<char const (&) [3], true> > >, boost::fusion::cons<boost::spirit::qi::literal_string<char const (&) [3], true>, boost::fusion::nil_> > > >::fail_function<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::spirit::context<boost::fusion::cons<boost::spirit::unused_type&, boost::fusion::nil_>, boost::fusion::vector<> >, boost::spirit::unused_type>(__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&, __gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, boost::spirit::context<boost::fusion::cons<boost::spirit::unused_type&, boost::fusion::nil_>, boost::fusion::vector<> >&, boost::spirit::unused_type const&) /usr/include/boost/spirit/home/qi/operator/sequence.hpp:51:20
...
However, the "false positives" warning has me concerned that maybe ASAN can't help us because of stack switching.
@sithhell is that error message above something you have seen with your asan runs?
@sithhell , Hartmut tells me you don't see the problems I've seen. Here's how I set up the sanitizer: https://gist.github.com/stevenrbrandt/131d89d1a6bf99810bc56394818bf3c1
I'd be interested in knowing what I've done wrong. :)
@hkaiser @sithhell I ran the fib(3) code on a machine with 40 cores (80 threads). I decided to try on Rostam, and found that the stack-buffer-overflow errors go away. The segfault on shutdown is still there, though.
@stevenrbrandt the stack switching is perfectly fine with asan, there are no false positives there. The errors you are seeing are genuine bugs on our side. Whether they are severe or not is a different issue. Spirit causing stack overflows makes perfect sense given its recursive descent parsing nature. That said, the stack overflow probably depends on how many recursive function calls there actually are. The stack traces coming out of ASAN are always very helpful and very precise, so this should give us some idea. I haven't seen problems because I haven't run any complicated physl code with asan yet. The stack overflow bug is most likely going to go away if you increase the stack size.
@sithhell The issue appears related to something in my environment, not the number of threads on the machine, as running that same image with Singularity did not produce an issue. Regardless, though, I always see the segfault at shutdown with ASAN.
@stevenrbrandt right, the only way to get rid of the segfault right now is to disable the leak sanitizer by setting the environment variable ASAN_OPTIONS=detect_leaks=0. The address sanitizer features like heap use after free or stack overflow detection are still enabled with that.
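As a sketch of that workaround: ASAN_OPTIONS takes a colon-separated list, so the leak check can be switched off together with the ODR check mentioned at the top of the thread. Combining the two here is an assumption about the setup; detect_leaks=0 alone matches the suggestion above.

```shell
# Turn off LeakSanitizer and ODR-violation detection only;
# other AddressSanitizer checks (use-after-free, buffer overflows) stay on.
export ASAN_OPTIONS=detect_odr_violation=0:detect_leaks=0

# Show the effective setting before launching the program under test.
echo "ASAN_OPTIONS=$ASAN_OPTIONS"
```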
@stevenrbrandt if you configure HPX with -DHPX_WITH_STACKOVERFLOW_DETECTION=Off, the leak sanitizer segfault goes away.
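A rough sketch of what such a configure step might look like. Only the -DHPX_WITH_STACKOVERFLOW_DETECTION=Off flag comes from this thread; the source and build directories are hypothetical placeholders, and the command is printed rather than executed so it can be checked first.

```shell
# Hypothetical source and build directories; adjust to your checkout.
HPX_SRC="${HPX_SRC:-./hpx}"
HPX_BUILD="${HPX_BUILD:-$HPX_SRC/build-asan}"

# The flag suggested in this thread.
CMAKE_ARGS="-DHPX_WITH_STACKOVERFLOW_DETECTION=Off"

# Print the configure command; drop the leading 'echo' to actually run it.
echo cmake -S "$HPX_SRC" -B "$HPX_BUILD" $CMAKE_ARGS
```

Any other options your build needs (compiler, sanitizer flags, build type) would be added to CMAKE_ARGS in the same way.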
Regarding -DHPX_WITH_STACKOVERFLOW_DETECTION: could you please share how I can configure HPX in a Docker container?