STEllAR-GROUP / phylanx

An Asynchronous Distributed C++ Array Processing Toolkit
Boost Software License 1.0

physl running with the address sanitizer #759

Open stevenrbrandt opened 5 years ago

stevenrbrandt commented 5 years ago
bash-4.4$ physl --help
=================================================================
==6==ERROR: AddressSanitizer: odr-violation (0x7f9db34ff8c0):
  [1] size=32 'hpx::util::detail::global_fixture' /hpx/src/util/lightweight_test.cpp:56:13
  [2] size=32 'hpx::util::detail::global_fixture' /hpx/src/util/lightweight_test.cpp:56:13
These globals were registered at these points:
  [1]:
    #0 0x7f9dc4620660  (/usr/lib64/clang/7.0.1/lib/libclang_rt.asan-x86_64.so+0x62660)
    #1 0x7f9db2b9a48d  (/usr/local/lib/phylanx/libphylanx_controlsd.so+0x86948d)

  [2]:
    #0 0x7f9dc4620660  (/usr/lib64/clang/7.0.1/lib/libclang_rt.asan-x86_64.so+0x62660)
    #1 0x7f9db33b094d  (/usr/local/lib/phylanx/libphylanx_statisticsd.so+0x61494d)

==6==HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_odr_violation=0
SUMMARY: AddressSanitizer: odr-violation: global 'hpx::util::detail::global_fixture' at /hpx/src/util/lightweight_test.cpp:56:13
==6==ABORTING
stevenrbrandt commented 5 years ago

If I disable ODR violation detection, I get this from physl --help:

...
==100==Processing thread 54.
==100==Stack at 0x7ffddf26c000-0x7ffddfa6c000 (SP = 0x7ffddfa6a7c8).
==100==TLS at 0x7f6697fe7b40-0x7f6697fe8c40.
Tracer caught signal 11: addr=0x62700000f000 pc=0x7f66a5cff868 sp=0x7f6678534c40
==54==LeakSanitizer has encountered a fatal error.
==54==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==54==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
hkaiser commented 5 years ago

I think the reported ODR violation is benign in our context; I will have a look, though. I have no idea why it crashes, however. What does it report if you set the recommended options (LSAN_OPTIONS=verbosity=1:log_threads=1)?

stevenrbrandt commented 5 years ago

I had to set those options to see the signal 11. It just prints that message regardless.

hkaiser commented 5 years ago

@stevenrbrandt thanks! The output, however, is not too useful :/ Is anything else reported that might shed some light on what's going on?

stevenrbrandt commented 5 years ago

@hkaiser I made a new version of the sanitizer image with llvm-symbolizer. Alas, it told me nothing more, and addr2line reported ??:0 for the address.

stevenrbrandt commented 5 years ago

A slightly larger physl program,

block(
define(fib,n,
  if(n < 2,n,
    fib(n-1)+fib(n-2)
  )),
cout(fib(3)))

produced a more coherent error message:

==158==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7f6d18e3df00; bottom 0x7f6d072dc000; size: 0x000011b61f00 (297148160)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
=================================================================
==158==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f6d072e6628 at pc 0x7f6d4719cf41 bp 0x7f6d072e64f0 sp 0x7f6d072e64e8
WRITE of size 8 at 0x7f6d072e6628 thread T10
    #0 0x7f6d4719cf40 in boost::spirit::qi::detail::fail_function<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::spirit::context<boost::fusion::cons<boost::spirit::unused_type&, boost::fusion::nil_>, boost::fusion::vector<> >, boost::spirit::unused_type>::fail_function(__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&, __gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, boost::spirit::context<boost::fusion::cons<boost::spirit::unused_type&, boost::fusion::nil_>, boost::fusion::vector<> >&, boost::spirit::unused_type const&) /usr/include/boost/spirit/home/qi/detail/fail_function.hpp:28:13
    #1 0x7f6d4719aa43 in boost::spirit::qi::detail::fail_function<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::spirit::context<boost::fusion::cons<boost::spirit::unused_type&, boost::fusion::nil_>, boost::fusion::vector<> >, boost::spirit::unused_type> boost::spirit::qi::sequence<boost::fusion::cons<boost::spirit::qi::literal_string<char const (&) [3], true>, boost::fusion::cons<boost::spirit::qi::kleene<boost::spirit::qi::difference<boost::spirit::qi::char_class<boost::spirit::tag::char_code<boost::spirit::tag::char_, boost::spirit::char_encoding::standard> >, boost::spirit::qi::literal_string<char const (&) [3], true> > >, boost::fusion::cons<boost::spirit::qi::literal_string<char const (&) [3], true>, boost::fusion::nil_> > > >::fail_function<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::spirit::context<boost::fusion::cons<boost::spirit::unused_type&, boost::fusion::nil_>, boost::fusion::vector<> >, boost::spirit::unused_type>(__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&, __gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, boost::spirit::context<boost::fusion::cons<boost::spirit::unused_type&, boost::fusion::nil_>, boost::fusion::vector<> >&, boost::spirit::unused_type const&) /usr/include/boost/spirit/home/qi/operator/sequence.hpp:51:20
...

However, the "false positives" warning makes me worry that ASan may not be able to help us here because of HPX's stack switching.
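For context on the stack-switching concern: ASan can only follow a user-level stack switch if the code performing it announces the switch. Clang and GCC expose __sanitizer_start_switch_fiber and __sanitizer_finish_switch_fiber in <sanitizer/common_interface_defs.h> for this. The sketch below annotates a single ucontext-based switch. It assumes Linux/glibc; the names (run_on_fiber, fiber_entry, the 256 KiB stack) are hypothetical, and it does not describe how HPX annotates its own coroutine switches. Without ASan the two helpers compile to no-ops.

```cpp
// Sketch: telling AddressSanitizer about a user-level stack switch (Linux/glibc ucontext).
// The annotation calls are compiled in only when ASan is enabled; otherwise they are no-ops.
#include <ucontext.h>
#include <cstddef>
#include <vector>

#if defined(__SANITIZE_ADDRESS__)  // defined under -fsanitize=address (GCC, recent Clang)
#define SKETCH_ASAN 1
#elif defined(__has_feature)
#if __has_feature(address_sanitizer)  // Clang's feature check
#define SKETCH_ASAN 1
#endif
#endif

#if defined(SKETCH_ASAN)
#include <sanitizer/common_interface_defs.h>
#endif

namespace {
ucontext_t main_ctx;
ucontext_t fiber_ctx;
std::vector<char> fiber_stack(256 * 1024);  // arbitrary 256 KiB fiber stack
const void* main_stack_bottom = nullptr;     // filled in by ASan on the first switch
std::size_t main_stack_size = 0;
int fiber_result = 0;

// Announce that we are about to leave the current stack for [bottom, bottom + size).
void start_switch(void** fake_stack_save, const void* bottom, std::size_t size) {
#if defined(SKETCH_ASAN)
  __sanitizer_start_switch_fiber(fake_stack_save, bottom, size);
#else
  (void)fake_stack_save; (void)bottom; (void)size;
#endif
}

// Announce that the switch has completed; optionally receive the old stack's bounds.
void finish_switch(void* fake_stack_save, const void** bottom_old, std::size_t* size_old) {
#if defined(SKETCH_ASAN)
  __sanitizer_finish_switch_fiber(fake_stack_save, bottom_old, size_old);
#else
  (void)fake_stack_save; (void)bottom_old; (void)size_old;
#endif
}

int fib(int n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

void fiber_entry() {
  // First code on the new stack: finish the switch and learn where the main stack is.
  finish_switch(nullptr, &main_stack_bottom, &main_stack_size);
  fiber_result = fib(3);
  // Leaving this fiber for good: passing nullptr lets ASan release its fake stack.
  start_switch(nullptr, main_stack_bottom, main_stack_size);
  swapcontext(&fiber_ctx, &main_ctx);
}
}  // namespace

// Runs fib(3) on a separate stack and returns the result.
int run_on_fiber() {
  getcontext(&fiber_ctx);
  fiber_ctx.uc_stack.ss_sp = fiber_stack.data();
  fiber_ctx.uc_stack.ss_size = fiber_stack.size();
  fiber_ctx.uc_link = &main_ctx;
  makecontext(&fiber_ctx, fiber_entry, 0);

  void* fake_stack = nullptr;
  start_switch(&fake_stack, fiber_stack.data(), fiber_stack.size());
  swapcontext(&main_ctx, &fiber_ctx);
  finish_switch(fake_stack, nullptr, nullptr);  // back on the main stack
  return fiber_result;
}
```

run_on_fiber() returns 2, the same value the physl fib(3) program prints. Built with -fsanitize=address, the annotations are what allow ASan to treat the jump onto fiber_stack as an intentional switch rather than an unexplained change of stack.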

hkaiser commented 5 years ago

@sithhell is that error message above something you have seen with your asan runs?

stevenrbrandt commented 5 years ago

@sithhell, Hartmut tells me you don't see the problems I've seen. Here's how I set up the sanitizer: https://gist.github.com/stevenrbrandt/131d89d1a6bf99810bc56394818bf3c1

I'd be interested in knowing what I've done wrong. :)

stevenrbrandt commented 5 years ago

@hkaiser @sithhell I had been running the fib(3) code on a machine with 40 cores (80 threads). When I tried it on Rostam, I found that the stack-buffer-overflow errors went away. The segfault on shutdown is still there, though.

sithhell commented 5 years ago

@stevenrbrandt the stack switching is perfectly fine with ASan; there are no false positives there. The errors you are seeing are genuine bugs on our side. Whether they are severe is a different question. Spirit causing stack overflows makes perfect sense given its recursive descent parsing nature. That being said, whether the stack overflows probably depends on how many recursive function calls there actually are. The stack traces coming out of ASan are always very precise, so they should give us some idea of where to look. I haven't seen problems because I haven't run any complicated physl code with ASan yet. The stack overflow bug will most likely go away if you increase the stack size.
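To make the recursion argument concrete, here is a hand-written recursive descent parser. It is not Spirit and not the PhySL grammar; the grammar expr := 'x' | 'f' '(' expr ')' and the names NestingParser and nested are hypothetical. Every nesting level in the input costs one more parse_expr frame, so the stack needed grows linearly with nesting depth. On a small fixed-size coroutine stack, deep enough input overflows, and a larger stack moves that limit rather than removing it.

```cpp
// Sketch: a hand-written recursive descent parser (hypothetical grammar, not Spirit/PhySL).
// Grammar: expr := 'x' | 'f' '(' expr ')'
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

class NestingParser {
 public:
  explicit NestingParser(std::string text) : text_(std::move(text)) {}

  // Parses the whole input; returns the deepest recursion level reached.
  std::size_t parse() {
    pos_ = 0;
    depth_ = 0;
    max_depth_ = 0;
    parse_expr();
    if (pos_ != text_.size()) throw std::runtime_error("trailing input");
    return max_depth_;
  }

 private:
  void parse_expr() {
    ++depth_;  // one more parse_expr frame is now live on the C++ stack
    if (depth_ > max_depth_) max_depth_ = depth_;
    if (peek() == 'x') {
      ++pos_;
    } else {
      expect('f');
      expect('(');
      parse_expr();  // nesting in the input becomes nesting on the stack
      expect(')');
    }
    --depth_;
  }

  char peek() const { return pos_ < text_.size() ? text_[pos_] : '\0'; }

  void expect(char c) {
    if (peek() != c) throw std::runtime_error(std::string("expected '") + c + "'");
    ++pos_;
  }

  std::string text_;
  std::size_t pos_ = 0;
  std::size_t depth_ = 0;
  std::size_t max_depth_ = 0;
};

// Builds f(f(...f(x)...)) with `levels` applications of f.
std::string nested(std::size_t levels) {
  std::string s;
  for (std::size_t i = 0; i < levels; ++i) s += "f(";
  s += 'x';
  s.append(levels, ')');
  return s;
}
```

For example, nested(3) is "f(f(f(x)))", and NestingParser(nested(3)).parse() returns 4: one frame per f plus one for the innermost x.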

stevenrbrandt commented 5 years ago

@sithhell The issue appears to be related to something in my environment rather than to the number of threads on the machine, since running that same image with Singularity did not produce it. Regardless, I always see the segfault at shutdown with ASan.

sithhell commented 5 years ago

@stevenrbrandt right, the only way to get rid of the segfault right now is to disable LeakSanitizer by setting the environment variable ASAN_OPTIONS=detect_leaks=0. The other AddressSanitizer checks, such as heap-use-after-free and stack-buffer-overflow detection, remain enabled with that setting.
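As an alternative to exporting ASAN_OPTIONS on every run, ASan lets a program compile in its own defaults by defining the hook __asan_default_options; values from the ASAN_OPTIONS environment variable are parsed afterwards and still override them. A minimal sketch, assuming it is added to the physl executable's own sources (a hypothetical placement) and combining the two options used in this thread:

```cpp
// Sketch: default ASan options compiled into the executable.
// The ASan runtime calls this hook at startup; ASAN_OPTIONS from the environment overrides it.
// In a build without -fsanitize=address it is just an unused function.
extern "C" const char* __asan_default_options() {
  // detect_odr_violation=0: silence the duplicate global_fixture report.
  // detect_leaks=0: turn off LeakSanitizer (and with it the shutdown segfault);
  // heap-use-after-free and buffer-overflow checks stay active.
  return "detect_odr_violation=0:detect_leaks=0";
}
```

Both flags trade away checks, so this is better suited to a dedicated sanitizer image than to every build.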

sithhell commented 5 years ago

@stevenrbrandt if you configure HPX with -DHPX_WITH_STACKOVERFLOW_DETECTION=Off, the leak sanitizer segfault goes away.
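Stack-overflow detection for user-level stacks is commonly built on a SIGSEGV handler that runs on an alternate signal stack (sigaltstack plus SA_ONSTACK), so the handler can still execute after the faulting stack is exhausted. The generic POSIX sketch below shows that pattern; it is not HPX's code, the names are hypothetical, and the idea that such a process-wide handler interferes with LeakSanitizer's tracer is only a plausible reading of the observation above, not something this thread confirms.

```cpp
// Generic POSIX sketch (not HPX's code): a SIGSEGV handler on an alternate signal stack.
#include <signal.h>
#include <unistd.h>
#include <cstdlib>
#include <vector>

// Arbitrary 64 KiB alternate stack; comfortably above MINSIGSTKSZ on common Linux systems.
static std::vector<char> alt_stack(64 * 1024);

// Runs on alt_stack, so it can still execute when the faulting stack is exhausted.
static void on_segv(int /*sig*/, siginfo_t* info, void* /*ucontext*/) {
  static const char msg[] = "SIGSEGV: possible stack overflow\n";
  (void)info;  // info->si_addr is the faulting address a real detector would inspect
  ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);  // async-signal-safe
  (void)ignored;
  _exit(EXIT_FAILURE);
}

// Registers the alternate stack for the calling thread and routes SIGSEGV to on_segv.
bool install_overflow_handler() {
  stack_t ss{};
  ss.ss_sp = alt_stack.data();
  ss.ss_size = alt_stack.size();
  ss.ss_flags = 0;
  if (sigaltstack(&ss, nullptr) != 0) return false;  // per-thread setting

  struct sigaction sa{};
  sa.sa_sigaction = on_segv;
  sa.sa_flags = SA_SIGINFO | SA_ONSTACK;  // SA_ONSTACK: run the handler on alt_stack
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGSEGV, &sa, nullptr) == 0;  // process-wide setting
}

// True if SIGSEGV is currently routed to on_segv.
bool overflow_handler_installed() {
  struct sigaction current{};
  if (sigaction(SIGSEGV, nullptr, &current) != 0) return false;
  return (current.sa_flags & SA_SIGINFO) != 0 && current.sa_sigaction == on_segv;
}
```

Note the split: sigaltstack applies only to the calling thread, while sigaction replaces the SIGSEGV disposition for the whole process, including any threads a sanitizer runtime starts. A real detector would typically also compare info->si_addr against the current stack's guard region before reporting an overflow.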

dheerajka29 commented 4 years ago

Regarding -DHPX_WITH_STACKOVERFLOW_DETECTION: can you please share how to configure HPX with this option when building it in a Docker container?