crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.51k stars 1.62k forks source link

Stack overflow detection failure #7482

Open straight-shoota opened 5 years ago

straight-shoota commented 5 years ago

6928 added a cool feature which detects whether an invalid memory access is actually a stack overflow error.

However, this detection does not yet seem to work correctly in all cases.

On alpine edge the spec for SO detection on main stack fails:

  1) seg fault detects stack overflow on the main stack
     Failure/Error: error.should contain("Stack overflow")

       Expected:   "Invalid memory access (signal 11) at address 0xffffc59b7e30\n[0xaab6d0310cb8] *CallStack::print_backtrace:Int32 +100\n[0xaab6d03034d0] __crystal_sigfault_handler +208\n[0xfff785619964] __setjmp +56\n[0xaab6d03444c0] *Pointer(Hash::Entry(String, NamedTuple(time: Time, location: Time::Location)) | Nil) +136\n[0xaab6d0303530] *foo:NoReturn +48\n[0xaab6d0308fdc] *foo:NoReturn +23260 (1522 times)\n[0xaab6d02f9130] __crystal_main +2072\n[0xaab6d034690c] *Crystal::main_user_code<Int32, Pointer(Pointer(UInt8))>:Nil +8\n[0xaab6d0346864] *Crystal::main<Int32, Pointer(Pointer(UInt8))>:Int32 +40\n[0xaab6d02ffedc] main +8\n[0xfff7855ee734] ???\n"
       to include: "Stack overflow"

     # spec/std/kernel_spec.cr:249

The spec for SO detection on fiber stack passes, though.

There was a similar failure reported for Ubuntu 14.04 which doesn't seem to have been fixed (https://github.com/crystal-lang/crystal/pull/6928#issuecomment-430530379).

If this can't be fixed right away, a short term play would be to simply disable the spec on affected platforms. They're not essential.

/cc @damaxwell

HertzDevil commented 2 months ago

Stack overflow detection relies on the current thread's stack bounds. On Linux this is obtained using LibC.pthread_getattr_np and LibC.pthread_attr_getstack:

https://github.com/crystal-lang/crystal/blob/d58ede5bacba23fbefed9040066aacec44ca953d/src/crystal/system/unix/pthread.cr#L118-L123

However, on musl-libc the internal stack size is always hardcoded to a very small default of 128 KiB, and pthread_getattr_np returns this value unmodified, presumably for the main thread as well. The result is that the main thread has an entirely incorrect @stack, as it respects ulimit -s, and that non-main threads are really that small, as we don't pass any attr to GC.pthread_create on thread creation. This is also mentioned on Ruby's bug tracker.

The stack trace itself is wrong as well, even though Exception::CallStack.print_backtrace works correctly as long as it is outside a signal handler. This appears to be an intentional won't-fix. If we need a meaningful stack trace inside a signal handler then we have to bypass _Unwind_Backtrace and mimic whatever a debugger like LLDB does.