JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

`--threads` option seems to now result in segfault in `jl_init_thread_heap` (`julia/src/gc.c:3992`) #54560

Open LebedevRI opened 4 months ago

LebedevRI commented 4 months ago

I'm building v1.11.0-beta1 (+ #54538) on Debian sid, using clang-17:

$ cat /tmp/julia-multistage/build/stage-2/julia/Make.user 
LLVM_VERSION=17
USE_BINARYBUILDER_LLVM=1
EXTRA_CFLAGS=
OPENBLAS_TARGET_ARCH=ZEN
USE_BINARYBUILDER=1
CMAKE_GENERATOR=Ninja
override CC=clang-$(LLVM_VERSION)
override CXX=clang++-$(LLVM_VERSION)
MARCH=native
CFLAGS=-march=$(MARCH) -Wl,--undefined-version $(EXTRA_CFLAGS)
CXXFLAGS=$(CFLAGS)
JULIA_CPU_TARGET=$(MARCH)

but the resulting binary segfaults on startup with `--threads 2` (it starts and exits normally with `--threads 1`):

$ JULIA_NUM_THREADS=32 gdb /tmp/julia-multistage/build/stage-2/julia/usr/bin/julia
GNU gdb (Debian 13.2-1+b1) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
No symbol table is loaded.  Use the "file" command.
Breakpoint 1 (__ubsan_handle_load_invalid_value) pending.
No symbol table is loaded.  Use the "file" command.
Breakpoint 2 (__asan_report_error) pending.
No symbol table is loaded.  Use the "file" command.
Breakpoint 3 (__ubsan::ScopedReport::~ScopedReport) pending.
No symbol table is loaded.  Use the "file" command.
Breakpoint 4 (__ubsan::Diag::~Diag) pending.
No symbol table is loaded.  Use the "file" command.
Breakpoint 5 (__sanitizer_print_stack_trace) pending.
No symbol table is loaded.  Use the "file" command.
Breakpoint 6 (^abort) pending.
No symbol table is loaded.  Use the "file" command.
Breakpoint 7 (SignalHandler) pending.
No symbol table is loaded.  Use the "file" command.
Breakpoint 8 (llvm::sys::PrintStackTrace) pending.
No symbol table is loaded.  Use the "file" command.
Breakpoint 9 (SignalHandler) pending.
No symbol table is loaded.  Use the "file" command.
Breakpoint 10 (^raise) pending.
No symbol table is loaded.  Use the "file" command.
Breakpoint 11 (llvm::sys::RunSignalHandlers) pending.
No symbol table is loaded.  Use the "file" command.
Breakpoint 12 (^__assert_fail) pending.
Reading symbols from /tmp/julia-multistage/build/stage-2/julia/usr/bin/julia...
(gdb) r --cpu-target native --optimize=3 --min-optlevel=3 --inline=yes -g 0 --threads 1
Starting program: /tmp/julia-multistage/build/stage-2/julia/usr/bin/julia --cpu-target native --optimize=3 --min-optlevel=3 --inline=yes -g 0 --threads 1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 424652]
[New Thread 0x7ffff00006c0 (LWP 424654)]
[New Thread 0x7fffddc006c0 (LWP 424655)]
[New Thread 0x7fffdd2006c0 (LWP 424656)]
[New Thread 0x7fffd48006c0 (LWP 424657)]
[New Thread 0x7fffcbe006c0 (LWP 424658)]
[New Thread 0x7fffc34006c0 (LWP 424659)]
[New Thread 0x7fffbaa006c0 (LWP 424660)]
[New Thread 0x7fffb20006c0 (LWP 424661)]
[New Thread 0x7fffa96006c0 (LWP 424662)]
[New Thread 0x7fffa0c006c0 (LWP 424663)]
[New Thread 0x7fff982006c0 (LWP 424664)]
[New Thread 0x7fff8f8006c0 (LWP 424665)]
[New Thread 0x7fff86e006c0 (LWP 424666)]
[New Thread 0x7fff7e4006c0 (LWP 424667)]
[New Thread 0x7fff75a006c0 (LWP 424668)]
[New Thread 0x7fff6d0006c0 (LWP 424669)]
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.0-beta1.1 (2024-05-22)
 _/ |\__'_|_|_|\__'_|  |  v1.11.0-beta1-fix/d2bf60efdf* (fork: 117 commits, 96 days)
|__/                   |

julia> exit()
[Thread 0x7fff6d0006c0 (LWP 424669) exited]
[Thread 0x7fff75a006c0 (LWP 424668) exited]
[Thread 0x7fff7e4006c0 (LWP 424667) exited]
[Thread 0x7fff86e006c0 (LWP 424666) exited]
[Thread 0x7fff8f8006c0 (LWP 424665) exited]
[Thread 0x7fff982006c0 (LWP 424664) exited]
[Thread 0x7fffa0c006c0 (LWP 424663) exited]
[Thread 0x7fffa96006c0 (LWP 424662) exited]
[Thread 0x7fffb20006c0 (LWP 424661) exited]
[Thread 0x7fffbaa006c0 (LWP 424660) exited]
[Thread 0x7fffc34006c0 (LWP 424659) exited]
[Thread 0x7fffcbe006c0 (LWP 424658) exited]
[Thread 0x7fffd48006c0 (LWP 424657) exited]
[Thread 0x7fffdd2006c0 (LWP 424656) exited]
[Thread 0x7fffddc006c0 (LWP 424655) exited]
[Thread 0x7ffff00006c0 (LWP 424654) exited]
[Inferior 1 (process 424649) exited normally]
(gdb) r --cpu-target native --optimize=3 --min-optlevel=3 --inline=yes -g 0 --threads 2
Starting program: /tmp/julia-multistage/build/stage-2/julia/usr/bin/julia --cpu-target native --optimize=3 --min-optlevel=3 --inline=yes -g 0 --threads 2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 424671]
[New Thread 0x7ffff00006c0 (LWP 424673)]
[New Thread 0x7fffe16006c0 (LWP 424674)]

Thread 3 "julia" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe16006c0 (LWP 424674)]
0x00007ffff6fb95c3 in jl_init_thread_heap (ptls=ptls@entry=0x7fffdc000b70) at /repositories/julia/src/gc.c:3992
3992        memset(&ptls->gc_num, 0, sizeof(ptls->gc_num));
(gdb) bt full
#0  0x00007ffff6fb95c3 in jl_init_thread_heap (ptls=ptls@entry=0x7fffdc000b70) at /repositories/julia/src/gc.c:3992
        p = 0x7fffdc000e48
        heap = <optimized out>
        mq = 0x7fffdc0019b0
        cq = 0x7fffdc0019b0
        gc_cache = 0x7fffdc001c30
        wsa = <optimized out>
        q = 0x7fffdc001a70
        wsa2 = 0x7fffdc003e20
#1  0x00007ffff6fa2fb2 in jl_init_threadtls (tid=1) at /repositories/julia/src/threading.c:382
        ptls = 0x7fffdc000b70
        bt_data = <optimized out>
        allstates = <optimized out>
#2  0x00007ffff6fa4238 in jl_threadfun (arg=0x5555556ddd70) at /repositories/julia/src/scheduler.c:181
        stack_lo = 0x0
        stack_hi = 0x0
        targ = 0x5555556ddd70
        ptls = <optimized out>
        ct = <optimized out>
        wasrunning = <optimized out>
#3  0x00007ffff7e37dbb in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737352268544, 969745306413698405, -128, 0, 140737488342352, 140736966164480, -969689506164503195, -969727538008421019}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, 
              canceltype = 0}}}
        not_first_call = <optimized out>
#4  0x00007ffff7eb99f8 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb)

LebedevRI commented 4 months ago

I've just tried with v1.10.3, and it does not appear to happen there.

(I may also be seeing some weird behavior around precompiled packages when passing `--optimize=3`, but I'm not sure.) Also, apparently the build directory is no longer relocatable?

JeffBezanson commented 3 months ago

This looks unusual; I can't reproduce it. Maybe a build inconsistency? Or maybe clang-specific? Does it work with gcc?

LebedevRI commented 3 months ago

Thanks for taking a look! I don't know what "build inconsistency" means in this context, but I doubt that's it: I start from a clean build directory. Let me see if it works with gcc, but if it does, that would likely mean there's UB.

LebedevRI commented 3 months ago

@JeffBezanson yup :( It does not seem to crash immediately with gcc. This is rather worrying.
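
For anyone who wants to compare a clang build against a gcc build without going through gdb each time, something like the following might help. It is only a rough sketch: the helper names `describe_exit` and `try_thread_counts` are made up for illustration, and it assumes a POSIX system where a crashing `julia` process is reported as killed by a signal. It uses only the `--threads` and `-e` options plus whatever extra flags you pass in.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def try_thread_counts(julia, counts, extra_args=()):
    """Start `julia` once per thread count, run `exit()`, and collect the outcomes."""
    results = {}
    for n in counts:
        cmd = [julia, *extra_args, "--threads", str(n), "-e", "exit()"]
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        results[n] = describe_exit(proc.returncode)
    return results
```

Calling `try_thread_counts("/path/to/usr/bin/julia", [1, 2, 4], extra_args=["--optimize=3"])` on each build (the path here is a placeholder) returns a dict mapping each thread count to its outcome, which makes it easy to see at which count a given build first stops exiting normally.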