JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Building Julia aborts when `JULIA_NUM_THREADS=4,1` is set #56533

Open oscardssmith opened 1 week ago

oscardssmith commented 1 week ago

This appears to have regressed in https://github.com/JuliaLang/julia/pull/56409. Specifically, when building Base, Julia aborts at:

Compiling the compiler. This may take several minutes ...
Base.Compiler ──── 280.354 seconds
flparse.jl
    JULIA usr/lib/julia/sys.ji
Aborted (core dumped)
*** This error might be fixed by running `make clean`. If the error persists, try `make cleanall`. ***
make[1]: *** [sysimage.mk:71: /home/oscardssmith/julia/usr/lib/julia/sys.ji] Error 1
make: *** [Makefile:114: julia-sysimg-ji] Error 2

Running the process under GDB shows:

Thread 5 "julia" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe499d640 (LWP 292041)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737028675136) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737028675136) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737028675136) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737028675136, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7dbd476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7da37f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7144313 in jl_finish_task (ct=ct@entry=0x7fffee934010) at /home/oscardssmith/julia/src/task.c:345
#6  0x00007ffff7185829 in jl_threadfun (arg=0x5555556d5210) at /home/oscardssmith/julia/src/scheduler.c:122
#7  0x00007ffff7e0fac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#8  0x00007ffff7ea1850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

DilumAluthge commented 1 week ago

@oscardssmith @gbaraldi Should we add a Buildkite job that tests this, so we can make sure that this doesn't regress again in the future?

Alternatively, should we just have our build system override the value of JULIA_NUM_THREADS and set JULIA_NUM_THREADS=1 when building?
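
A minimal sketch of what that second option could look like from the shell, assuming a hypothetical helper named `run_single_threaded` (it is not part of Julia's build system, and the actual fix may well live in the Makefiles instead). It only illustrates forcing JULIA_NUM_THREADS=1 for a child process without touching the caller's environment:

```shell
# Hypothetical helper (not from the Julia repo): run any command with
# JULIA_NUM_THREADS forced to 1, whatever the caller has exported.
run_single_threaded() {
    env JULIA_NUM_THREADS=1 "$@"
}

# Simulate the user environment from this issue.
export JULIA_NUM_THREADS=4,1

# The child process sees the overridden value ...
child_value=$(run_single_threaded sh -c 'printf %s "$JULIA_NUM_THREADS"')

# ... while the calling shell keeps its own setting.
parent_value=$JULIA_NUM_THREADS

echo "child sees:  $child_value"
echo "parent sees: $parent_value"
```

With a helper like this, `run_single_threaded make` would start the build with a single thread. Whether that avoids the abort is an assumption based on the issue title, not something verified here.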

oscardssmith commented 1 week ago

IMO the medium-term fix is for --threads=default to be our actual default, which would mean that Buildkite would detect these issues. Our build system is supposed to override the number of threads we launch with (that override recently broke, hence this issue).

giordano commented 1 week ago

I don't see any segmentation fault.

oscardssmith commented 1 week ago

Updated the initial issue to be clearer.

KristofferC commented 1 week ago

> Should we add a Buildkite job that tests this, so we can make sure that this doesn't regress again in the future?

Seems excessive to me when it is only a build issue.