staticfloat opened this issue 8 months ago (status: open).
I was playing with installing musl builds using juliaup last month and was able to get it working on Alpine, with some changes to download the correct musl binaries. Are these segfaults occurring only in recent dev builds? I see on https://julialang.org/downloads/ that the latest build is available for musl.
See also https://github.com/JuliaCI/julia-buildkite/issues/321. The culprit might be the small default thread stack size on Alpine: https://ariadne.space/2021/06/25/understanding-thread-stack-sizes-and-how-alpine-is-different/
Yep, that looks like the culprit, good find. They propose a solution here:
Adjusting the stack size at link time
In modern Alpine systems, since 2018, it is possible to set the default thread stack size at link time. This can be done with a special LDFLAGS flag, like -Wl,-z,stack-size=1024768.
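One way to see the default that this flag is meant to change is to ask libc what stack size a new thread would get. This is a hedged sketch, not anything from Julia's build: default_thread_stack_size is a hypothetical helper, and it relies only on the POSIX calls pthread_attr_init and pthread_attr_getstacksize, which, as far as I know, report the implementation default on both glibc and musl when the size was never set explicitly.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: returns the stack size (in bytes) that a new thread
 * would get from this libc when no size is set explicitly, or 0 on error. */
size_t default_thread_stack_size(void) {
    pthread_attr_t attr;
    size_t size = 0;
    if (pthread_attr_init(&attr) != 0)
        return 0;
    /* A freshly initialized attribute object carries the libc default. */
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}
```

Printing the result (e.g. with printf("%zu\n", ...)) from a small test program linked with and without -Wl,-z,stack-size=... should show whether the flag took effect, assuming a musl version recent enough to honour it. On glibc the default typically follows the stack ulimit instead.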
I'm not familiar with Buildkite, but I think we could add these flags here?
I think such flags should be added to the regular build system, since users building Julia directly from git on a musl-based system will also want them, no?
There are already a bunch of checks that do special things on Darwin (aka macOS), FreeBSD, etc. I don't see any special code blocks for musl handling (though I may have missed them).
But I do note that around line 1390 of Make.inc there is code that seems to set a stack size for Windows via -Wl,--stack,8388608, like this:
```make
ifeq ($(OS), WINNT)
HAVE_SSP := 1
OSLIBS += -Wl,--export-all-symbols -Wl,--version-script=$(BUILDROOT)/src/julia.expmap \
	$(NO_WHOLE_ARCHIVE) -lpsapi -lkernel32 -lws2_32 -liphlpapi -lwinmm -ldbghelp -luserenv -lsecur32 -latomic
JLDFLAGS += -Wl,--stack,8388608 # <---- NOTE THIS LINE
ifeq ($(ARCH),i686)
JLDFLAGS += -Wl,--large-address-aware
endif
JCPPFLAGS += -D_WIN32_WINNT=0x0502
UNTRUSTED_SYSTEM_LIBM := 1
# Use hard links for files on windows, rather than soft links
# https://stackoverflow.com/questions/3648819/how-to-make-a-symbolic-link-with-cygwin-in-windows-7
# Usage: $(WIN_MAKE_HARD_LINK) <source> <target>
WIN_MAKE_HARD_LINK := cp --dereference --link --force
else
WIN_MAKE_HARD_LINK := true -ignore
endif # $(OS) == WINNT
```
So it might make sense to add your code in the vicinity of this. The one thing I am not sure about is what the "correct" check for musl at this point would be. I hope someone else will be able to help out (maybe @staticfloat or @gbaraldi), or perhaps you can figure something out yourself.
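For context on why that check is awkward: the simplest compile-time approach relies on libc feature macros. glibc defines __GLIBC__, but musl deliberately provides no identifying macro (its FAQ explains why), so musl can only be inferred by elimination. The sketch below is illustrative only; libc_family is a hypothetical helper and the list of macros checked is not exhaustive. Because the macros come from the target's headers, the answer also holds when cross-compiling.

```c
#include <string.h> /* any libc header pulls in the libc's identifying macros */

/* Hypothetical helper: best-effort, compile-time guess at the C library. */
const char *libc_family(void) {
#if defined(__UCLIBC__)
    return "uClibc";   /* checked first: uClibc also defines __GLIBC__ */
#elif defined(__GLIBC__)
    return "glibc";
#elif defined(__BIONIC__)
    return "bionic";
#elif defined(__linux__)
    /* musl defines no __MUSL__-style macro, so this is only a guess
     * by elimination. */
    return "other Linux libc (possibly musl)";
#else
    return "unknown";
#endif
}
```

The fallthrough branch is the weak spot: any unrecognized Linux libc lands there, which is why detection by absence is fragile.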
musl and Alpine devs encourage fixing the non-portable code that results in stack exhaustion by moving the variable off the stack rather than adding detection, but I understand that could be tricky. There seems to be a method here to detect musl in both native and cross-compile environments: https://gist.github.com/unmanned-player/f2421eec512d610116f451249cce5920
It's worth taking the time to read through the Stack Overflow questions linked in the comments of that gist. I'm unfamiliar with C and Makefiles, so I probably can't figure this out myself, but maybe it gets you guys one step further?
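To make the "move the variable off the stack" advice concrete, here is a generic C sketch. It is not taken from Julia or LLVM (the thread does not identify which code exhausts the stack), and SCRATCH_BYTES and both function names are hypothetical. The first function keeps a 512 KiB buffer on the stack, which would overflow a thread stack of roughly 128 KiB (about musl's default, per the article linked above); the second moves the same buffer to the heap, so its stack usage no longer depends on the buffer size.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scratch size, well above a ~128 KiB thread stack. */
#define SCRATCH_BYTES ((size_t)512 * 1024)

/* Before: the buffer lives on the stack, so calling this from a thread
 * whose stack is smaller than SCRATCH_BYTES overflows it. */
long sum_with_stack_buffer(unsigned char fill) {
    unsigned char scratch[SCRATCH_BYTES];
    memset(scratch, fill, sizeof scratch);
    long total = 0;
    for (size_t i = 0; i < sizeof scratch; i++)
        total += scratch[i];
    return total;
}

/* After: the same buffer is allocated on the heap, so stack usage stays
 * small regardless of SCRATCH_BYTES. Returns -1 if allocation fails. */
long sum_with_heap_buffer(unsigned char fill) {
    unsigned char *scratch = malloc(SCRATCH_BYTES);
    if (scratch == NULL)
        return -1;
    memset(scratch, fill, SCRATCH_BYTES);
    long total = 0;
    for (size_t i = 0; i < SCRATCH_BYTES; i++)
        total += scratch[i];
    free(scratch);
    return total;
}
```

Both compute the same result (sum_with_heap_buffer(1) is 524288), but only the heap version is safe to call from a thread with a small stack.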
Wouldn't the proper fix be to set the stack size explicitly at run time? Seems more general and portable than "moving the variable off the stack" (which is obviously not applicable in general) or setting a bigger default stack size at link time.
We should have Julia-level stack size knobs independent of the system or linker defaults. These settings should have an appropriate default, but, ideally, some command-line flags for controlling the stack size of each (type of) thread would be exposed.
For reference, POSIX/SUS allows controlling a thread's stack size (and its guard size) before the thread is created, using pthread_attr_setstacksize and pthread_attr_setguardsize.
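The POSIX route can be sketched as follows. This is a hedged illustration, not how Julia actually creates its threads: stack_hungry_worker and run_with_stack_size are hypothetical names, and the 1 MiB worker buffer is an arbitrary size chosen to exceed musl's small default. The caller picks the stack size explicitly (clamped to at least PTHREAD_STACK_MIN), so the thread no longer depends on libc or linker defaults. A one-page guard region is also requested so that most overflows fault instead of corrupting neighbouring memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>   /* PTHREAD_STACK_MIN */
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>   /* sysconf */

/* Hypothetical worker that needs about 1 MiB of stack. */
static void *stack_hungry_worker(void *arg) {
    (void)arg;
    volatile unsigned char buf[1024 * 1024];
    /* Touch one byte per 4 KiB so the whole buffer is really used. */
    for (size_t i = 0; i < sizeof buf; i += 4096)
        buf[i] = 1;
    return NULL;
}

/* Runs the worker on a thread whose stack size is chosen explicitly.
 * Returns 0 on success or the error code of the first failing call. */
int run_with_stack_size(size_t stack_bytes) {
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* POSIX rejects sizes below PTHREAD_STACK_MIN with EINVAL. */
    if (stack_bytes < (size_t)PTHREAD_STACK_MIN)
        stack_bytes = (size_t)PTHREAD_STACK_MIN;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);

    /* Request a one-page guard region at the overflow end of the stack. */
    long page = sysconf(_SC_PAGESIZE);
    if (rc == 0 && page > 0)
        rc = pthread_attr_setguardsize(&attr, (size_t)page);

    if (rc == 0)
        rc = pthread_create(&tid, &attr, stack_hungry_worker, NULL);
    pthread_attr_destroy(&attr);
    if (rc == 0)
        rc = pthread_join(tid, NULL);
    return rc;
}
```

A Julia-level command-line knob could feed a value like stack_bytes here; the flag itself would be a new design decision, not something this sketch assumes exists.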
For comparison, SBCL exposes a similar --control-stack-size command-line option.
Related: #33480? As far as I understand from that issue, Julia sets the stack size of its threads to the maximum value the OS allows (the ulimit). Providing a command-line flag for setting the thread stack size, instead of reading the ulimit, would fix both of these issues, I guess?
On the other hand, if it's true that Julia always sets the stack size to the maximum allowed value, wouldn't the only way to fix this issue be to raise the ulimit stack size limit for Julia on Alpine?
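Assuming the ulimit-based behaviour described for #33480 is accurate, the difference between the two options can be sketched like this. It is not Julia's code: choose_thread_stack_size, FALLBACK_STACK_BYTES, and the idea of an override parsed from a command-line flag are all hypothetical. A nonzero override wins; otherwise the soft RLIMIT_STACK is used (in bytes; ulimit -s shows the same limit in KiB), with a fallback when it is unlimited or unreadable.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>        /* SIZE_MAX */
#include <sys/resource.h>  /* getrlimit, RLIMIT_STACK */

/* Hypothetical fallback used when the soft limit is unlimited or unreadable. */
#define FALLBACK_STACK_BYTES ((size_t)8 * 1024 * 1024)

/* Hypothetical policy: an explicit override (e.g. from a command-line flag)
 * wins; otherwise derive the size from the soft stack rlimit. */
size_t choose_thread_stack_size(size_t override_bytes) {
    if (override_bytes != 0)
        return override_bytes;

    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return FALLBACK_STACK_BYTES;
    if (rl.rlim_cur > (rlim_t)SIZE_MAX)  /* not representable as size_t */
        return FALLBACK_STACK_BYTES;
    return (size_t)rl.rlim_cur;
}
```

With an override available, raising the ulimit becomes one option rather than the only one.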
> musl and Alpine devs encourage fixing the non-portable code that results in stack exhaustion by moving the variable off the stack rather than adding detection, but I understand that could be tricky. There seems to be a method here to detect musl in both native and cross-compile environments
That is typically not possible. These libraries are also not always POSIX-compliant when they don't feel like it (cf. our reported bugs in their dlopen handling), and strict POSIX compliance often comes with its own bugs, due to problems with that standard that are usually listed in the BUGS section. So our general policy has been to refuse to support these libcs until they add a way to detect them reliably. Supporting their quirks would generally be possible if they permitted reliably detecting which set of features and workarounds are supported by and/or required for those libcs.
I think I had already tried some of these things (see https://github.com/JuliaLang/julia/pull/52149), but what I didn't try was simply using a newer musl.
It appears that the x86_64-linux-musl build is continually segfaulting during the build due to LLVM running out of memory. Initial investigations (#ci-dev) have not yet found a good reason for this.