giordano closed this issue 1 year ago
We don't guarantee alignment bigger than 16. It is weird that the crash is in the libc though. If we had codegen'd a variant that blindly uses 512-bit vectors, it might have required a stronger alignment.
Some more breadcrumbs while debugging with Valentin. This is where it segfaults:
>│0xffffac923fb8 <__memcpy_generic+232> ldp x12, x13, [x1]
│0xffffac923fbc <__memcpy_generic+236> sub x1, x1, x14
│0xffffac923fc0 <__memcpy_generic+240> add x2, x2, x14
│0xffffac923fc4 <__memcpy_generic+244> ldp x6, x7, [x1, #16]
│0xffffac923fc8 <__memcpy_generic+248> stp x12, x13, [x0]
│0xffffac923fcc <__memcpy_generic+252> ldp x8, x9, [x1, #32]
│0xffffac923fd0 <__memcpy_generic+256> ldp x10, x11, [x1, #48]
│0xffffac923fd4 <__memcpy_generic+260> ldp x12, x13, [x1, #64]!
│0xffffac923fd8 <__memcpy_generic+264> subs x2, x2, #0x90
│0xffffac923fdc <__memcpy_generic+268> b.ls 0xffffac924008 <__memcpy_generic+312> // b.plast
│0xffffac923fe0 <__memcpy_generic+272> stp x6, x7, [x3, #16]
│0xffffac923fe4 <__memcpy_generic+276> ldp x6, x7, [x1, #16]
│0xffffac923fe8 <__memcpy_generic+280> stp x8, x9, [x3, #32]
│0xffffac923fec <__memcpy_generic+284> ldp x8, x9, [x1, #32]
│0xffffac923ff0 <__memcpy_generic+288> stp x10, x11, [x3, #48]
│0xffffac923ff4 <__memcpy_generic+292> ldp x10, x11, [x1, #48]
│0xffffac923ff8 <__memcpy_generic+296> stp x12, x13, [x3, #64]!
│0xffffac923ffc <__memcpy_generic+300> ldp x12, x13, [x1, #64]!
│0xffffac924000 <__memcpy_generic+304> subs x2, x2, #0x40
│0xffffac924004 <__memcpy_generic+308> b.hi 0xffffac923fe0 <__memcpy_generic+272> // b.pmore
│0xffffac924008 <__memcpy_generic+312> ldp x1, x2, [x4, #-64]
│0xffffac92400c <__memcpy_generic+316> stp x6, x7, [x3, #16]
│0xffffac924010 <__memcpy_generic+320> ldp x6, x7, [x4, #-48]
│0xffffac924014 <__memcpy_generic+324> stp x8, x9, [x3, #32]
│0xffffac924018 <__memcpy_generic+328> ldp x8, x9, [x4, #-32]
│0xffffac92401c <__memcpy_generic+332> stp x10, x11, [x3, #48]
│0xffffac924020 <__memcpy_generic+336> ldp x10, x11, [x4, #-16]
│0xffffac924024 <__memcpy_generic+340> stp x12, x13, [x3, #64]
│0xffffac924028 <__memcpy_generic+344> stp x1, x2, [x5, #-64]
│0xffffac92402c <__memcpy_generic+348> stp x6, x7, [x5, #-48]
Output of
JULIA_LLVM_ARGS="-aarch64-sve-vector-bits-min=512 -print-after-all -print-module-scope" ./julia -e 'rand(1)'
Output of
JULIA_LLVM_ARGS="-aarch64-sve-vector-bits-min=512 -print-before-all -print-module-scope" ./julia -e 'rand(1)'
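Both invocations print a very large amount of IR, so it helps to save it to a file. A minimal sketch, assuming the pass dumps go to stderr (as LLVM's -print-before-all and -print-after-all normally do) and that the binary to test is passed in explicitly. The helper name dump_llvm_passes and the log file names are made up for illustration:

```shell
# Sketch: run the reproducer with an LLVM pass-dump flag and save the dump.
# dump_llvm_passes is a hypothetical helper, not part of Julia.
#   $1: path to the julia binary (e.g. ./julia)
#   $2: "before" or "after" (selects -print-before-all / -print-after-all)
#   $3: file that receives stderr, where the IR dump is assumed to go
dump_llvm_passes() {
    julia_bin="$1"
    when="$2"
    log="$3"
    case "$when" in
        before|after) ;;
        *) echo "usage: dump_llvm_passes JULIA before|after LOGFILE" >&2; return 2 ;;
    esac
    # The variable assignment applies only to this one julia invocation.
    JULIA_LLVM_ARGS="-aarch64-sve-vector-bits-min=512 -print-${when}-all -print-module-scope" \
        "$julia_bin" -e 'rand(1)' 2> "$log"
}
```

For example, dump_llvm_passes ./julia after after-all.log leaves the dump in after-all.log and returns julia's own exit status, so a crash is still visible to the caller.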
Compiling Julia from source with the following Make.user:
FORCE_ASSERTIONS=1
LLVM_ASSERTIONS=1
override JULIA_BUILD_MODE=debug
LLVM_DEBUG=2
USE_BINARYBUILDER_LLVM=0
I get:
julia/deps/srccache/llvm-julia-13.0.1-0/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:5121: virtual llvm::SDValue llvm::AArch64TargetLowering::LowerFormalArguments(llvm::SDValue, llvm::CallingConv::ID, bool, const llvm::SmallVectorImpl<llvm::ISD::InputArg>&, const llvm::SDLoc&, llvm::SelectionDAG&, llvm::SmallVectorImpl<llvm::SDValue>&) const: Assertion `!Res && "Call operand has unhandled type"' failed.
Full output at https://gist.github.com/giordano/35d05c0d148fa1eae62e044ca6108eb9
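The debug configuration quoted above can be written out with a short script. This is only a sketch: the helper name write_debug_make_user is hypothetical, and it assumes its argument is the root of a Julia source checkout. The five settings are exactly the ones listed in the Make.user above.

```shell
# Sketch: write the Make.user quoted above into a Julia source tree.
# write_debug_make_user is a hypothetical helper; $1 is assumed to be
# the root of a Julia checkout. An existing Make.user is overwritten.
write_debug_make_user() {
    src_dir="$1"
    [ -d "$src_dir" ] || { echo "no such directory: $src_dir" >&2; return 1; }
    cat > "$src_dir/Make.user" <<'EOF'
FORCE_ASSERTIONS=1
LLVM_ASSERTIONS=1
override JULIA_BUILD_MODE=debug
LLVM_DEBUG=2
USE_BINARYBUILDER_LLVM=0
EOF
}
```

After calling it on the checkout, running make there builds LLVM from source (because of USE_BINARYBUILDER_LLVM=0) with assertions enabled, which is what lets the assertion in LowerFormalArguments fire instead of a later segfault.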
There is some "progress": in julia nightly the segfault isn't in libc but in llvm:
julia> rand(1)
Thread 1 "julia" received signal SIGSEGV, Segmentation fault.
0x0000ffffa8e41a94 in llvm::EVT::isExtended64BitVector() const () from /snx11273/home/ri-mgiordano/nightly/julia-9b83dd8920/bin/../lib/julia/libLLVM-14jl.so
The second frame of the stacktrace is in llvm::AArch64TargetLowering::LowerFormalArguments, the same function which was hitting the assertion in the message above.
Edit: backtrace looks very similar to https://github.com/JuliaLang/julia/issues/43069#issuecomment-968116013.
This is now working:
$ JULIA_LLVM_ARGS="-aarch64-sve-vector-bits-min=512" julia -q
julia> rand(1)
1-element Vector{Float64}:
0.36115625072266977
julia> rand(1)
1-element Vector{Float64}:
0.7807049948989053
julia> versioninfo()
Julia Version 1.10.0-DEV.77
Commit 5da8d5f17ad (2022-11-30 11:11 UTC)
Platform Info:
OS: Linux (aarch64-linux-gnu)
CPU: 48 × unknown
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, a64fx)
Threads: 1 on 48 virtual cores
Environment:
JULIA_LLVM_ARGS = -aarch64-sve-vector-bits-min=512
Also, there is generally no need to use -aarch64-sve-vector-bits-min=512 anymore: autovectorisation works well out of the box with LLVM 14 (Julia v1.9+).
The segfault seems to be in code generation, because @code_llvm rand(1) also crashes. Note: this happens only in master, but not in julia v1.7.2. It might be related to the upgrade of LLVM to v13.0.1.
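To check which builds are affected, the reproducer can be run against several binaries and the exit statuses compared. A sketch, assuming a POSIX shell that reports death by signal as 128 plus the signal number (139 for SIGSEGV, 134 for SIGABRT) and that the process really is terminated by the signal. The helper name check_repro and any binary paths passed to it are hypothetical:

```shell
# Sketch: classify how a julia binary exits when running the reproducer.
# check_repro is a hypothetical helper; $1 is the julia binary to test.
check_repro() {
    julia_bin="$1"
    JULIA_LLVM_ARGS="-aarch64-sve-vector-bits-min=512" \
        "$julia_bin" -e 'rand(1)' >/dev/null 2>&1
    status=$?
    case "$status" in
        0)   echo "$julia_bin: ok" ;;
        139) echo "$julia_bin: segfault (SIGSEGV)" ;;
        134) echo "$julia_bin: abort (SIGABRT, e.g. a failed assertion)" ;;
        *)   echo "$julia_bin: exit status $status" ;;
    esac
    # Propagate the original status so callers can branch on it.
    return "$status"
}
```

Calling it once per build (for instance a v1.7.2 binary and a master build) prints one line per binary, which makes a regression between versions easy to spot.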