JuliaSIMD / LoopVectorization.jl

Macro(s) for vectorizing loops.
MIT License
742 stars 66 forks source link

LoopVectorization.jl causing segfaults on 1.11 #525

Closed maleadt closed 5 months ago

maleadt commented 8 months ago

LoopVectorization.jl's generated IR seems to cause segfaults on 1.11, as observed on PkgEval with at least 6 packages (MCPhylo,jl, LocalPoly.jl, VectorizedReduction.jl, NaNStatistics.jl, TimeSeriesClassification.jl, PlmDCA.jl). See this report for details: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/2cbecf4_vs_18b4f3f/report.html

@chriselrod I'm opening a new issue because https://github.com/JuliaSIMD/LoopVectorization.jl/issues/518 was closed, and to list all issues in case somebody wants to tackle this.


Some of the errors that I've encountered:

An LLVM assertion, as seen with MCPhylo.jl (requires assertions build of Julia):

julia: /workspace/srcdir/llvm-project/llvm/lib/IR/Instructions.cpp:2561: void llvm::InsertValueInst::init(llvm::Value*, llvm::Value*, llvm::ArrayRef<unsigned int>, const llvm::Twine&): Assertion `ExtractValueInst::getIndexedType(Agg->getType(), Idxs) == Val->getType() && "Inserted value must match indexed type!"' failed.

[177] signal 6 (-6): Aborted
in expression starting at /home/pkgeval/.julia/packages/MCPhylo/KWPlY/test/distributions/phylodist.jl:1
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7fbeac0f040e)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_ZN4llvm15InsertValueInst4initEPNS_5ValueES2_NS_8ArrayRefIjEERKNS_5TwineE at /opt/julia/bin/../lib/julia/libLLVM-15jl.so (unknown line)
InsertValueInst at /source/usr/include/llvm/IR/Instructions.h:2640 [inlined]
Create at /source/usr/include/llvm/IR/Instructions.h:2565 [inlined]
CreateInsertValue at /source/usr/include/llvm/IR/IRBuilder.h:2343
emit_new_struct at /source/src/cgutils.cpp:3870
emit_new_struct at /source/src/julia.h:1704
emit_expr at /source/src/codegen.cpp:5945
emit_ssaval_assign at /source/src/codegen.cpp:5367
emit_stmtpos at /source/src/codegen.cpp:5642 [inlined]
emit_function at /source/src/codegen.cpp:8810
jl_emit_code at /source/src/codegen.cpp:9144
jl_emit_codeinst at /source/src/codegen.cpp:9227
_jl_compile_codeinst at /source/src/jitlayers.cpp:220
jl_generate_fptr_impl at /source/src/jitlayers.cpp:525
jl_compile_method_internal at /source/src/gf.c:2509 [inlined]
jl_compile_method_internal at /source/src/gf.c:2397
_jl_invoke at /source/src/gf.c:2912 [inlined]
ijl_apply_generic at /source/src/gf.c:3097
logpdf at /home/pkgeval/.julia/packages/MCPhylo/KWPlY/src/distributions/Phylodist.jl:118

A segfault during vload, as seen with NaNStatistics.jl and PlmDCA.jl:

[60] signal 11 (2): Segmentation fault
in expression starting at /home/pkgeval/.julia/packages/NaNStatistics/oBRaH/test/testArrayStats.jl:80
macro expansion at /home/pkgeval/.julia/packages/VectorizationBase/0dXyA/src/llvm_intrin/memory_addr.jl:987 [inlined]
__vload at /home/pkgeval/.julia/packages/VectorizationBase/0dXyA/src/llvm_intrin/memory_addr.jl:987 [inlined]
_vload at /home/pkgeval/.julia/packages/VectorizationBase/0dXyA/src/strided_pointers/stridedpointers.jl:95 [inlined]
macro expansion at /home/pkgeval/.julia/packages/VectorizationBase/0dXyA/src/vecunroll/memory.jl:60 [inlined]
_vload_unroll at /home/pkgeval/.julia/packages/VectorizationBase/0dXyA/src/vecunroll/memory.jl:535 [inlined]
_vload at /home/pkgeval/.julia/packages/VectorizationBase/0dXyA/src/vecunroll/memory.jl:771 [inlined]
macro expansion at /home/pkgeval/.julia/packages/LoopVectorization/7iB2K/src/reconstruct_loopset.jl:1107 [inlined]
_turbo_! at /home/pkgeval/.julia/packages/LoopVectorization/7iB2K/src/reconstruct_loopset.jl:1107 [inlined]
_nanmean at /home/pkgeval/.julia/packages/NaNStatistics/oBRaH/src/ArrayStats/ArrayStats.jl:344
__nanmean at /home/pkgeval/.julia/packages/NaNStatistics/oBRaH/src/ArrayStats/ArrayStats.jl:308 [inlined]
#nanmean#5 at /home/pkgeval/.julia/packages/NaNStatistics/oBRaH/src/ArrayStats/ArrayStats.jl:307 [inlined]
nanmean at /home/pkgeval/.julia/packages/NaNStatistics/oBRaH/src/ArrayStats/ArrayStats.jl:307

A segfault during vadd_fast as seen with VectorizedReductions.jl:

[12] signal 11 (2): Segmentation fault
in expression starting at /home/pkgeval/.julia/packages/VectorizedReduction/bsnWJ/test/reduce.jl:4
macro expansion at /home/pkgeval/.julia/packages/VectorizationBase/xE5Tx/src/llvm_intrin/binary_ops.jl:31 [inlined]
vadd_fast at /home/pkgeval/.julia/packages/VectorizationBase/xE5Tx/src/llvm_intrin/binary_ops.jl:31 [inlined]
fmap at /home/pkgeval/.julia/packages/VectorizationBase/xE5Tx/src/vecunroll/fmap.jl:11 [inlined]
fmap at /home/pkgeval/.julia/packages/VectorizationBase/xE5Tx/src/vecunroll/fmap.jl:11 [inlined]
fmap at /home/pkgeval/.julia/packages/VectorizationBase/xE5Tx/src/vecunroll/fmap.jl:11 [inlined]
fmap at /home/pkgeval/.julia/packages/VectorizationBase/xE5Tx/src/vecunroll/fmap.jl:11 [inlined]
fmap at /home/pkgeval/.julia/packages/VectorizationBase/xE5Tx/src/vecunroll/fmap.jl:11 [inlined]
fmap at /home/pkgeval/.julia/packages/VectorizationBase/xE5Tx/src/vecunroll/fmap.jl:11 [inlined]
fmap at /home/pkgeval/.julia/packages/VectorizationBase/xE5Tx/src/vecunroll/fmap.jl:11 [inlined]
vadd_fast at /home/pkgeval/.julia/packages/VectorizationBase/xE5Tx/src/vecunroll/fmap.jl:111 [inlined]
add_fast at /home/pkgeval/.julia/packages/VectorizationBase/xE5Tx/src/base_defs.jl:91 [inlined]
macro expansion at /home/pkgeval/.julia/packages/LoopVectorization/7iB2K/src/reconstruct_loopset.jl:1107 [inlined]
_turbo_! at /home/pkgeval/.julia/packages/LoopVectorization/7iB2K/src/reconstruct_loopset.jl:1107 [inlined]
macro expansion at /home/pkgeval/.julia/packages/VectorizedReduction/bsnWJ/src/vmapreduce.jl:236 [inlined]
vvmapreduce at /home/pkgeval/.julia/packages/VectorizedReduction/bsnWJ/src/vmapreduce.jl:231
vvreduce at /home/pkgeval/.julia/packages/VectorizedReduction/bsnWJ/src/vmapreduce.jl:147

The source of bad IR hasn't been fully determined yet, but it seems to be the Expr(:new) that's generated to pass structs by value instead of by reference: https://github.com/JuliaLang/julia/issues/52702#issuecomment-1874492883.


Deprecating LoopVectorization.jl isn't possible, because:

So the only solution forwards seems fixing LoopVectorization.jl. I've taken a first attempt at it in https://github.com/JuliaSIMD/LoopVectorization.jl/pull/523, but just removing the Expr(:new) optimization isn't sufficient, and there's other issues (see above).

chriselrod commented 8 months ago

A few people expressed disappointment at deprecating this library. If one of them can fix these issues, they're welcome to take over maintenance.

Otherwise, we should proceed on the deprecation plan. Note that both RecursiveFactorization and TriangularSolve.jl depend heavily on LV; @inbounds @fastmath performs many times worse to the point of making these libraries go from being faster than BLAS below matrices that are 200x200 or so, to many times slower. Thus they could be deprecated with it, or use explicit kernels.

Unfortunately, RFLU is still much faster than MKL below the 200x200 point on many people's computers https://github.com/SciML/LinearSolve.jl/issues/357 But we could replace the @turbo uses with manual kernels. There are only three things these libraries do with it:

  1. matrix multiply
  2. SIMD argmax
  3. faster tail handling for some simple loops

This stack is a major contributor to compile times, and does not precompile well: it is a frequent source of bugs where precompiled code allocates, but recompiling it in a running REPL with Revise (through making meaningless changes) make those allocations go away. At JuliaCon, I was told that Clima people actually sited LoopVectorization.jl as why they're not using the SciML ecosystem, and was asked if we could get rid of it. I'm not affiliated with them in any way that I know of, so I don't see any reason to appease them, just pointing out that there are different perspectives on these tradeoffs.

brenhinkeller commented 8 months ago

I'm slightly tempted to try to take over maintenance, but I fear that if the maintenance burden has become unsustainable for you it'll just be that much more unsustainable for me.

What was the Clima folks' apparent objection, out of curiosity?

chriselrod commented 8 months ago

I'm slightly tempted to try to take over maintenance, but I fear that if the maintenance burden has become unsustainable for you it'll just be that much more unsustainable for me.

What was the Clima folks' apparent objection, out of curiosity?

It really shouldn't take much effort. My suggestion, if the problem is limited to the Array case because of the Memory change, is to add special casing for Array where they get turned into PtrArrays. You'd then also need to make sure they get GC.@preserved.

schrimpf commented 5 months ago

I believe the segfaults in the tests from VectorizedReductions.jl and NaNStatistics.jl are caused by reducing over empty collections which leads to code along the lines of

A = Float64[]
out = zero(eltype(A))
@turbo for i in eachindex(A)
    out += A[i]
end

LoopVectorization warns users not to do this, so the problem lies with VectorizedReductions and NaNStatistics (here it's the weighted versions of each statistic that has problems, unweighted stats correctly add the check_empty=true flag).

I have not verified it, but I suspect PlmDCA has a similar problem. A quick search of the code reveals that it checks for empty collections in some places, but not others.

chriselrod commented 5 months ago

check_empty=true/false didn't change between early Julia versions and 1.10. We could make check_empty=true the default.

Closing this because tests pass on 1.11, while check_empty=false should've also caused segfaults in older Julia versions. https://github.com/JuliaSIMD/LoopVectorization.jl/actions/runs/8946228352