JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.46k stars 5.46k forks source link

LLVM 3.9 miss-compiles inlined broadcast for Array & SubArray #19792

Closed Sacha0 closed 7 years ago

Sacha0 commented 7 years ago

On

julia> versioninfo()
Julia Version 0.6.0-dev.1807
Commit 26c8d85* (2016-12-31 04:12 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin15.6.0)
  CPU: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)

The linalg/qr tests fail. The failure reduces to

n = 10;
srand(1234321);
a = randn(n,n)/2;

av = view(a, 1:n - 1, 1:n - 1);
qrav = qrfact(av);
q, r  = qrav[:Q], qrav[:R];
q*r; # fine
av; # fine
(q*r, av); # fine
q*r ≈ av # segfaults
# q*r = av # also segfaults
# q*r - av # also segfaults

e.g.

julia> n = 10;

julia> srand(1234321);

julia> a = randn(n,n)/2;

julia> av = view(a, 1:n - 1, 1:n - 1);

julia> qrav = qrfact(av);

julia> q, r  = qrav[:Q], qrav[:R];

julia> q*r; # fine

julia> av; # fine

julia> (q*r, av); # fine

julia> q*r ≈ av # segfaults

signal (11): Segmentation fault: 11
while loading no file, in expression starting on line 0
_ZN12_GLOBAL__N_113InlineSpiller16postOptimizationEv at /Users/sacha/pkg/julia/usr/lib//libLLVM.dylib (unknown line)
_ZN4llvm12RegAllocBase16postOptimizationEv at /Users/sacha/pkg/julia/usr/lib//libLLVM.dylib (unknown line)
_ZN12_GLOBAL__N_18RAGreedy20runOnMachineFunctionERN4llvm15MachineFunctionE at /Users/sacha/pkg/julia/usr/lib//libLLVM.dylib (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at /Users/sacha/pkg/julia/usr/lib//libLLVM.dylib (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /Users/sacha/pkg/julia/usr/lib//libLLVM.dylib (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /Users/sacha/pkg/julia/usr/lib//libLLVM.dylib (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /Users/sacha/pkg/julia/usr/lib//libLLVM.dylib (unknown line)
operator() at /Users/sacha/pkg/julia/src/jitlayers.cpp:436 [inlined]
__invoke<(lambda at /Users/sacha/pkg/julia/src/jitlayers.cpp:434:13) &, llvm::Module &> at /Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/__functional_base:416 [inlined]
__call<(lambda at /Users/sacha/pkg/julia/src/jitlayers.cpp:434:13) &, llvm::Module &> at /Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/__functional_base:437 [inlined]
operator() at /Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/functional:1437
operator() at /Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/functional:1817 [inlined]
addModuleSet<llvm::SmallVector<std::__1::unique_ptr<llvm::Module, std::__1::default_delete<llvm::Module> >, 1>, llvm::RTDyldMemoryManager *, std::__1::unique_ptr<llvm::orc::LambdaResolver<(lambda at /Users/sacha/pkg/julia/src/jitlayers.cpp:530:23), (lambda at /Users/sacha/pkg/julia/src/jitlayers.cpp:554:23)>, std::__1::default_delete<llvm::orc::LambdaResolver<(lambda at /Users/sacha/pkg/julia/src/jitlayers.cpp:530:23), (lambda at /Users/sacha/pkg/julia/src/jitlayers.cpp:554:23)> > > > at /Users/sacha/pkg/julia/usr/include/llvm/ExecutionEngine/Orc/IRCompileLayer.h:73 [inlined]
addModule at /Users/sacha/pkg/julia/src/jitlayers.cpp:558
jl_add_to_ee at /Users/sacha/pkg/julia/src/jitlayers.cpp:782 [inlined]
jl_finalize_function at /Users/sacha/pkg/julia/src/jitlayers.cpp:793
getAddressForFunction at /Users/sacha/pkg/julia/src/codegen.cpp:1070 [inlined]
jl_generate_fptr at /Users/sacha/pkg/julia/src/codegen.cpp:1193
jl_call_method_internal at /Users/sacha/pkg/julia/src/./julia_internal.h:240
jl_apply_generic at /Users/sacha/pkg/julia/src/gf.c:2196
isapprox at ./linalg/generic.jl:1190
unknown function (ip: 0x3198d4456)
jl_apply_generic at /Users/sacha/pkg/julia/src/gf.c:2196
do_call at /Users/sacha/pkg/julia/src/interpreter.c:75
eval at /Users/sacha/pkg/julia/src/interpreter.c:214
jl_interpret_toplevel_expr at /Users/sacha/pkg/julia/src/interpreter.c:34
jl_toplevel_eval_flex at /Users/sacha/pkg/julia/src/toplevel.c:636
jl_toplevel_eval_in at /Users/sacha/pkg/julia/src/builtins.c:606
eval at ./boot.jl:236
jlcall_eval_18559 at /Users/sacha/pkg/julia/usr/lib/julia/sys.dylib (unknown line)
jl_apply_generic at /Users/sacha/pkg/julia/src/gf.c:2196
eval_user_input at ./REPL.jl:66
unknown function (ip: 0x3198c9a76)
jl_apply_generic at /Users/sacha/pkg/julia/src/gf.c:2196
macro expansion at ./REPL.jl:97 [inlined]
#1 at ./event.jl:66
unknown function (ip: 0x3198c285f)
jl_apply_generic at /Users/sacha/pkg/julia/src/gf.c:2196
jl_apply at /Users/sacha/pkg/julia/src/./julia.h:1388 [inlined]
start_task at /Users/sacha/pkg/julia/src/task.c:261
Allocations: 7033163 (Pool: 7028643; Big: 4520); GC: 15
Segmentation fault: 11

Best!

tkelman commented 7 years ago

bisected to https://github.com/JuliaLang/julia/pull/19746 - @andreasnoack that assertion failure was real and should not have been merged

tkelman commented 7 years ago

Interestingly this doesn't fail with LLVM 3.7.1 (on Ubuntu 14.04 anyway). Julia codegen bug or upstream llvm bug? If you build with LLVM_ASSERTIONS = 1 you get more info

.../llvm-3.9.1/lib/CodeGen/LiveVariables.cpp:114

MBB != &MF->front() && "Can't find reaching def for virtreg"

cc @vchuravy

tkelman commented 7 years ago

There's also what seems to be a test-system bug here that JULIA_CPU_CORES=2 usr/bin/julia test/runtests.jl linalg/hessenberg linalg/qr says it passes but JULIA_CPU_CORES=1 usr/bin/julia test/runtests.jl linalg/hessenberg linalg/qr does not. Similar to #19376, and was seen with the broadcast test failure that #16961 introduced and #19745 fixed

vchuravy commented 7 years ago

I can reproduce this with -O1 on a build with LLVM_ASSERTIONS, but no with -O0. Maybe this is related to #15453?

vchuravy commented 7 years ago

The MachineInstuction the assertion is triggered from is:

(rr) p MI.dump()
  CMP64rr %vreg230, %vreg428, %EFLAGS<imp-def>; GR64:%vreg230,%vreg428 dbg:simdloop.jl:72 @[ broadcast.jl:150 @[ broadcast.jl:145 @[ broadcast.jl:263 @[ broadcast.jl:314 @[ broadcast.jl:454 @[ arraymath.jl:34 ] ] ] ] ] ]

and if I remove @simd from https://github.com/JuliaLang/julia/blob/26c8d856a2dd2f6a699f5bae46eee9be8c13a53d/base/broadcast.jl#L151-L159 the example works.

I further reduced this to:

Y = view(rand(10, 10), :, :);
X = rand(Float64, 10, 10);
broadcast(-, X, Y); # works
-(X, Y) # fails
vchuravy commented 7 years ago
Y = view(rand(10, 10), :, :);
X = rand(Float64, 10, 10);
myminus(A, B) = broadcast(-, A, B);
myminus(X, Y)
julia39-dbg: /home/wallnuss/src/julia/deps/srccache/llvm-3.9.1/lib/CodeGen/LiveVariables.cpp:114: void llvm::LiveVariables::MarkVirtRegAliveInBlock(llvm::LiveVariables::VarInfo&, llvm::MachineBasicBlock*, llvm::MachineBasicBlock*, std::vector<llvm::MachineBasicBlock*>&): Assertion `MBB != &MF->front() && "Can't find reaching def for virtreg"' failed.
vchuravy commented 7 years ago

I will be travelling for the first week of January, so I won't have too much time to look into this (sorry about that)

@vtjnash, @JeffBezanson If one of you has time to see if we can work around this from our codegen side, that would be great. I was more looking at this from the LLVM side, but I didn't find anything obvious.

tkelman commented 7 years ago

This seems buggy enough that we should at least consider dropping back to 3.7 if it can't be solved soon.

andreasnoack commented 7 years ago

This doesn't segfault on LLVM-svn c828f5b04a33480c4d8c86863a1439df265f2b2b.

tkelman commented 7 years ago

(reverse-)bisect on llvm to figure out what patch to carry?

andreasnoack commented 7 years ago

I'll try that

andreasnoack commented 7 years ago

A familiar name showed up (and bad means good)

99ca52276f9ee1386866d6dff6179cfa64824621 is the first bad commit
commit 99ca52276f9ee1386866d6dff6179cfa64824621
Author: Keno Fischer <kfischer@college.harvard.edu>
Date:   Mon Dec 5 21:25:03 2016 +0000

    [LAA] Prevent invalid IR for loop-invariant bound in loop body

    Summary:
    If LAA expands a bound that is loop invariant, but not hoisted out
    of the loop body, it used to use that value anyway, causing a
    non-domination error, because the memcheck block is of course not
    dominated by the scalar loop body. Detect this situation and expand
    the SCEV expression instead.

    Fixes PR31251

    Reviewers: anemet
    Subscribers: mzolotukhin, llvm-commits

    Differential Revision: https://reviews.llvm.org/D27397

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288705 91177308-0d34-0410-b5e6-96231b3b80d8

:040000 040000 74687040ae081d9648605e476296951bbba888b5 89b8d818ed0748ef7474a01ce73d3e847df797ea M  lib
:040000 040000 884e86a25a25239e045977569ecbd328f0fc6119 c64349b6f7c34fd1afae99e5981f0297cc63770e M  test

it can be cherry-picked cleanly from release_39 and fixes the example in this issue.

tkelman commented 7 years ago

Very nice! Let's add that to our list then.

Keno commented 7 years ago

We should also pick up 83dc06334ff95ad18a951d0bb540290510f2f81a and whatever the git sha corresponding to https://reviews.llvm.org/rL290260 is.

tkelman commented 7 years ago

working on a branch (will also need to rebuild mac and windows binaries, if you've got any other changes in mind now's a good time)

Keno commented 7 years ago

I'm about to reapply a fixed version of https://reviews.llvm.org/D21731. We'll probably want to replace our existing patch by that once that's done.

tkelman commented 7 years ago

Could you also take a look at https://github.com/JuliaLang/julia/issues/19803 ? That only seems to be happening on 32 bit x86 and with LLVM assertions enabled. If it helps make reproducing faster I can put up a docker image that has deps prebuilt for it.

Keno commented 7 years ago

Yes, that would be helpful.

tkelman commented 7 years ago

We'll probably want to replace our existing patch by that once that's done.

https://github.com/JuliaLang/julia/pull/19810/commits/bf92182e7bdfbe94528990d0362d609313dbe9a7 if it sticks without getting reverted upstream

edit: needs some conflicts resolved to apply against 3.9.1, I'm inclined to leave the patch as it is for now if I don't hear otherwise?

peirongf commented 7 years ago

I am using llvm 4.0, to build Halide project. Halide is with tag release_2017_05_03. It failed with following error information:

bin/gaussian5x5_generator -o ./bin/arm-64-android -e o,h -f gaussian5x5_hvx128 target=arm-64-android-hvx_128 gaussian5x5_generator: /llvm-4.0.0.src/lib/CodeGen/LiveVariables.cpp:114: void llvm::LiveVariables::MarkVirtRegAliveInBlock(llvm::LiveVariables::VarInfo&, llvm::MachineBasicBlock, llvm::MachineBasicBlock, std::vectorllvm::MachineBasicBlock*&): Assertion `MBB != &MF->front() && "Can't find reaching def for virtreg"' failed. Makefile:91: recipe for target 'bin/arm-64-android/gaussian5x5_hvx128.o' failed make: *** [bin/arm-64-android/gaussian5x5_hvx128.o] Aborted (core dumped)

tkelman commented 7 years ago

Report that to Halide please, it's not a Julia issue.