JuliaGPU / CUDA.jl

CUDA programming in Julia.
https://juliagpu.org/cuda/

Cannot select `__powidf2` while lowering `powi.f64` #76

Closed maleadt closed 3 years ago

maleadt commented 8 years ago

Repro:

@target ptx function foo(n)
    return 1.0^n
end
code_native(foo, Tuple{Int})

Generated IR:

define double @julia_foo_64487(i64) #0 !dbg !6 {
top:
  %sext = shl i64 %0, 32
  %1 = ashr exact i64 %sext, 32
  %2 = icmp eq i64 %1, %0
  br i1 %2, label %pass, label %fail

fail:                                             ; preds = %top
  call void @llvm.trap() #2
  unreachable

pass:                                             ; preds = %top
  %3 = trunc i64 %0 to i32
  %4 = call double @llvm.powi.f64(double 1.000000e+00, i32 %3)
  ret double %4
}

versioninfo():

Julia Version 0.6.0-dev.884
Commit 34d11b0* (2016-09-14 21:55 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.0 (ORCJIT, ivybridge)

vchuravy commented 8 years ago

That seems to be because ^ is defined in Base via Core.Intrinsics.powi_llvm, and we would need to replace that with the version in libdevice.

This will be hard to solve until we can define ^(::Float64, ::Int64) in a target-specific way, so for now the correct solution is to use pow instead.
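
A minimal sketch of that workaround, assuming it is acceptable to convert the exponent to Float64 so that the call dispatches to the floating-point ^ method rather than the integer-exponent one. The name pow_via_float is hypothetical, and whether the floating-point path maps to a libdevice routine on the GPU depends on the CUDAnative version in use:

```julia
# Hypothetical helper, not part of Base or CUDAnative.
# Converting the exponent to Float64 selects ^(::Float64, ::Float64),
# which is not the integer-exponent method discussed in this issue.
pow_via_float(x::Float64, n::Integer) = x ^ Float64(n)
```

The trade-off is that a general pow is typically more expensive than an integer power, and exponents beyond 2^53 lose precision in the conversion.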

maleadt commented 8 years ago

No, Julia lowers it correctly to llvm.powi.f64; see the IR snippet. Sure, libdevice might provide a better-suited alternative, but the NVPTX back-end should either lower this intrinsic or die with a selection failure instead of segfaulting:

signal (11): Segmentation fault
while loading no file, in expression starting on line 0
unknown function (ip: 0x7f13eb257f30)
llvm::NVPTXTargetLowering::getArgumentAlignment(llvm::SDValue, llvm::ImmutableCallSite const*, llvm::Type*, unsigned int) const at libLLVM-3.9.so (unknown line)
llvm::NVPTXTargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const at libLLVM-3.9.so (unknown line)
llvm::TargetLowering::LowerCallTo(llvm::TargetLowering::CallLoweringInfo&) const at libLLVM-3.9.so (unknown line)
unknown function (ip: 0x7f13eb595fed)
unknown function (ip: 0x7f13eb58b1c7)
unknown function (ip: 0x7f13eb575791)
llvm::SelectionDAG::Legalize() at libLLVM-3.9.so (unknown line)
llvm::SelectionDAGISel::CodeGenAndEmitDAG() at libLLVM-3.9.so (unknown line)
llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) at libLLVM-3.9.so (unknown line)
llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) at libLLVM-3.9.so (unknown line)
llvm::MachineFunctionPass::runOnFunction(llvm::Function&) at libLLVM-3.9.so (unknown line)
llvm::FPPassManager::runOnFunction(llvm::Function&) at libLLVM-3.9.so (unknown line)
llvm::FPPassManager::runOnModule(llvm::Module&) at libLLVM-3.9.so (unknown line)
llvm::legacy::PassManagerImpl::run(llvm::Module&) at libLLVM-3.9.so (unknown line)
getFunctionAddress at src/jitlayers.cpp:794
getAddressForFunction at src/codegen.cpp:1087
jl_dump_function_asm at src/codegen.cpp:1619
_dump_function at ./reflection.jl:520
_dump_function at ./reflection.jl:506
unknown function (ip: 0x7f11c9f52a57)
jl_apply_generic at src/gf.c:1854
code_native at ./reflection.jl:550
unknown function (ip: 0x7f11c9f5237d)
jl_apply_generic at src/gf.c:1854
code_native at ./reflection.jl:552
unknown function (ip: 0x7f11c9f52266)
jl_apply_generic at src/gf.c:1854
do_call at src/interpreter.c:71
eval at src/interpreter.c:210
jl_toplevel_eval_flex at src/toplevel.c:628
jl_toplevel_eval_in_warn at src/builtins.c:590
eval at ./boot.jl:238
unknown function (ip: 0x7f13e5a7a80f)
jl_apply_generic at src/gf.c:1854
eval_user_input at ./REPL.jl:66
unknown function (ip: 0x7f11c9f4f396)
jl_apply_generic at src/gf.c:1854
macro expansion at ./REPL.jl:97 [inlined]
#3 at ./event.jl:68
unknown function (ip: 0x7f11c9f48f7f)
jl_apply_generic at src/gf.c:1854
jl_apply at src/julia.h:1379 [inlined]
start_task at src/task.c:259
unknown function (ip: 0xffffffffffffffff)
Allocations: 1540589 (Pool: 1539671; Big: 918); GC: 0
zsh: segmentation fault  julia

vchuravy commented 8 years ago

Backtrace on a full debug build:

signal (11): Segmentation fault
while loading no file, in expression starting on line 0
getPointer at /home/valentin/julia/deps/srccache/llvm-3.9.0/include/llvm/ADT/PointerIntPair.h:56
getInstruction at /home/valentin/julia/deps/srccache/llvm-3.9.0/include/llvm/IR/CallSite.h:84
getCalledValue at /home/valentin/julia/deps/srccache/llvm-3.9.0/include/llvm/IR/CallSite.h:94
getCalledFunction at /home/valentin/julia/deps/srccache/llvm-3.9.0/include/llvm/IR/CallSite.h:102
getArgumentAlignment at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/Target/NVPTX/NVPTXISelLowering.cpp:1011
LowerCall at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/Target/NVPTX/NVPTXISelLowering.cpp:1612
LowerCallTo at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7686
ExpandLibCall at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:1949
ExpandFPLibCall at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:2038
ConvertNodeToLibcall at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:3861
LegalizeOp at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:1127
Legalize at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:4394
CodeGenAndEmitDAG at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:823
SelectBasicBlock at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:679
SelectAllBasicBlocks at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1482
runOnMachineFunction at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:500
runOnMachineFunction at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp:61
runOnFunction at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/CodeGen/MachineFunctionPass.cpp:60
runOnFunction at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/IR/LegacyPassManager.cpp:1526
runOnModule at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/IR/LegacyPassManager.cpp:1547
runOnModule at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/IR/LegacyPassManager.cpp:1603
run at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/IR/LegacyPassManager.cpp:1706
run at /home/valentin/julia/deps/srccache/llvm-3.9.0/lib/IR/LegacyPassManager.cpp:1737
getFunctionAddress at /home/valentin/julia/src/jitlayers.cpp:794

maleadt commented 8 years ago

My approach here would be to get a snippet of IR that crashes llc outside of Julia, and optionally run bugpoint on it (though it is pretty minimal already, of course). Also, I haven't really looked at this at all; it might be a minor random issue. I was just upstreaming my bugs folder :smile: LLVM D9168 might apply.
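
A standalone reproducer along these lines might look as follows. This is only a sketch: the function name foo, the file name powi.ll, the sm_35 target, and the presence of an llc binary with the NVPTX back-end on PATH are all assumptions, and the IR is a hand-reduced version of the snippet in the report (the overflow check is dropped). Newer LLVM releases mangle the intrinsic name differently (llvm.powi.f64.i32), so the declaration would need adjusting there.

```julia
# Hand-reduced IR mirroring the powi call from the report.
# The function name `foo` and the target triple are illustrative assumptions.
const POWI_IR = """
target triple = "nvptx64-nvidia-cuda"

declare double @llvm.powi.f64(double, i32)

define double @foo(i32 %n) {
top:
  %r = call double @llvm.powi.f64(double 1.000000e+00, i32 %n)
  ret double %r
}
"""

# Write the IR to a file so that llc (or bugpoint) can consume it.
function write_repro(dir::AbstractString = mktempdir())
    path = joinpath(dir, "powi.ll")
    write(path, POWI_IR)
    return path
end

# Build (but do not run) the llc command line.
# Assumes `llc` is on PATH and was built with the NVPTX target.
llc_command(path::AbstractString) = `llc -march=nvptx64 -mcpu=sm_35 $path -o -`
```

Calling run(llc_command(write_repro())) would then invoke llc outside of Julia's JIT; whether it segfaults, reports a selection failure, or succeeds depends on the LLVM version.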

vchuravy commented 8 years ago

Okay, I will give it a go :)

vchuravy commented 8 years ago

Fix submitted at https://reviews.llvm.org/D9168. LLVM is now attempting to lower this to compiler-rt intrinsics (and fails), and is no longer segfaulting.

maleadt commented 7 years ago

I've added that patch to our fork.

The issue (a segfault, or "cannot select" with the patch applied) also occurs with the more mundane trunc generated by convert, so this is bad. This patch implements some of them (the ones available as PTX instructions).
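
For context on where that trunc comes from: a checked Float64-to-Int64 conversion has to verify that the value is in range and has no fractional part before truncating. The sketch below spells those steps out by hand. The name checked_to_int is hypothetical, this is not Base's actual implementation, and the three-argument InexactError constructor assumes Julia 1.x:

```julia
# Hypothetical re-statement of a checked Float64 -> Int64 conversion.
function checked_to_int(x::Float64)
    # Range check: -2^63 <= x < 2^63 (both comparisons are false for NaN).
    in_range = (-0x1p63 <= x) & (x < 0x1p63)
    # Exactness check: this is the floating-point trunc that the back-end
    # must be able to lower.
    exact = trunc(x) == x
    (in_range & exact) || throw(InexactError(:checked_to_int, Int64, x))
    # The actual conversion, safe now that both checks have passed.
    return unsafe_trunc(Int64, x)
end
```

Only the exactness check needs a float-to-float truncation; the final step is a plain float-to-integer conversion.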

maleadt commented 7 years ago

Repro for the convert one:

julia> using CUDAnative
julia> foo(x) = convert(Int, x)
julia> CUDAnative.code_native(foo, (Float64,))
ERROR: LLVM error: Cannot select: t29: i64 = ExternalSymbol'trunc'

OP one:

julia> using CUDAnative
julia> bar(x) = 1.0^x
julia> CUDAnative.code_native(bar, (Int,))
ERROR: LLVM error: Cannot select: t9: i64 = ExternalSymbol'__powidf2'

Building from SVN now to see if lowering of the powi one has changed after D24300...

maleadt commented 7 years ago

On SVN:

julia> foo(x) = convert(Int, x)
foo (generic function with 1 method)

julia> CUDAnative.code_native(foo, (Float64,))
...
        // .globl       julia_convert
.visible .func  (.param .b64 func_retval0) julia_convert(
        .param .b64 julia_convert_param_0,
        .param .b64 julia_convert_param_1
)
{
        .reg .pred      %p<6>;
        .reg .b32       %r<2>;
        .reg .f64       %fd<3>;
        .reg .b64       %rd<2>;

        ld.param.f64    %fd1, [julia_convert_param];
        setp.ltu.f64    %p1, %fd1, 0dC3E0000000000000;
        setp.geu.f64    %p2, %fd1, 0d43E0000000000000;
        or.pred         %p3, %p1, %p2;
        cvt.rzi.f64.f64 %fd2, %fd1;     <-- the check for exactness
        setp.neu.f64    %p4, %fd2, %fd1;
        or.pred         %p5, %p3, %p4;
        @%p5 bra        LBB1_2;
        cvt.rzi.s64.f64 %rd1, %fd1;     <-- the actual trunc
        st.param.b64    [func_retval0+0], %rd1;
        ret;
LBB1_2:
        trap;   <-- InexactError
}
...

julia> bar(x) = 1.0^x
bar (generic function with 1 method)

julia> CUDAnative.code_native(bar, (Int,))
ERROR: LLVM error: Cannot select: t9: i64 = ExternalSymbol'__powidf2'

maleadt commented 5 years ago

https://llvm.org/docs/LangRef.html#llvm-powi-intrinsic

Not all targets support all types however.

Julia should probably not use this intrinsic.
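
One way to avoid the intrinsic altogether is exponentiation by squaring written in plain Julia, which only needs multiplication and division and therefore does not depend on a __powidf2 library call. The following is a sketch under that assumption; powi_soft is a hypothetical name, not an existing Base or CUDAnative function, and its results may differ from powi in the last bits:

```julia
# Hypothetical software replacement for an integer power of a Float64.
function powi_soft(x::Float64, n::Int64)
    # Work on the magnitude as an unsigned value so typemin(Int64) is handled.
    e = unsigned(abs(n))
    result = 1.0
    base = x
    while e > 0
        # Multiply in the current power of `base` when the low bit is set.
        if (e & 1) == 1
            result *= base
        end
        base *= base
        e >>= 1
    end
    # Negative exponents take the reciprocal of the positive power.
    return n < 0 ? 1.0 / result : result
end
```

The loop runs at most 64 iterations and contains no calls, which makes it a reasonable fit for targets that lack a runtime library.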