JuliaGPU / CUDA.jl

CUDA programming in Julia.
https://juliagpu.org/cuda/

GPUCompiler emit_exception has wrong number of args #2423

Open wsmoses opened 2 weeks ago

wsmoses commented 2 weeks ago

Seen on https://github.com/JuliaGPU/CUDA.jl/pull/2422

I'm honestly not sure whether this is an error in GPUCompiler.jl, Enzyme.jl, or CUDA.jl, but it definitely requires a combination of them to trigger.


extensions/enzyme: Error During Test at /var/lib/buildkite-agent/builds/gpuci-5/julialang/cuda-dot-jl/test/extensions/enzyme.jl:42
--
  | Got exception outside of a @test
  | BoundsError: attempt to access 0-element Vector{LLVM.LLVMType} at index [1]
  | Stacktrace:
  | [1] getindex
  | @ ./essentials.jl:13 [inlined]
  | [2] call!(builder::LLVM.IRBuilder, rt::GPUCompiler.Runtime.RuntimeMethodInstance, args::Vector{LLVM.ConstantExpr})
  | @ GPUCompiler ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/packages/GPUCompiler/nWT2N/src/rtlib.jl:39
  | [3] emit_exception!
  | @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/packages/GPUCompiler/nWT2N/src/irgen.jl:219
  | [4] emit_error
  | @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/dev/Enzyme/src/compiler.jl:1636
  | [5] #codegen#28538
  | @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/dev/Enzyme/src/compiler.jl:5877
  | [6] codegen
  | @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/dev/Enzyme/src/compiler.jl:5110 [inlined]
  | [7] #79
  | @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/dev/Enzyme/src/Enzyme.jl:761
  | [8] #JuliaContext#154
  | @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/packages/GPUCompiler/nWT2N/src/driver.jl:52
  | [9] JuliaContext
  | @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/packages/GPUCompiler/nWT2N/src/driver.jl:42 [inlined]
  | [10] tape_type
  | @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/dev/Enzyme/src/Enzyme.jl:760 [inlined]
  | [11] #augmented_primal#30
  | @ /var/lib/buildkite-agent/builds/gpuci-5/julialang/cuda-dot-jl/ext/EnzymeCoreExt.jl:224
  | [12] augmented_primal
  | @ /var/lib/buildkite-agent/builds/gpuci-5/julialang/cuda-dot-jl/ext/EnzymeCoreExt.jl:219 [inlined]

For some reason, it tries to call a GPU error-reporting function that takes no arguments.
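To make the suspected failure mode concrete, here is a minimal sketch in plain Julia. It assumes the `BoundsError` comes from indexing the callee's (empty) parameter-type list once per supplied argument. `FakeRuntimeFunction`, `unchecked_call`, and `checked_call` are hypothetical names used only for illustration. They are not GPUCompiler's actual code, and no LLVM types are involved.

```julia
# Hypothetical stand-in for a runtime function: only its declared
# parameter types matter for this illustration.
struct FakeRuntimeFunction
    name::String
    param_types::Vector{DataType}
end

# Converts each argument to the declared parameter type by position.
# With more arguments than parameters, the indexing throws a BoundsError.
function unchecked_call(rt::FakeRuntimeFunction, args::Vector)
    return [convert(rt.param_types[i], arg) for (i, arg) in enumerate(args)]
end

# Same conversion, but checks the arity first and reports a clearer error.
function checked_call(rt::FakeRuntimeFunction, args::Vector)
    if length(args) != length(rt.param_types)
        throw(ArgumentError(
            "$(rt.name) expects $(length(rt.param_types)) argument(s), " *
            "got $(length(args))"))
    end
    return unchecked_call(rt, args)
end

# A callee declared with zero parameters, called with one argument.
zero_arg = FakeRuntimeFunction("report_exception", DataType[])

try
    unchecked_call(zero_arg, Any[1])
catch err
    println("unchecked: ", typeof(err))
end

try
    checked_call(zero_arg, Any[1])
catch err
    println("checked: ", typeof(err))
end
```

If this assumption holds, the open question is why the function that was looked up has an empty parameter list in the first place, not the indexing itself.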

I added a print before the assertion to see what module is being printed: https://pastebin.com/raw/p9cP9PB4

The method was defined in CUDA.jl, so I'm really not sure where things are going awry.

From worker 2:  !362 = distinct !DISubprogram(name: "report_exception", linkageName: "julia_report_exception_9039", scope: null, file: !54, line: 143, type: !38, scopeLine: 143, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !10, retainedNodes: !39)
      From worker 2:    !54 = !DIFile(filename: "/home/wmoses/git/CUDA.jl/src/device/runtime.jl", directory: ".")

cc @vchuravy

wsmoses commented 1 week ago

@maleadt, do you by chance have any idea what's happening here?