JuliaGPU / oneAPI.jl

Julia support for the oneAPI programming toolkit.
https://juliagpu.org/oneapi/

ERROR: UndefVarError: `OutOfGPUMemoryError` not defined #450

Closed: BenjaminRemez closed this issue 1 month ago

BenjaminRemez commented 1 month ago

The following code throws an UndefVarError for `OutOfGPUMemoryError`. After this, allocating any other oneArray throws the same UndefVarError again, and running exit() in the REPL then produces many more errors. Tested on WSL2.

This does not occur with larger arrays, so I suspect the GPU is being flooded with too many submitted tasks before the first ones complete. That limitation by itself is fine, but the desired behavior would be a graceful failure that does not leave the GPU unusable. Also, without @btime the failure occurs only intermittently.

```julia
using oneAPI, BenchmarkTools

A1 = oneArray(rand(Float32, 10^2)); A2 = similar(A1);

function F!(A2::T, A1::T) where T <: AbstractArray
    func(v) = 2f0 * sin(v) + 4.123f0 * cos(3f0 * v - 0.567f0)^2  # something nontrivial to occupy the GPU
    for _ in 1:10000
        map!(func, A2, A1)  # each call launches a kernel asynchronously
    end
    oneAPI.synchronize()  # wait for all submitted work to finish
    return A2
end

@btime F!(A2, A1)
```
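To test the hypothesis that the failure comes from queueing too many kernels before earlier ones complete, one could bound the number of in-flight submissions by synchronizing periodically. The sketch below is a hypothetical variant of `F!`: the name `F_throttled!` and the `iters`, `batch` and `sync` keywords are illustrative and not part of oneAPI.jl, and the batch size of 100 is arbitrary. The `sync` hook defaults to a no-op so the function also runs on plain CPU arrays. It is a way to narrow the problem down, not a confirmed workaround.

```julia
# Hypothetical throttled variant of F!: same work as the reproducer, but it
# calls `sync` every `batch` iterations so at most `batch` map! calls are queued.
# `sync` is a parameter so this runs without a GPU; on oneAPI hardware one
# would pass `sync = oneAPI.synchronize`.
function F_throttled!(A2::T, A1::T; iters::Int = 10_000, batch::Int = 100,
                      sync = () -> nothing) where T <: AbstractArray
    func(v) = 2f0 * sin(v) + 4.123f0 * cos(3f0 * v - 0.567f0)^2
    for i in 1:iters
        map!(func, A2, A1)
        # Periodically block until all previously submitted work has completed.
        i % batch == 0 && sync()
    end
    sync()  # final synchronization, as in the original F!
    return A2
end

# CPU usage: count how often the sync hook is called.
nsync = Ref(0)
x = rand(Float32, 100); y = similar(x)
F_throttled!(y, x; sync = () -> (nsync[] += 1))
```

On the GPU this would be called as `F_throttled!(A2, A1; sync = oneAPI.synchronize)` with the `oneArray`s from the reproducer. Synchronizing more often trades throughput for a shorter queue, so `batch` would need tuning.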

Output of `show(err)`:

```
1-element ExceptionStack:
UndefVarError: `OutOfGPUMemoryError` not defined
Stacktrace:
  [1] throw_api_error(res::oneAPI.oneL0._ze_result_t)
    @ oneAPI.oneL0 ~/.julia/packages/oneAPI/z4RC3/lib/level-zero/libze.jl:6
  [2] check
    @ ~/.julia/packages/oneAPI/z4RC3/lib/level-zero/libze.jl:19 [inlined]
  [3] zeCommandQueueExecuteCommandLists
    @ ~/.julia/packages/oneAPI/z4RC3/lib/utils/call.jl:24 [inlined]
  [4] execute!
    @ ~/.julia/packages/oneAPI/z4RC3/lib/level-zero/cmdlist.jl:50 [inlined]
  [5] #execute!#480
    @ ~/.julia/packages/oneAPI/z4RC3/lib/level-zero/cmdlist.jl:63 [inlined]
  [6] execute! (repeats 2 times)
    @ ~/.julia/packages/oneAPI/z4RC3/lib/level-zero/cmdlist.jl:61 [inlined]
  [7] #onecall#79
    @ ~/.julia/packages/oneAPI/z4RC3/src/compiler/execution.jl:228 [inlined]
  ....
```
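The trace suggests that the UndefVarError masks the real failure: `throw_api_error` appears to translate the result of `zeCommandQueueExecuteCommandLists` into an `OutOfGPUMemoryError`, but that name is not defined where it is referenced, so Julia throws an UndefVarError instead. The self-contained sketch below reproduces this mechanism with a hypothetical stand-in (`throw_api_error_sketch` and its integer code are illustrative, not oneAPI.jl's actual code).

```julia
# Hypothetical stand-in for an error-translation helper. It refers to an
# exception type, `OutOfGPUMemoryError`, that is not defined in this scope.
function throw_api_error_sketch(code::Integer)
    if code == 1  # pretend code 1 means "out of device memory"
        # Global names are resolved when this line runs, so the missing
        # definition only surfaces here, as an UndefVarError.
        throw(OutOfGPUMemoryError())
    end
    error("unhandled API error code $code")
end

# The caller sees an UndefVarError instead of the intended out-of-memory error.
err = try
    throw_api_error_sketch(1)
catch e
    e
end
println(typeof(err))  # UndefVarError
```

Bringing the name into scope would let the caller see the intended exception, but that alone would not address the underlying out-of-memory condition or the broken device state.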
BenjaminRemez commented 1 month ago

@maleadt On my machine, #451 did not resolve the issue. On current master the problem seems worse than before: once it occurs, quitting and restarting Julia and then running `using oneAPI` crashes Julia itself. This is on WSL, and the issue persists even if the virtual machine is shut down and restarted. The only remedy is restarting the physical Windows machine, so the crash appears to have an observable effect even outside WSL.