JuliaGPU / oneAPI.jl

Julia support for the oneAPI programming toolkit.
https://juliagpu.org/oneapi/

Fail to import #399

Closed csantosb closed 7 months ago

csantosb commented 7 months ago

When I run using oneAPI (oneAPI v1.4.0), I get the following message.

┌ Error: Failed to initialize oneAPI
│   exception =
│    ZeError: driver is not initialized (code 2013265921, ZE_RESULT_ERROR_UNINITIALIZED)
│    Stacktrace:
│      [1] throw_api_error(res::oneAPI.oneL0._ze_result_t)
│        @ oneAPI.oneL0 ~/.julia/packages/oneAPI/2gxUb/lib/level-zero/libze.jl:8
│      [2] check
│        @ ~/.julia/packages/oneAPI/2gxUb/lib/level-zero/libze.jl:19 [inlined]
│      [3] zeInit
│        @ ~/.julia/packages/oneAPI/2gxUb/lib/utils/call.jl:24 [inlined]
│      [4] __init__()
│        @ oneAPI.oneL0 ~/.julia/packages/oneAPI/2gxUb/lib/level-zero/oneL0.jl:100
│      [5] run_module_init(mod::Module, i::Int64)
│        @ Base ./loading.jl:1134
│      [6] register_restored_modules(sv::Core.SimpleVector, pkg::Base.PkgId, path::String)
│        @ Base ./loading.jl:1122
│      [7] _include_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String, depmods::Vector{Any})
│        @ Base ./loading.jl:1067
│      [8] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt128)
│        @ Base ./loading.jl:1581
│      [9] _require(pkg::Base.PkgId, env::String)
│        @ Base ./loading.jl:1938
│     [10] __require_prelocked(uuidkey::Base.PkgId, env::String)
│        @ Base ./loading.jl:1812
│     [11] #invoke_in_world#3
│        @ ./essentials.jl:926 [inlined]
│     [12] invoke_in_world
│        @ ./essentials.jl:923 [inlined]
│     [13] _require_prelocked(uuidkey::Base.PkgId, env::String)
│        @ Base ./loading.jl:1803
│     [14] macro expansion
│        @ ./loading.jl:1790 [inlined]
│     [15] macro expansion
│        @ ./lock.jl:267 [inlined]
│     [16] __require(into::Module, mod::Symbol)
│        @ Base ./loading.jl:1753
│     [17] #invoke_in_world#3
│        @ ./essentials.jl:926 [inlined]
│     [18] invoke_in_world
│        @ ./essentials.jl:923 [inlined]
│     [19] require(into::Module, mod::Symbol)
│        @ Base ./loading.jl:1746
│     [20] eval
│        @ ./boot.jl:385 [inlined]
│     [21] eval_user_input(ast::Any, backend::REPL.REPLBackend, mod::Module)
│        @ REPL /tmp/julia-1.10.2/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
│     [22] repl_backend_loop(backend::REPL.REPLBackend, get_module::Function)
│        @ REPL /tmp/julia-1.10.2/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
│     [23] start_repl_backend(backend::REPL.REPLBackend, consumer::Any; get_module::Function)
│        @ REPL /tmp/julia-1.10.2/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
│     [24] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool, backend::Any)
│        @ REPL /tmp/julia-1.10.2/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
│     [25] run_repl(repl::REPL.AbstractREPL, consumer::Any)
│        @ REPL /tmp/julia-1.10.2/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
│     [26] (::Base.var"#1013#1015"{Bool, Bool, Bool})(REPL::Module)
│        @ Base ./client.jl:432
│     [27] #invokelatest#2
│        @ ./essentials.jl:892 [inlined]
│     [28] invokelatest
│        @ ./essentials.jl:889 [inlined]
│     [29] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
│        @ Base ./client.jl:416
│     [30] exec_options(opts::Base.JLOptions)
│        @ Base ./client.jl:333
│     [31] _start()
│        @ Base ./client.jl:552
└ @ oneAPI.oneL0 ~/.julia/packages/oneAPI/2gxUb/lib/level-zero/oneL0.jl:103

My versioninfo() is

Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_NUM_THREADS = 8

and the output of hwinfo --display is

28: PCI 02.0: 0300 VGA compatible controller (VGA)              
  [Created at pci.386]
  Unique ID: _Znp.lIyCdeT3soB
  SysFS ID: /devices/pci0000:00/0000:00:02.0
  SysFS BusID: 0000:00:02.0
  Hardware Class: graphics card
  Model: "Intel TigerLake-LP GT2 [Iris Xe Graphics]"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x9a49 "TigerLake-LP GT2 [Iris Xe Graphics]"
  SubVendor: pci 0x1028 "Dell"
  SubDevice: pci 0x0a5c 
  Revision: 0x01
  Driver: "i915"
  Driver Modules: "i915"
  Memory Range: 0x6076000000-0x6076ffffff (rw,non-prefetchable)
  Memory Range: 0x4000000000-0x400fffffff (ro,non-prefetchable)
  I/O Ports: 0x3000-0x303f (rw)
  Memory Range: 0x000c0000-0x000dffff (rw,non-prefetchable,disabled)
  IRQ: 183 (499382 events)
  Module Alias: "pci:v00008086d00009A49sv00001028sd00000A5Cbc03sc00i00"
  Driver Info #0:
    Driver Status: i915 is active
    Driver Activation Cmd: "modprobe i915"
  Driver Info #1:
    Driver Status: xe is active
    Driver Activation Cmd: "modprobe xe"
  Config Status: cfg=new, avail=yes, need=no, active=unknown

One more: inxi -Fzm gives me

Graphics:
  Device-1: Intel TigerLake-LP GT2 [Iris Xe Graphics] driver: i915 v: kernel
  Device-2: Microdia Integrated_Webcam_HD driver: uvcvideo type: USB
  Display: server: X.Org v: 21.1.11 driver: X: loaded: modesetting dri: iris
    gpu: i915 resolution: 1920x1080~60Hz
  API: EGL v: 1.5 drivers: iris,swrast platforms: x11,surfaceless,device
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: intel mesa v: 24.0.3-arch1.1
    renderer: Mesa Intel Xe Graphics (TGL GT2)
  API: Vulkan v: 1.3.279 drivers: intel,llvmpipe surfaces: xcb,xlib

Any ideas?

Thanks

maleadt commented 7 months ago

Make sure you have the necessary permissions to access the GPU hardware in /dev/dri*. You didn't mention which kernel version you are using; is it sufficiently recent (see the README; I think you need at least 6.2)? You can also try running the process under strace to see if anything goes wrong.

Apart from those suggestions though, there's not much we can do, the API being as opaque as it is. If you don't figure it out, it may be best to open an issue on https://github.com/intel/compute-runtime.
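
The two checks suggested above (render-node permissions and kernel version) can be sketched as a small shell script. This is only an illustration: the function names check_render_nodes and kernel_at_least are made up for this example, the renderD* naming is the usual DRM convention rather than something stated in this thread, and the 6.2 threshold is the tentative figure quoted above, not a confirmed requirement.

```shell
#!/bin/sh
# Hypothetical helpers; not part of oneAPI.jl or any Intel tooling.

# check_render_nodes [DIR]: report whether each render node under DIR
# (default /dev/dri) is readable and writable by the current user.
check_render_nodes() {
    dir="${1:-/dev/dri}"
    found=0
    for node in "$dir"/renderD*; do
        [ -e "$node" ] || continue      # glob did not match anything
        found=1
        if [ -r "$node" ] && [ -w "$node" ]; then
            echo "ok: $node"
        else
            echo "no access: $node"
        fi
    done
    [ "$found" -eq 1 ] || echo "no render nodes found in $dir"
}

# kernel_at_least MAJOR MINOR [RELEASE]: succeed if RELEASE
# (default: output of uname -r) is at least MAJOR.MINOR.
kernel_at_least() {
    release="${3:-$(uname -r)}"
    major="${release%%.*}"              # "6.8.1-arch1-1" -> "6"
    rest="${release#*.}"                # -> "8.1-arch1-1"
    minor="${rest%%[!0-9]*}"            # -> "8"
    [ "$major" -gt "$1" ] || { [ "$major" -eq "$1" ] && [ "$minor" -ge "$2" ]; }
}

check_render_nodes /dev/dri
if kernel_at_least 6 2; then
    echo "kernel $(uname -r): at least 6.2"
else
    echo "kernel $(uname -r): older than 6.2"
fi
```

If a node is reported as "no access", the usual fix is group membership (often the render or video group), but which group applies depends on the distribution.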

csantosb commented 7 months ago

I'm using an up-to-date Arch Linux with kernel 6.8.1 and the official Julia binaries.

I'll try your suggestions, thanks. Not sure how to explain the issue upstream, though.

maleadt commented 7 months ago

After fixing /dev permissions, please post a strace. Maybe we can see what's up in there.

And maybe also a run with LD_DEBUG=libs.

csantosb commented 7 months ago

On Wed, 20 Mar 2024 at 08:39, Tim Besard @.***> wrote:

After fixing /dev permissions, please post a strace. Maybe we can see what's up in there.

And maybe also a run with LD_DEBUG=libs.

Here we go:

https://git.sr.ht/~csantosb/traces/tree/69d47a88865754482c216b466a758343b216779b

contains output of

strace -o trace.txt /tmp/julia-1.10.2/bin/julia -e "using oneAPI"

and

export LD_DEBUG=libs; /tmp/julia-1.10.2/bin/julia -e "using oneAPI" 2> trace2.txt

Thanks for your help

maleadt commented 7 months ago

It looks like you have some Level Zero things installed globally:

     42930: find library=libze_tracing_layer.so.1 [0]; searching
     42930:  search cache=/etc/ld.so.cache
     42930:   trying file=/usr/lib/libze_tracing_layer.so.1
     42930: 
     42930: 
     42930: calling init: /usr/lib/libze_tracing_layer.so.1
openat(AT_FDCWD, "/usr/lib/libze_tracing_layer.so.1", O_RDONLY|O_CLOEXEC) = 20

On my system, it's after that (normally failed) discovery that /dev/dri is scanned:

openat(AT_FDCWD, "/usr/lib/libze_tracing_layer.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
munmap(0x7ff195395000, 34618)           = 0
openat(AT_FDCWD, "/dev/dri/by-path", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 17

That doesn't happen in your strace, so I'd guess that the mixing of our libze with your system libze_validation makes the whole thing bail out early.

Could you try removing those system libraries, if only temporarily? Generally, the LD_DEBUG=libs output shouldn't be loading any system libraries (except for core ones like libc, libm, libpthread, etc).
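
As a rough way to spot this kind of mixing, the LD_DEBUG=libs output can be filtered for Level Zero libraries that were initialized from outside the Julia depot. The sketch below is an assumption-laden illustration: foreign_libze is a made-up helper name, the depot is assumed to be ~/.julia unless passed explicitly, and the sample trace lines (including the /home/user path) are invented to mimic the format shown above.

```shell
#!/bin/sh
# Hypothetical helper; not part of oneAPI.jl.

# foreign_libze TRACE [DEPOT]: print every libze_* library that appears on a
# "calling init:" line of an LD_DEBUG=libs trace and whose path does not
# contain DEPOT/ (default: $HOME/.julia).
foreign_libze() {
    trace="$1"
    depot="${2:-$HOME/.julia}"
    sed -n 's/^.*calling init: //p' "$trace" \
        | grep '/libze_' \
        | grep -v -F "$depot/" \
        | sort -u
}

# Demo on a small invented sample in the same format as the trace above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
     42930: find library=libze_tracing_layer.so.1 [0]; searching
     42930:   trying file=/usr/lib/libze_tracing_layer.so.1
     42930: calling init: /usr/lib/libze_tracing_layer.so.1
     42930: calling init: /home/user/.julia/artifacts/abc/lib/libze_loader.so.1
EOF
foreign_libze "$sample" /home/user/.julia
rm -f "$sample"
```

On a real system the trace would come from something like the LD_DEBUG=libs command earlier in the thread (written to trace2.txt); an empty result suggests no system-wide Level Zero libraries were picked up.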

csantosb commented 7 months ago

That doesn't happen in your strace, so I'd guess that the mixing of our libze with your system libze_validation makes the whole thing bail out early.

Could you try removing those system libraries, if only temporarily? Generally, the LD_DEBUG=libs output shouldn't be loading any system libraries (except for core ones like libc, libm, libpthread, etc).

Good point, thanks!

Now I don't have any package related to oneAPI on my system:

sudo pacman -Fy libze_intel_vpu.so.1
sudo pacman -Fy libze_tracing_layer.so.1
sudo pacman -Fx libze

:: Synchronizing package databases...
 core is up to date
 extra is up to date
:: Synchronizing package databases...
 core is up to date
 extra is up to date
extra/level-zero-loader 1.15.1-1
    usr/lib/libze_tracing_layer.so.1
extra/intel-compute-runtime 23.48.27912.11-1
    usr/lib/libze_intel_gpu.so
    usr/lib/libze_intel_gpu.so.1
    usr/lib/libze_intel_gpu.so.1.3.27912
extra/intel-oneapi-basekit 2024.0.0.49564-2
    opt/intel/oneapi/2024.0/lib/libze_trace_collector.so
    opt/intel/oneapi/compiler/2024.0/lib/libze_trace_collector.so
extra/level-zero-headers 1.15.1-1
    usr/lib/pkgconfig/libze_loader.pc
extra/level-zero-loader 1.15.1-1
    usr/lib/libze_loader.so
    usr/lib/libze_loader.so.1
    usr/lib/libze_loader.so.1.15.1
    usr/lib/libze_tracing_layer.so
    usr/lib/libze_tracing_layer.so.1
    usr/lib/libze_tracing_layer.so.1.15.1
    usr/lib/libze_validation_layer.so
    usr/lib/libze_validation_layer.so.1
    usr/lib/libze_validation_layer.so.1.15.1

sudo updatedb
locate libze_intel_vpu.so.1
locate libze_tracing_layer.so.1

/home/csantos/.julia/artifacts/521996985d539cc752bbc959f2fd92df020356dc/lib/libze_tracing_layer.so.1
/home/csantos/.julia/artifacts/521996985d539cc752bbc959f2fd92df020356dc/lib/libze_tracing_layer.so.1.16.1

So my system is clean, and this is what I get now, which is closer to your output:

https://git.sr.ht/~csantosb/traces/tree/6d99b628d22feb97557dff084bf0bcd16ce914cc

maleadt commented 7 months ago

OK great, despite the error being the same we do actually see libze scanning /dev now, indicating that the tracing layer mismatch was problematic in the first place.

openat(AT_FDCWD, "/dev/dri/by-path", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 20
fstat(20, {st_mode=S_IFDIR|0755, st_size=80, ...}) = 0
getdents64(20, 0x365f5f0 /* 4 entries */, 32768) = 144
getdents64(20, 0x365f5f0 /* 0 entries */, 32768) = 0
close(20)                               = 0
openat(AT_FDCWD, "/dev/dri/by-path/pci-0000:00:02.0-render", O_RDWR) = 20
ioctl(20, DRM_IOCTL_VERSION, 0x7fff8e242e80) = 0
ioctl(20, DRM_IOCTL_I915_GETPARAM, 0x7fff8e242ff0) = 0
ioctl(20, DRM_IOCTL_I915_GETPARAM, 0x7fff8e242ff0) = 0
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:02.0/drm", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 21
fstat(21, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents64(21, 0x365f5f0 /* 5 entries */, 32768) = 144
getdents64(21, 0x365f5f0 /* 0 entries */, 32768) = 0
close(21)                               = 0
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:02.0/drm/card1/prelim_uapi_version", O_RDONLY) = -1 ENOENT (No such file or directory)
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242eb0) = 0
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242eb0) = 0
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242f80) = 0
ioctl(20, DRM_IOCTL_I915_GETPARAM, 0x7fff8e243020) = 0
ioctl(20, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, 0x7fff8e243060) = -1 EINVAL (Invalid argument)
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242f90) = 0
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242f90) = 0
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242f30) = 0
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242f30) = 0
futex(0x366a858, FUTEX_WAKE_PRIVATE, 2147483647) = 0
ioctl(20, DRM_IOCTL_I915_GEM_VM_CREATE, 0x7fff8e242ff0) = 0
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242710) = 0
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242710) = 0
ioctl(20, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, 0x7fff8e242840) = 0
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:02.0/drm", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 21
fstat(21, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents64(21, 0x365f5f0 /* 5 entries */, 32768) = 144
getdents64(21, 0x365f5f0 /* 0 entries */, 32768) = 0
close(21)                               = 0
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:02.0/drm/card1/gt_max_freq_mhz", O_RDONLY) = 21
read(21, "1300\n", 8191)                = 5
close(21)                               = 0
ioctl(20, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, 0x7fff8e242830) = 0
ioctl(20, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, 0x7fff8e242850) = 0
ioctl(20, DRM_IOCTL_I915_GETPARAM, 0x7fff8e2427e0) = 0
readlink("/proc/self/exe", "/tmp/julia-1.10.2/bin/julia", 511) = 27
futex(0x36a3840, FUTEX_WAKE_PRIVATE, 2147483647) = 0
ioctl(20, DRM_IOCTL_I915_GEM_VM_DESTROY, 0x7fff8e2431e0) = 0
close(20)                               = 0

I don't see anything that stands out here. There's a DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM returning EINVAL, but more queries are made after that, so it doesn't seem fatal.
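
For longer traces, one way to get an overview of the failing calls is to count them per syscall and errno. The following is a minimal sketch, not an established tool: failed_syscalls is a made-up name, and the pattern assumes plain strace output without PID prefixes or timestamps (as in the excerpt above).

```shell
#!/bin/sh
# Hypothetical helper for summarizing an strace log.

# failed_syscalls TRACE: print "count syscall errno" for every call in TRACE
# that returned -1, most frequent first.
failed_syscalls() {
    awk 'match($0, /\) += -1 E[A-Z0-9]+/) {
             name = $0; sub(/\(.*/, "", name)             # text before "("
             err = substr($0, RSTART, RLENGTH)            # ") = -1 EXXX"
             sub(/^.*-1 /, "", err)                       # keep only "EXXX"
             count[name " " err]++
         }
         END { for (k in count) print count[k], k }' "$1" | sort -rn
}

# Demo on three lines taken from the excerpt above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:02.0/drm/card1/prelim_uapi_version", O_RDONLY) = -1 ENOENT (No such file or directory)
ioctl(20, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, 0x7fff8e243060) = -1 EINVAL (Invalid argument)
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242f90) = 0
EOF
failed_syscalls "$sample"
rm -f "$sample"
```

A failing call is not necessarily the cause of the error (probing for optional files and features often fails harmlessly), so the summary only narrows down where to look.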

Maybe also try running with ZE_ENABLE_LOADER_DEBUG_TRACE=1, according to https://github.com/oneapi-src/level-zero?tab=readme-ov-file#debug-trace.

❯ ZE_ENABLE_LOADER_DEBUG_TRACE=1 jl --project examples/vadd.jl
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_gpu.so.1
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_vpu.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_vpu.so.1 failed with libze_intel_vpu.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Load Library of libze_tracing_layer.so.1 failed with libze_tracing_layer.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:check_drivers(flags=0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED))
ZE_LOADER_DEBUG_TRACE:init driver libze_intel_gpu.so.1 zeInit(0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED)) returning ZE_RESULT_SUCCESS

csantosb commented 7 months ago

Maybe also try running with ZE_ENABLE_LOADER_DEBUG_TRACE= , according to https://github.com/oneapi-src/level-zero?tab=readme-ov-file#debug-trace.

❯ ZE_ENABLE_LOADER_DEBUG_TRACE=1 jl --project examples/vadd.jl
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_gpu.so.1
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_vpu.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_vpu.so.1 failed with libze_intel_vpu.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Load Library of libze_tracing_layer.so.1 failed with libze_tracing_layer.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:check_drivers(flags=0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED))
ZE_LOADER_DEBUG_TRACE:init driver libze_intel_gpu.so.1 zeInit(0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED)) returning ZE_RESULT_SUCCESS

Here is my output:

ZE_ENABLE_LOADER_DEBUG_TRACE=1 /tmp/julia-1.10.2/bin/julia -e "using oneAPI"
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_gpu.so.1
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_vpu.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_vpu.so.1 failed with libze_intel_vpu.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Load Library of libze_tracing_layer.so.1 failed with libze_tracing_layer.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:check_drivers(flags=0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED))
ZE_LOADER_DEBUG_TRACE:init driver libze_intel_gpu.so.1 zeInit(0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED)) returning ZE_RESULT_ERROR_UNINITIALIZED
ZE_LOADER_DEBUG_TRACE:Check Drivers Failed on libze_intel_gpu.so.1 , driver will be removed. zeInit failed with ZE_RESULT_ERROR_UNINITIALIZED
┌ Error: Failed to initialize oneAPI
│   exception =
│    ZeError: driver is not initialized (code 2013265921, ZE_RESULT_ERROR_UNINITIALIZED)
│    Stacktrace:
│      [1] throw_api_error(res::oneAPI.oneL0._ze_result_t)
│        @ oneAPI.oneL0 ~/.julia/packages/oneAPI/2gxUb/lib/level-zero/libze.jl:8
│      [2] check
│        @ ~/.julia/packages/oneAPI/2gxUb/lib/level-zero/libze.jl:19 [inlined]
│      [3] zeInit
│        @ ~/.julia/packages/oneAPI/2gxUb/lib/utils/call.jl:24 [inlined]
│      [4] __init__()
│        @ oneAPI.oneL0 ~/.julia/packages/oneAPI/2gxUb/lib/level-zero/oneL0.jl:100
│      [5] run_module_init(mod::Module, i::Int64)
│        @ Base ./loading.jl:1134
│      [6] register_restored_modules(sv::Core.SimpleVector, pkg::Base.PkgId, path::String)
│        @ Base ./loading.jl:1122
│      [7] _include_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String, depmods::Vector{Any})
│        @ Base ./loading.jl:1067
│      [8] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt128)
│        @ Base ./loading.jl:1581
│      [9] _require(pkg::Base.PkgId, env::String)
│        @ Base ./loading.jl:1938
│     [10] __require_prelocked(uuidkey::Base.PkgId, env::String)
│        @ Base ./loading.jl:1812
│     [11] #invoke_in_world#3
│        @ ./essentials.jl:926 [inlined]
│     [12] invoke_in_world
│        @ ./essentials.jl:923 [inlined]
│     [13] _require_prelocked(uuidkey::Base.PkgId, env::String)
│        @ Base ./loading.jl:1803
│     [14] macro expansion
│        @ ./loading.jl:1790 [inlined]
│     [15] macro expansion
│        @ ./lock.jl:267 [inlined]
│     [16] __require(into::Module, mod::Symbol)
│        @ Base ./loading.jl:1753
│     [17] #invoke_in_world#3
│        @ ./essentials.jl:926 [inlined]
│     [18] invoke_in_world
│        @ ./essentials.jl:923 [inlined]
│     [19] require(into::Module, mod::Symbol)
│        @ Base ./loading.jl:1746
│     [20] eval
│        @ ./boot.jl:385 [inlined]
│     [21] exec_options(opts::Base.JLOptions)
│        @ Base ./client.jl:291
│     [22] _start()
│        @ Base ./client.jl:552
└ @ oneAPI.oneL0 ~/.julia/packages/oneAPI/2gxUb/lib/level-zero/oneL0.jl:103

Maybe the two lines:

ZE_LOADER_DEBUG_TRACE:init driver libze_intel_gpu.so.1 zeInit(0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED)) returning ZE_RESULT_ERROR_UNINITIALIZED
ZE_LOADER_DEBUG_TRACE:Check Drivers Failed on libze_intel_gpu.so.1 , driver will be removed. zeInit failed with ZE_RESULT_ERROR_UNINITIALIZED

provide a hint about where to look for a possible solution.

maleadt commented 7 months ago

Maybe the two lines:

ZE_LOADER_DEBUG_TRACE:init driver libze_intel_gpu.so.1 zeInit(0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED)) returning ZE_RESULT_ERROR_UNINITIALIZED
ZE_LOADER_DEBUG_TRACE:Check Drivers Failed on libze_intel_gpu.so.1 , driver will be removed. zeInit failed with ZE_RESULT_ERROR_UNINITIALIZED

It does look like the issue is with the compute-runtime, providing libze_intel_gpu. Could you try running with NEOReadDebugKeys=1 PrintDebugMessages=1 PrintXeLogs=1? I'm not too familiar with compute-runtime's inner workings though; maybe @kballeda could suggest what else to try here. If not, I think we'll have to consider filing an issue upstream.

csantosb commented 7 months ago

It does look like the issue is with the compute-runtime, providing libze_intel_gpu. Could you try running with NEOReadDebugKeys=1 PrintDebugMessages=1 PrintXeLogs=1?

export NEOReadDebugKeys=1; export PrintDebugMessages=1; export PrintXeLogs=1; export ZE_ENABLE_LOADER_DEBUG_TRACE=1; /tmp/julia-1.10.2/bin/julia -e "using oneAPI"

gives

ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_gpu.so.1
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_vpu.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_vpu.so.1 failed with libze_intel_vpu.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Load Library of libze_tracing_layer.so.1 failed with libze_tracing_layer.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:check_drivers(flags=0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED))
INFO: System Info query failed!
WARNING: Failed to request OCL Turbo Boost
ZE_LOADER_DEBUG_TRACE:init driver libze_intel_gpu.so.1 zeInit(0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED)) returning ZE_RESULT_ERROR_UNINITIALIZED
ZE_LOADER_DEBUG_TRACE:Check Drivers Failed on libze_intel_gpu.so.1 , driver will be removed. zeInit failed with ZE_RESULT_ERROR_UNINITIALIZED
┌ Error: Failed to initialize oneAPI
│   exception =
│    ZeError: driver is not initialized (code 2013265921, ZE_RESULT_ERROR_UNINITIALIZED)
│    Stacktrace:
│      [1] throw_api_error(res::oneAPI.oneL0._ze_result_t)
│        @ oneAPI.oneL0 ~/.julia/packages/oneAPI/2gxUb/lib/level-zero/libze.jl:8
│      [2] check
│        @ ~/.julia/packages/oneAPI/2gxUb/lib/level-zero/libze.jl:19 [inlined]
│      [3] zeInit
│        @ ~/.julia/packages/oneAPI/2gxUb/lib/utils/call.jl:24 [inlined]
│      [4] __init__()
│        @ oneAPI.oneL0 ~/.julia/packages/oneAPI/2gxUb/lib/level-zero/oneL0.jl:100
│      [5] run_module_init(mod::Module, i::Int64)
│        @ Base ./loading.jl:1134
│      [6] register_restored_modules(sv::Core.SimpleVector, pkg::Base.PkgId, path::String)
│        @ Base ./loading.jl:1122
│      [7] _include_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String, depmods::Vector{Any})
│        @ Base ./loading.jl:1067
│      [8] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt128)
│        @ Base ./loading.jl:1581
│      [9] _require(pkg::Base.PkgId, env::String)
│        @ Base ./loading.jl:1938
│     [10] __require_prelocked(uuidkey::Base.PkgId, env::String)
│        @ Base ./loading.jl:1812
│     [11] #invoke_in_world#3
│        @ ./essentials.jl:926 [inlined]
│     [12] invoke_in_world
│        @ ./essentials.jl:923 [inlined]
│     [13] _require_prelocked(uuidkey::Base.PkgId, env::String)
│        @ Base ./loading.jl:1803
│     [14] macro expansion
│        @ ./loading.jl:1790 [inlined]
│     [15] macro expansion
│        @ ./lock.jl:267 [inlined]
│     [16] __require(into::Module, mod::Symbol)
│        @ Base ./loading.jl:1753
│     [17] #invoke_in_world#3
│        @ ./essentials.jl:926 [inlined]
│     [18] invoke_in_world
│        @ ./essentials.jl:923 [inlined]
│     [19] require(into::Module, mod::Symbol)
│        @ Base ./loading.jl:1746
│     [20] eval
│        @ ./boot.jl:385 [inlined]
│     [21] exec_options(opts::Base.JLOptions)
│        @ Base ./client.jl:291
│     [22] _start()
│        @ Base ./client.jl:552
└ @ oneAPI.oneL0 ~/.julia/packages/oneAPI/2gxUb/lib/level-zero/oneL0.jl:103

csantosb commented 7 months ago

The problem was fixed for me after a system update.

Thanks a lot for your help!