Open Quackdoc opened 5 months ago
What is the output of the sycl-ls --verbose command?
[opencl:gpu:0] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A380 Graphics 3.0 [23.35.27191]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A380 Graphics 1.3 [1.3.27191]
Platforms: 2
Platform [#1]:
Version : OpenCL 3.0
Name : Intel(R) OpenCL Graphics
Vendor : Intel(R) Corporation
Devices : 1
Device [#0]:
Type : gpu
Version : 3.0
Name : Intel(R) Arc(TM) A380 Graphics
Vendor : Intel(R) Corporation
Driver : 23.35.27191
Platform [#2]:
Version : 1.3
Name : Intel(R) Level-Zero
Vendor : Intel(R) Corporation
Devices : 1
Device [#0]:
Type : gpu
Version : 1.3
Name : Intel(R) Arc(TM) A380 Graphics
Vendor : Intel(R) Corporation
Driver : 1.3.27191
default_selector() : gpu, Intel(R) Level-Zero, Intel(R) Arc(TM) A380 Graphics 1.3 [1.3.27191]
accelerator_selector() : No device of requested type available. Please chec...
cpu_selector() : No device of requested type available. Please chec...
gpu_selector() : gpu, Intel(R) Level-Zero, Intel(R) Arc(TM) A380 Graphics 1.3 [1.3.27191]
custom_selector(gpu) : gpu, Intel(R) Level-Zero, Intel(R) Arc(TM) A380 Graphics 1.3 [1.3.27191]
custom_selector(cpu) : No device of requested type available. Please chec...
custom_selector(acc) : No device of requested type available. Please chec...
That's strange. What about uname -r?
Currently on 6.6.10 from zen, but I also tested mainline. I could test linux-git too if necessary.
EDIT: Specifically, block_step=4 seems to be the issue. I'm not too sure, but could this simply be too intensive for the A380? I'm not familiar with what this parameter does, but block_step=4 seems to really hammer the GPU before dying. I'm wondering if the "Native API failed. Native API returns: -1 (PI_ERROR_DEVICE_NOT_FOUND) -1 (PI_ERROR_DEVICE_NOT_FOUND)" error is simply an out-of-resources crash?
EDIT2: Yeah, I'm getting a GPU hang:
Jan 14 20:01:05 quackdock kernel: i915 0000:0d:00.0: [drm] GPU HANG: ecode 12:10:85def5fa, in vspipe [422566]
Jan 14 20:01:05 quackdock kernel: i915 0000:0d:00.0: [drm] vspipe[422566] context reset due to GPU hang
Jan 14 20:01:17 quackdock kernel: Fence expiration time out i915-0000:0d:00.0:vspipe[422601]:4!
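For anyone trying to confirm the same failure on their own machine, the hang lines above can be picked out of a saved kernel log with a small script. This is only a sketch: the regular expression is modeled on the two i915 messages quoted in this thread, and `find_gpu_hangs` is a hypothetical helper name, so other kernel versions may need a different pattern.

```python
import re

# Pattern modeled on the i915 "GPU HANG" lines quoted in this thread.
# Assumption: other kernel versions may format the message differently.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[0-9a-fA-F:]+), in (?P<process>\S+) \[(?P<pid>\d+)\]"
)


def find_gpu_hangs(log_lines):
    """Return one dict per 'GPU HANG' line found in an iterable of log lines."""
    hangs = []
    for line in log_lines:
        match = HANG_RE.search(line)
        if match:
            hangs.append(
                {
                    "ecode": match.group("ecode"),
                    "process": match.group("process"),
                    "pid": int(match.group("pid")),
                }
            )
    return hangs
```

Feeding it the output of journalctl -k or dmesg, split into lines, should list which process triggered each hang.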
Thanks for the information. Could you try disabling hangcheck?
I'm getting the exact same thing even with it disabled:
su -c 'cat /sys/module/i915/parameters/enable_hangcheck'
N
sudo dmesg | tail -n 5
[sudo] password for quack:
[ 26.543795] Bluetooth: RFCOMM socket layer initialized
[ 26.543805] Bluetooth: RFCOMM ver 1.11
[ 100.442737] i915 0000:0d:00.0: [drm] GPU HANG: ecode 12:10:85def5fa, in vspipe [4183]
[ 100.442748] i915 0000:0d:00.0: [drm] vspipe[4183] context reset due to GPU hang
[ 111.905473] Fence expiration time out i915-0000:0d:00.0:vspipe[4197]:4!
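The check above reads a single i915 module parameter from sysfs. A minimal sketch of doing the same from Python follows, in case it is handy to log the values alongside a test run. `read_i915_params` is a hypothetical helper, not part of any tool mentioned here. Since the command above needed root, the sketch returns None for anything it cannot read instead of failing.

```python
from pathlib import Path

# Directory used by the cat command in this thread.
I915_PARAMS = Path("/sys/module/i915/parameters")


def read_i915_params(names, base=I915_PARAMS):
    """Map each parameter name to its current value, or None if it cannot be read."""
    values = {}
    for name in names:
        try:
            values[name] = (Path(base) / name).read_text().strip()
        except OSError:
            # Missing parameter, module not loaded, or insufficient permissions.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_i915_params(["enable_hangcheck"]).items():
        print(f"{name} = {value}")
```

Other parameter names can be passed in the list, provided the running kernel exposes them in that directory.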
What about tuning i915.request_timeout_ms?
The "Allow Long-running GPU Kernels" section on page 7 in this pdf may be relevant.
Setting i915.request_timeout_ms=900000 did let it run longer, but it still crashed out in the end. Disabling the preemption timeout and setting the heartbeat interval caused my computer to hard lock, despite the Intel card only driving a single display and not being the primary render device.
Thanks. In the end it may require manual tiling to work on these gpus at present.
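As a rough illustration of what manual tiling could look like, the sketch below only computes overlapping tile rectangles for a frame. Each tile would then be cropped, filtered on its own, and stitched back together. The function name `tile_boxes`, the tile size, and the overlap are all hypothetical. A suitable overlap depends on the filter's search window and is not something this thread specifies.

```python
def tile_boxes(width, height, tile_w, tile_h, overlap=0):
    """Return (left, top, right, bottom) boxes that together cover the frame.

    Neighbouring tiles share at least `overlap` pixels so that seams can be
    cropped away or blended after each tile has been filtered separately.
    """
    if tile_w <= overlap or tile_h <= overlap:
        raise ValueError("tile size must be larger than the overlap")

    def spans(total, size, pad):
        # Walk along one axis; the last tile is shifted back so it stays full size.
        step = size - pad
        result = []
        pos = 0
        while True:
            end = min(pos + size, total)
            result.append((max(end - size, 0), end))
            if end == total:
                break
            pos += step
        return result

    return [
        (left, top, right, bottom)
        for top, bottom in spans(height, tile_h, overlap)
        for left, right in spans(width, tile_w, overlap)
    ]
```

In a VapourSynth script, each box could be cut out with something like std.CropAbs before calling the filter, so that each GPU submission handles a smaller frame. Whether that avoids the hang on this card is untested here.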
EDIT: Using
bm3d = core.bm3dsycl.BM3Dv2(tools.depth(clip, 32), device_id=0, fast=True, radius=2)
instead worked.

I tried to compile and run the SYCL stuff on Arch Linux, and it does not seem to be working. My Intel GPU (Arc A380) will run at 100% compute workload for a couple of seconds before spitting out this error.
The full PKGBUILD and patch I used are linked at the bottom. I'm not sure if I compiled it wrong, if it's a bug, or if this is a configuration issue with Arch.
To compile, I applied this patch
then compiled with
I then tried to use this
and then finally ran it and got this output
Misc: /opt/intel/oneapi/dev-utilities/2021.9.0/bin/oneapi-cli version
v0.2.0-7-gab4eb7c822
PKGBUILD.txt sycl-patch.txt