WolframRhodium / VapourSynth-BM3DCUDA

BM3D denoising filter for VapourSynth, implemented in CUDA, AVX2, HIP and SYCL
GNU General Public License v2.0

Bm3dSycl no devices found #24

Open Quackdoc opened 5 months ago

Quackdoc commented 5 months ago

EDIT: Using bm3d = core.bm3dsycl.BM3Dv2(tools.depth(clip, 32), device_id=0, fast=True, radius=2) instead worked.

I tried to compile and run the SYCL backend on Arch Linux, and it does not seem to work. My Intel GPU (Arc A380) runs at 100% compute load for a couple of seconds before failing with this error:

Error: Failed to retrieve frame 0 with error: BM3D: Native API failed. Native API returns: -1 (PI_ERROR_DEVICE_NOT_FOUND) -1 (PI_ERROR_DEVICE_NOT_FOUND)
Output 12 frames in 14.05 seconds (0.85 fps)

The full PKGBUILD and patch I used are linked at the bottom. I'm not sure whether I compiled it wrong, whether it's a bug, or whether this is a configuration issue with Arch.

To compile, I applied this patch:

diff --git a/sycl_source/CMakeLists.txt b/sycl_source/CMakeLists.txt
index 6a86aeb..5152460 100644
--- a/sycl_source/CMakeLists.txt
+++ b/sycl_source/CMakeLists.txt
@@ -6,6 +6,8 @@ endif()

 project(BM3DSYCL LANGUAGES CXX)

+find_package(IntelDPCPP REQUIRED)
+
 find_package(IntelSYCL REQUIRED CONFIG)

 add_library(bm3dsycl SHARED source.cpp kernel.cpp)

and then compiled with:

  export PATH=/opt/intel/oneapi/compiler/2023.2.0/linux/bin:$PATH
  export IntelSYCL_DIR=/opt/intel/oneapi/compiler/2023.2.0/linux/IntelSYCL

  # clear the default build flags
  unset CFLAGS
  unset CXXFLAGS
  unset LDFLAGS
  unset LTOFLAGS
  unset MAKEFLAGS
  unset DEBUG_CFLAGS
  unset DEBUG_CXXFLAGS

  cmake -S "${_plug}" -B build \
    -DCMAKE_BUILD_TYPE=None \
    -DCMAKE_INSTALL_PREFIX=/usr \
    -DCMAKE_INSTALL_LIBDIR=lib/vapoursynth \
    -DCMAKE_SKIP_RPATH=ON \
    -DVAPOURSYNTH_INCLUDE_DIRECTORY="$(pkg-config --cflags vapoursynth | sed 's|-I||g')" \
    -DENABLE_HIP=Off \
    -DENABLE_CPU=Off \
    -DENABLE_SYCL=ON \
    -DENABLE_CUDA=OFF \
    -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx

  cmake --build build

I then tried this script:

import vapoursynth as vs
import vstools as tools
core = vs.core
clip = core.lsmas.LWLibavSource(source="Pacific-rim.webm")

bm3d = core.bm3dsycl.BM3Dv2(tools.depth(clip, 32), None, .3, 4, 8, 1, device_id=1, fast=False)

bm3d.set_output(0)

Finally, I ran it and got this output:

vspipe -c y4m bm3d.vpy out.y4m
Error: Failed to retrieve frame 0 with error: BM3D: Native API failed. Native API returns: -1 (PI_ERROR_DEVICE_NOT_FOUND) -1 (PI_ERROR_DEVICE_NOT_FOUND)
Output 12 frames in 14.05 seconds (0.85 fps)

Misc: /opt/intel/oneapi/dev-utilities/2021.9.0/bin/oneapi-cli version v0.2.0-7-gab4eb7c822

Attachments: PKGBUILD.txt, sycl-patch.txt

WolframRhodium commented 5 months ago

What is the output of the sycl-ls --verbose command?

Quackdoc commented 5 months ago
[opencl:gpu:0] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A380 Graphics 3.0 [23.35.27191]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A380 Graphics 1.3 [1.3.27191]

Platforms: 2
Platform [#1]:
    Version  : OpenCL 3.0 
    Name     : Intel(R) OpenCL Graphics
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#0]:
        Type       : gpu
        Version    : 3.0
        Name       : Intel(R) Arc(TM) A380 Graphics
        Vendor     : Intel(R) Corporation
        Driver     : 23.35.27191
Platform [#2]:
    Version  : 1.3
    Name     : Intel(R) Level-Zero
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#0]:
        Type       : gpu
        Version    : 1.3
        Name       : Intel(R) Arc(TM) A380 Graphics
        Vendor     : Intel(R) Corporation
        Driver     : 1.3.27191
default_selector()      : gpu, Intel(R) Level-Zero, Intel(R) Arc(TM) A380 Graphics 1.3 [1.3.27191]
accelerator_selector()  : No device of requested type available. Please chec...
cpu_selector()          : No device of requested type available. Please chec...
gpu_selector()          : gpu, Intel(R) Level-Zero, Intel(R) Arc(TM) A380 Graphics 1.3 [1.3.27191]
custom_selector(gpu)    : gpu, Intel(R) Level-Zero, Intel(R) Arc(TM) A380 Graphics 1.3 [1.3.27191]
custom_selector(cpu)    : No device of requested type available. Please chec...
custom_selector(acc)    : No device of requested type available. Please chec...

WolframRhodium commented 5 months ago

That's strange. What is the output of uname -r?

Quackdoc commented 5 months ago

I'm currently on 6.6.10 (linux-zen), but I also tested the mainline kernel. I could test linux-git too if necessary.

EDIT: It seems that block_step=4 specifically is the issue. I'm not sure, but could this simply be too intensive for the A380? I'm not familiar with what this parameter does, but block_step=4 really seems to hammer the GPU before it dies. I'm wondering whether the "Native API failed. Native API returns: -1 (PI_ERROR_DEVICE_NOT_FOUND) -1 (PI_ERROR_DEVICE_NOT_FOUND)" error is simply an out-of-resources crash.

EDIT2: Yes, I'm getting a GPU hang:

Jan 14 20:01:05 quackdock kernel: i915 0000:0d:00.0: [drm] GPU HANG: ecode 12:10:85def5fa, in vspipe [422566]
Jan 14 20:01:05 quackdock kernel: i915 0000:0d:00.0: [drm] vspipe[422566] context reset due to GPU hang
Jan 14 20:01:17 quackdock kernel: Fence expiration time out i915-0000:0d:00.0:vspipe[422601]:4!
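On the block_step question above: in BM3D, block_step is the spacing in pixels between reference blocks, so halving it from 8 to 4 roughly quadruples the number of blocks that must be matched and filtered per frame. The Python sketch below estimates that count under a simplified model, which is an assumption: 8x8 blocks, the last block on each axis clamped to the border, and a hypothetical 1920x1080 plane (the thread does not state the source resolution). It illustrates the scaling only; it is not the plugin's actual work scheduling.

```python
# Simplified model (assumption): 8x8 reference blocks placed every
# `block_step` pixels, with the last position clamped so the block stays
# inside the plane. Not taken from the bm3dsycl source.
BLOCK_SIZE = 8


def positions(length: int, block_step: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of reference-block positions along one axis."""
    span = length - block_size
    # ceil(span / block_step) further steps, plus the block at position 0
    return -(-span // block_step) + 1


def reference_blocks(width: int, height: int, block_step: int) -> int:
    """Reference blocks in one plane under the model above."""
    return positions(width, block_step) * positions(height, block_step)


if __name__ == "__main__":
    # 1920x1080 is a hypothetical frame size used only for illustration.
    for step in (8, 4):
        print(f"block_step={step}: {reference_blocks(1920, 1080, step)} blocks")
```

Under this model it prints 32400 blocks for block_step=8 and 128851 for block_step=4, about four times as many; larger bm_range and radius values then add further matching work per block.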

WolframRhodium commented 5 months ago

Thanks for the information. Could you try disabling hangcheck?

Quackdoc commented 5 months ago

I'm getting exactly the same thing even with hangcheck disabled:

su -c 'cat /sys/module/i915/parameters/enable_hangcheck'
N

sudo dmesg | tail -n 5
[sudo] password for quack: 
[   26.543795] Bluetooth: RFCOMM socket layer initialized
[   26.543805] Bluetooth: RFCOMM ver 1.11
[  100.442737] i915 0000:0d:00.0: [drm] GPU HANG: ecode 12:10:85def5fa, in vspipe [4183]
[  100.442748] i915 0000:0d:00.0: [drm] vspipe[4183] context reset due to GPU hang
[  111.905473] Fence expiration time out i915-0000:0d:00.0:vspipe[4197]:4!

WolframRhodium commented 5 months ago

What about tuning i915.request_timeout_ms?

The "Allow Long-running GPU Kernels" section on page 7 of this PDF may be relevant.

Quackdoc commented 5 months ago

Setting i915.request_timeout_ms=900000 did let it run longer, but it still crashed in the end. Disabling the preemption timeout and setting the heartbeat interval caused my computer to hard-lock, even though the Intel GPU only drives a single display and is not the primary render device.

WolframRhodium commented 5 months ago

Thanks. In the end, getting it to work on these GPUs may currently require manual tiling.
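One possible reading of "manual tiling" (an assumption, not something spelled out in the thread): split each frame into smaller overlapping regions, denoise each region with its own BM3Dv2 call, then trim the overlaps and stitch the results back together so that each piece of GPU work is smaller. Whether this keeps individual GPU submissions under the i915 timeout depends on how the plugin dispatches work and is untested here. The Python sketch below covers only the geometry; Tile, spans and make_tiles are hypothetical helpers, and the 640x360 tile size and 16-pixel overlap are illustrative values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tile:
    """Hypothetical helper; not part of bm3dsycl or VapourSynth."""
    # Inner rectangle: the part of the output this tile is responsible for.
    # Inner rectangles never overlap and together cover the whole frame.
    x: int
    y: int
    w: int
    h: int
    # Padded rectangle: the inner one grown by `pad` pixels (clamped to the
    # frame). This is the region that would be cropped out and denoised.
    px: int
    py: int
    pw: int
    ph: int

    def trim(self) -> Tuple[int, int, int, int]:
        """(left, right, top, bottom) amounts that cut the padding off again."""
        return (self.x - self.px,
                (self.px + self.pw) - (self.x + self.w),
                self.y - self.py,
                (self.py + self.ph) - (self.y + self.h))


def spans(length: int, tile: int, pad: int) -> List[Tuple[int, int, int, int]]:
    """Split one axis into (start, size, padded_start, padded_size) tuples."""
    out = []
    start = 0
    while start < length:
        end = min(start + tile, length)
        pstart = max(start - pad, 0)
        pend = min(end + pad, length)
        out.append((start, end - start, pstart, pend - pstart))
        start = end
    return out


def make_tiles(width: int, height: int, tile_w: int, tile_h: int,
               pad: int) -> List[List[Tile]]:
    """Return tiles row by row, in the order they would be stacked."""
    return [
        [Tile(x, y, w, h, px, py, pw, ph)
         for (x, w, px, pw) in spans(width, tile_w, pad)]
        for (y, h, py, ph) in spans(height, tile_h, pad)
    ]


if __name__ == "__main__":
    # Hypothetical 1920x1080 frame, 640x360 tiles, 16-pixel overlap.
    for row in make_tiles(1920, 1080, 640, 360, 16):
        for t in row:
            print(t, t.trim())
```

In a .vpy script, each padded rectangle could be cut with core.std.CropAbs(clip, width=t.pw, height=t.ph, left=t.px, top=t.py), denoised, trimmed with core.std.Crop using the left, right, top and bottom values from t.trim(), and reassembled with core.std.StackHorizontal per row and core.std.StackVertical across rows. Keeping the frame size, tile size and overlap even keeps every offset a multiple of 2, which cropping 4:2:0 clips requires. Because the overlaps are discarded rather than blended, seams may still be visible.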