ROCm / clr

MIT License

[Issue/clr]: clEnqueueReleaseD3D11ObjectsKHR returned CL_INVALID_GL_OBJECT #50

Closed nyanmisaka closed 1 day ago

nyanmisaka commented 5 months ago

Problem Description

Hello, our users encountered a DX11->OpenCL texture sharing issue after updating to the Adrenalin 24.1.1 driver. After rolling the driver back to 23.12.1, everything works fine again.

static void opencl_unmap_from_d3d11(AVHWFramesContext *dst_fc,
                                    HWMapDescriptor *hwmap)
{
    AVOpenCLFrameDescriptor    *desc = hwmap->priv;
    OpenCLDeviceContext *device_priv = dst_fc->device_ctx->internal->priv;
    OpenCLFramesContext *frames_priv = dst_fc->internal->priv;
    cl_event event;
    cl_int cle;

    cle = device_priv->clEnqueueReleaseD3D11ObjectsKHR(
        frames_priv->command_queue, desc->nb_planes, desc->planes,
        0, NULL, &event);
    if (cle != CL_SUCCESS) {
        av_log(dst_fc, AV_LOG_ERROR, "Failed to release texture "
              "handle: %d.\n", cle);
    }

    opencl_wait_events(dst_fc, &event, 1);
}

[AVHWFramesContext @ 00000153372f0200] Failed to release texture handle: -60.

The log shows that clEnqueueReleaseD3D11ObjectsKHR() returned an unrelated error code: CL_INVALID_GL_OBJECT (-60).

Returning -60 here is ridiculous, because CL_INVALID_GL_OBJECT is used exclusively in OpenGL/CL sharing, not DX11/CL sharing. According to the OpenCL documentation, this value is also not among the error codes this function may return.

After digging deeper into AMD's OpenCL runtime (clr), I found that the return code -60, which is only used by OpenGL/CL interop, does appear on the return path of this DX11/CL sharing function. And you guys refactored this part of the code not long ago.

- https://github.com/ROCm/clr/blame/8ff39a54fc790454b95b325eb2d9cdfa06ba7968/opencl/amdocl/cl_gl.cpp#L1597
- https://github.com/ROCm/clr/blame/8ff39a54fc790454b95b325eb2d9cdfa06ba7968/opencl/amdocl/cl_gl.cpp#L1583
- https://github.com/ROCm/clr/blame/8ff39a54fc790454b95b325eb2d9cdfa06ba7968/opencl/amdocl/cl_gl.cpp#L1708
- https://github.com/ROCm/clr/blame/8ff39a54fc790454b95b325eb2d9cdfa06ba7968/opencl/amdocl/cl_gl.cpp#L1693
- https://github.com/ROCm/clr/blob/8ff39a54fc790454b95b325eb2d9cdfa06ba7968/opencl/amdocl/cl_d3d11.cpp#L388-L395
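To illustrate why -60 is out of place here, the following is a minimal sketch that checks a return code against the errors the cl_khr_d3d11_sharing extension text lists for clEnqueueReleaseD3D11ObjectsKHR(). The `SK_` constants and both helper functions are hypothetical names introduced for this example. The numeric values are copied from the Khronos headers so the sketch builds without an OpenCL SDK, and the list of documented codes is my reading of the extension text, so it should be checked against the specification before relying on it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local copies of values from the Khronos headers (cl.h, cl_gl.h, cl_d3d11.h).
 * The SK_ prefix is hypothetical and only avoids clashing with the real macros. */
enum {
    SK_CL_SUCCESS                         = 0,
    SK_CL_OUT_OF_HOST_MEMORY              = -6,
    SK_CL_INVALID_VALUE                   = -30,
    SK_CL_INVALID_CONTEXT                 = -34,
    SK_CL_INVALID_COMMAND_QUEUE           = -36,
    SK_CL_INVALID_MEM_OBJECT              = -38,
    SK_CL_INVALID_EVENT_WAIT_LIST         = -57,
    SK_CL_INVALID_GL_OBJECT               = -60,   /* OpenGL/CL sharing only */
    SK_CL_D3D11_RESOURCE_NOT_ACQUIRED_KHR = -1009
};

/* Codes assumed to be documented for clEnqueueReleaseD3D11ObjectsKHR(). */
static const int d3d11_release_codes[] = {
    SK_CL_SUCCESS,
    SK_CL_INVALID_VALUE,
    SK_CL_INVALID_MEM_OBJECT,
    SK_CL_INVALID_COMMAND_QUEUE,
    SK_CL_INVALID_CONTEXT,
    SK_CL_D3D11_RESOURCE_NOT_ACQUIRED_KHR,
    SK_CL_INVALID_EVENT_WAIT_LIST,
    SK_CL_OUT_OF_HOST_MEMORY
};

/* Returns true if cle is one of the codes in the list above. */
static bool is_documented_d3d11_release_code(int cle)
{
    size_t n = sizeof(d3d11_release_codes) / sizeof(d3d11_release_codes[0]);
    for (size_t i = 0; i < n; i++) {
        if (d3d11_release_codes[i] == cle)
            return true;
    }
    return false;
}

/* Maps the codes known to this sketch to their symbolic names. */
static const char *cl_error_name(int cle)
{
    switch (cle) {
    case SK_CL_SUCCESS:                         return "CL_SUCCESS";
    case SK_CL_OUT_OF_HOST_MEMORY:              return "CL_OUT_OF_HOST_MEMORY";
    case SK_CL_INVALID_VALUE:                   return "CL_INVALID_VALUE";
    case SK_CL_INVALID_CONTEXT:                 return "CL_INVALID_CONTEXT";
    case SK_CL_INVALID_COMMAND_QUEUE:           return "CL_INVALID_COMMAND_QUEUE";
    case SK_CL_INVALID_MEM_OBJECT:              return "CL_INVALID_MEM_OBJECT";
    case SK_CL_INVALID_EVENT_WAIT_LIST:         return "CL_INVALID_EVENT_WAIT_LIST";
    case SK_CL_INVALID_GL_OBJECT:               return "CL_INVALID_GL_OBJECT";
    case SK_CL_D3D11_RESOURCE_NOT_ACQUIRED_KHR: return "CL_D3D11_RESOURCE_NOT_ACQUIRED_KHR";
    default:                                    return "unknown";
    }
}
```

With these helpers, `is_documented_d3d11_release_code(-60)` is false while `cl_error_name(-60)` yields `CL_INVALID_GL_OBJECT`, which is the mismatch described above.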

Operating System

10.0.19045 (Windows 10 22H2)

CPU

AMD Ryzen 9 5950X 16-Core Processor

GPU

AMD Radeon RX 7900 XTX

ROCm Version

ROCm 6.0.0

ROCm Component

clr

Steps to Reproduce

  1. Prepare a 1080p or 4k video. It can be any common video format such as H.264, HEVC or AV1.

  2. Download and unzip jellyfin-ffmpeg6 6.0.1-1, the video transcoder of Jellyfin Media Server.

  3. Run the following command in CMD or PowerShell. This FFmpeg command uses DX11/CL sharing to connect the D3D11VA decoder, the OpenCL filter and the AMF encoder directly, avoiding extra copies.

    
    // Input file path is `C:\ANY_H264_HEVC_AV1_VIDEO.mp4`; change it as needed.
    // Output file path is `C:\output.mp4`; change it as needed.

ffmpeg.exe -init_hw_device d3d11va=dx11:,vendor=0x1002 -init_hw_device opencl=ocl@dx11 -filter_hw_device ocl -hwaccel d3d11va -hwaccel_output_format d3d11 -autorotate 0 -i C:\ANY_H264_HEVC_AV1_VIDEO.mp4 -autoscale 0 -an -sn -c:v h264_amf -quality speed -b:v 20M -maxrate 20M -vf "hwmap=derive_device=opencl,scale_opencl=w=1920:h=1080:format=nv12,hwmap=derive_device=d3d11va:reverse=1,format=d3d11" -vframes 5000 -y C:\output.mp4


4. It should fail immediately with error code -60 (CL_INVALID_GL_OBJECT).

Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> h264 (h264_amf))
  Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
[AVHWFramesContext @ 000001f01432de40] Failed to release texture handle: -60.



5. Downgrade the driver to the older [Adrenalin 23.12.1](https://www.amd.com/en/support/kb/release-notes/rn-rad-win-23-12-1) and repeat the steps above; the ffmpeg command then runs without any issue.
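When comparing driver versions with the steps above, it can help to detect the failure automatically in captured ffmpeg output. The following is a minimal sketch, assuming the log line keeps the exact wording shown in this report; `parse_release_failure` is a hypothetical helper name, not part of FFmpeg.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Looks for the "Failed to release texture handle: <code>." message in one
 * log line. On a match, stores the numeric OpenCL error code in *code and
 * returns true; otherwise returns false and leaves *code untouched. */
static bool parse_release_failure(const char *line, int *code)
{
    static const char marker[] = "Failed to release texture handle: ";
    const char *p = strstr(line, marker);
    int value;

    if (!p)
        return false;
    /* sizeof(marker) - 1 skips the marker text without its trailing NUL. */
    if (sscanf(p + sizeof(marker) - 1, "%d", &value) != 1)
        return false;
    *code = value;
    return true;
}
```

Feeding each line of the ffmpeg output to this function and checking for a code of -60 distinguishes the regression described here from other release failures.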

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

Issue threads from users:
- https://forum.jellyfin.org/t-problem-on-transcoding-with-drivers-amd-24-1-1-windows-11
- https://forum.jellyfin.org/t-transcoding-not-working-i-have-rx5700-win10

nyanmisaka commented 5 months ago

Can you take a look at this DX11/CL sharing regression? @iassiour

iassiour commented 5 months ago

Thank you for reporting, @nyanmisaka. I can reproduce the issue with Adrenalin 24.1.1 on Windows 11 and an RX 6900 XT. To confirm, there are two separate issues:

1) A regression in the 24.1.1 driver: the ffmpeg app now fails with the "Failed to release texture handle" message, while it runs successfully with 23.12.1.
2) When the function fails, the error code (-60) is invalid.

I will forward this to the team and someone will come back very shortly.

iassiour commented 5 months ago

An update: we have identified the cause of the regression and are working on a fix.

nyanmisaka commented 5 months ago

@iassiour Thanks for your instant response!

If possible, please let me know when a new driver containing a fix is released, or mention it in the changelog.

Lyserberg commented 3 months ago

I'm on the latest stable revision of the drivers and this has been addressed.

nyanmisaka commented 3 months ago

Thanks for your reply. Have you tested using the ffmpeg command from this ticket? Or is hardware transcoding back to work in Jellyfin? I haven't had time to update the driver yet.

matheusvhs commented 3 months ago

I just tested on an RX 7700 XT running the 24.3.1 drivers, and the -60 error still persists.

.\ffmpeg.exe  -init_hw_device d3d11va=dx11:,vendor=0x1002 -init_hw_device opencl=ocl@dx11 -filter_hw_device ocl -hwaccel d3d11va -hwaccel_output_format d3d11 -autorotate 0 -i C:\Users\mathe\Videos\test.mkv -autoscale 0 -an -sn -c:v h264_amf -quality speed -b:v 20M -maxrate 20M -vf "hwmap=derive_device=opencl,scale_opencl=w=1920:h=1080:format=nv12,hwmap=derive_device=d3d11va:reverse=1,format=d3d11" -vframes 5000 -y C:\Users\mathe\Videos\output.mp4
ffmpeg version 6.0.1-Jellyfin Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12-win32 (GCC)
  configuration: --prefix=/opt/ffmpeg --arch=x86_64 --target-os=mingw32 --cross-prefix=x86_64-w64-mingw32- --pkg-config=pkg-config --pkg-config-flags=--static --extra-version=Jellyfin --disable-ffplay --disable-debug --disable-doc --disable-sdl2 --disable-ptx-compression --disable-w32threads --enable-pthreads --enable-shared --enable-lto --enable-gpl --enable-version3 --enable-schannel --enable-iconv --enable-libxml2 --enable-zlib --enable-lzma --enable-gmp --enable-chromaprint --enable-libfreetype --enable-libfribidi --enable-libfontconfig --enable-libass --enable-libbluray --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libopenmpt --enable-libwebp --enable-libvpx --enable-libzimg --enable-libx264 --enable-libx265 --enable-libsvtav1 --enable-libdav1d --enable-libfdk-aac --enable-opencl --enable-dxva2 --enable-d3d11va --enable-amf --enable-libvpl --enable-ffnvcodec --enable-cuda --enable-cuda-llvm --enable-cuvid --enable-nvdec --enable-nvenc
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
[AVHWDeviceContext @ 000002a9ba43c700] Using device 1002:747e (AMD Radeon RX 7700 XT).
[libdav1d @ 000002a9ba453840] libdav1d 1.3.0-0-g4803559
Guessed Channel Layout for Input Stream #0.1 : stereo
Input #0, matroska,webm, from 'C:\Users\mathe\Videos\test.mkv':
  Metadata:
    TIMECODE        : 01:00:00:00
    creation_time   : 2023-12-22T05:02:07.000000Z
    ENCODER         : Lavf58.45.100
  Duration: 00:00:58.54, start: 0.000000, bitrate: 12510 kb/s
  Stream #0:0: Video: av1 (Main), yuv420p(tv, bt709, progressive), 2560x1440, SAR 1:1 DAR 16:9, 59.94 fps, 59.94 tbr, 1k tbn (default)
    Metadata:
      ENCODER         : AV1 8-bit - AMD
      DURATION        : 00:00:58.541000000
  Stream #0:1: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s (default)
    Metadata:
      DURATION        : 00:00:58.542000000
Stream mapping:
  Stream #0:0 -> #0:0 (av1 (native) -> h264 (h264_amf))
Press [q] to stop, [?] for help
[AVHWFramesContext @ 000002a9ba4a2d40] Failed to release texture handle: -60.

nyanmisaka commented 3 months ago

So it has not been fixed in 24.3.1 (RDNA driver). Note that Vega and Polaris cards, which use the legacy drivers, are not affected; only the RDNA series is.

iassiour commented 3 months ago

For reference, two fixes have been made as part of this issue:

1) A fix for the regression in the DX11->OpenCL texture sharing case introduced with Adrenalin 24.1.1: https://github.com/ROCm/clr/commit/dca7bb22b647bd2fb2f05a7a4d4fba5c26264737

2) A fix for the invalid error code in DX11->OpenCL interop: https://github.com/ROCm/clr/commit/32d49d55ecd5150b8f4e4cd8bb5af5439d9aa7ff

These haven't made it into a release yet. It is not yet known exactly which future release they will land in, but I can keep this issue open for reference and updates until that happens.

nyanmisaka commented 3 months ago

Good to know the fix has been merged. Since we can't compile these on Windows ourselves, users will need to wait for AMD to release new drivers.

Surasia commented 2 months ago

This also does not seem to be fixed in Adrenalin 24.4.1 (on RDNA1 in my case).

Edit: Looking at it in IDA, the OpenCL runtime was compiled on April 9th, but from the master branch of CLR as it was then (v6.0.2). The fixes were introduced in 6.1.0. Weirdly enough, the HIP runtimes seem to be more up to date?

nyanmisaka commented 2 months ago

CLR 6.1.0 was only tagged last week (4/17), so it seems the 24.4.1 driver just missed it. If I were them, I would maintain at least the 6.0 and 6.1 branches and backport hotfixes to 6.0 so that users receive them as soon as possible.
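The version reasoning in this exchange can be written down as a small comparison. This is only a sketch: `struct clr_version`, `clr_version_cmp` and `has_d3d11_fixes` are hypothetical names, and the 6.1.0 threshold is taken from the report above rather than from any official AMD statement. A driver build may also not track the tagged CLR versions exactly.

```c
#include <stdbool.h>

/* A CLR version as major.minor.patch, e.g. 6.0.2. */
struct clr_version {
    int major;
    int minor;
    int patch;
};

/* Returns -1, 0 or 1 if a is older than, equal to, or newer than b. */
static int clr_version_cmp(struct clr_version a, struct clr_version b)
{
    if (a.major != b.major)
        return a.major < b.major ? -1 : 1;
    if (a.minor != b.minor)
        return a.minor < b.minor ? -1 : 1;
    if (a.patch != b.patch)
        return a.patch < b.patch ? -1 : 1;
    return 0;
}

/* Assumption from this thread: both DX11/CL fixes first appear in CLR 6.1.0. */
static bool has_d3d11_fixes(struct clr_version v)
{
    const struct clr_version first_fixed = { 6, 1, 0 };
    return clr_version_cmp(v, first_fixed) >= 0;
}
```

Under that assumption, a runtime built from 6.0.2 (as found in the 24.4.1 driver) would not contain the fixes, while anything built from 6.1.0 or later would.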

nyanmisaka commented 2 months ago

Hi @iassiour, is it feasible to backport these two commits to the rocm-6.1.x branch? Otherwise, the fixes in the develop branch will not be available until ROCm 6.2 is released, which usually takes about a quarter.

matheusvhs commented 1 month ago

I just installed the 24.5.1 driver, and hardware transcoding is now working with jellyfin-ffmpeg_6.0.1-6.

.\ffmpeg.exe  -init_hw_device d3d11va=dx11:,vendor=0x1002 -init_hw_device opencl=ocl@dx11 -filter_hw_device ocl -hwaccel d3d11va -hwaccel_output_format d3d11 -autorotate 0 -i C:\Users\mathe\Videos\test.mkv -autoscale 0 -an -sn -c:v h264_amf -quality speed -b:v 20M -maxrate 20M -vf "hwmap=derive_device=opencl,scale_opencl=w=1920:h=1080:format=nv12,hwmap=derive_device=d3d11va:reverse=1,format=d3d11" -vframes 5000 -y C:\Users\mathe\Videos\output.mp4
ffmpeg version 6.0.1-Jellyfin Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 13-win32 (GCC)
  configuration: --prefix=/opt/ffmpeg --arch=x86_64 --target-os=mingw32 --cross-prefix=x86_64-w64-mingw32- --pkg-config=pkg-config --pkg-config-flags=--static --extra-version=Jellyfin --disable-ffplay --disable-debug --disable-doc --disable-sdl2 --disable-ptx-compression --disable-w32threads --enable-pthreads --enable-shared --enable-lto --enable-gpl --enable-version3 --enable-schannel --enable-iconv --enable-libxml2 --enable-zlib --enable-lzma --enable-gmp --enable-chromaprint --enable-libfreetype --enable-libfribidi --enable-libfontconfig --enable-libass --enable-libbluray --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libopenmpt --enable-libwebp --enable-libvpx --enable-libzimg --enable-libx264 --enable-libx265 --enable-libsvtav1 --enable-libdav1d --enable-libfdk-aac --enable-opencl --enable-dxva2 --enable-d3d11va --enable-amf --enable-libvpl --enable-ffnvcodec --enable-cuda --enable-cuda-llvm --enable-cuvid --enable-nvdec --enable-nvenc
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
[AVHWDeviceContext @ 0000023a7d279400] Using device 1002:747e (AMD Radeon RX 7700 XT).
[libdav1d @ 0000023a7d28e5c0] libdav1d 1.4.1-0-g872e470
Guessed Channel Layout for Input Stream #0.1 : stereo
Input #0, matroska,webm, from 'C:\Users\mathe\Videos\test.mkv':
  Metadata:
    TIMECODE        : 01:00:00:00
    creation_time   : 2023-12-22T05:02:07.000000Z
    ENCODER         : Lavf58.45.100
  Duration: 00:00:58.54, start: 0.000000, bitrate: 12510 kb/s
  Stream #0:0: Video: av1 (Main), yuv420p(tv, bt709, progressive), 2560x1440, SAR 1:1 DAR 16:9, 59.94 fps, 59.94 tbr, 1k tbn (default)
    Metadata:
      ENCODER         : AV1 8-bit - AMD
      DURATION        : 00:00:58.541000000
  Stream #0:1: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s (default)
    Metadata:
      DURATION        : 00:00:58.542000000
Stream mapping:
  Stream #0:0 -> #0:0 (av1 (native) -> h264 (h264_amf))
Press [q] to stop, [?] for help
Output #0, mp4, to 'C:\Users\mathe\Videos\output.mp4':
  Metadata:
    TIMECODE        : 01:00:00:00
    encoder         : Lavf60.3.100
  Stream #0:0: Video: h264 (avc1 / 0x31637661), d3d11(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 20000 kb/s, 59.94 fps, 19001 tbn (default)
    Metadata:
      DURATION        : 00:00:58.541000000
      encoder         : Lavc60.3.100 h264_amf
frame= 3509 fps=325 q=-0.0 Lsize=   46677kB time=00:00:58.52 bitrate=6533.6kbits/s speed=5.43x
video:46661kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.033655%

nyanmisaka commented 1 month ago

Nice to hear that! Maybe wait a little longer to get more feedback. I'm just a little confused about how they do version control: the actual behavior differs from the code in the 6.1 branch on GitHub.

nyanmisaka commented 1 day ago

Closing because many people confirmed that this issue has been fixed in 24.5.1 and newer.