alexheretic / ab-av1

AV1 re-encoding using ffmpeg, svt-av1 & vmaf.
MIT License

CUDA acceleration for VMAF #163

Open TychoRasch opened 8 months ago

TychoRasch commented 8 months ago

With release v3.0.0 of VMAF, CUDA support has been added. https://github.com/Netflix/vmaf/releases/tag/v3.0.0

My understanding is that this speeds up the VMAF calculation enormously, with users on Reddit claiming a 10x speedup. Would it be possible to add an option to ab-av1 that enables CUDA acceleration of the VMAF calculation when the user has an NVIDIA GPU?

alexheretic commented 8 months ago

Seems possible. What we'd probably need is:

alexdns1 commented 7 months ago

I would like to see it implemented too, but unfortunately I don't think the ffmpeg part is ready. Maybe add an option to run vmaf from its own binary rather than via ffmpeg?

TychoRasch commented 7 months ago

@alexheretic I'll keep an eye out for the ffmpeg implementation and will update accordingly. I also have nvidia hardware to test and will look into cuda detection.

alexdns1 commented 7 months ago

@TychoRasch https://github.com/Netflix/vmaf/blob/master/Dockerfile.cuda#L39 It looks like their Docker image is running a patched ffmpeg.

alexdns1 commented 7 months ago

@TychoRasch Correction: it looks like libvmaf_cuda is in the latest ffmpeg.
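
Since the filter only exists in ffmpeg builds compiled with CUDA-enabled libvmaf, a tool could check for it before offering a CUDA option. The following is a minimal sketch, not ab-av1's actual code: the function names `has_filter` and `ffmpeg_has_libvmaf_cuda` are hypothetical, and the check only covers whether the filter is listed by `ffmpeg -filters`, not whether an NVIDIA GPU is actually present.

```python
import subprocess


def has_filter(filters_output: str, name: str) -> bool:
    """Return True if `name` is listed as a filter in `ffmpeg -filters` output."""
    for line in filters_output.splitlines():
        fields = line.split()
        # Filter rows have the shape: <flags> <name> <inputs->outputs> <description>
        if len(fields) >= 2 and fields[1] == name:
            return True
    return False


def ffmpeg_has_libvmaf_cuda(ffmpeg: str = "ffmpeg") -> bool:
    """Ask the given ffmpeg binary for its filter list and look for libvmaf_cuda."""
    try:
        result = subprocess.run(
            [ffmpeg, "-hide_banner", "-filters"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # ffmpeg is missing or failed to run: treat as "not available".
        return False
    return has_filter(result.stdout, "libvmaf_cuda")
```

Matching on the exact name field avoids a false positive from the plain `libvmaf` filter, which a substring search would not distinguish.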

zachron commented 7 months ago

@alexheretic I know I was not part of the initial request, but I was looking for this and saw it in the FFmpeg docs: https://ffmpeg.org/ffmpeg-filters.html#libvmaf_005fcuda (it's the CLI example). I also have NVIDIA hardware and am willing to test it.

alexdns1 commented 7 months ago

Works for me

alexdns1 commented 7 months ago

`/opt/ffmpeg_vmaf/bin/ffmpeg -hwaccel cuda -hwaccel_output_format cuda -codec:v h264_cuvid -i test.mp4 -hwaccel cuda -hwaccel_output_format cuda -codec:v h264_cuvid -i test.mp4 -filter_complex "
[0:v]scale_cuda=format=yuv420p[ref]; \
[1:v]scale_cuda=format=yuv420p[dis]; \
[dis][ref]libvmaf_cuda=log_fmt=json:log_path=output.json
" -f null -
ffmpeg version N-113111-g4fee63b241 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 11 (GCC)
  configuration: --enable-nonfree --enable-ffnvcodec --enable-cuda-llvm --enable-cuda-nvcc --enable-libvmaf --enable-vapoursynth --enable-shared --prefix=/opt/ffmpeg_vmaf
  libavutil 58. 36.100 / 58. 36.100
  libavcodec 60. 36.100 / 60. 36.100
  libavformat 60. 20.100 / 60. 20.100
  libavdevice 60. 4.100 / 60. 4.100
  libavfilter 9. 14.101 / 9. 14.101
  libswscale 7. 6.100 / 7. 6.100
  libswresample 4. 13.100 / 4. 13.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test.mp4':
  Metadata:
    major_brand : mp42
    minor_version : 0
    compatible_brands: mp42mp41
    creation_time : 2021-07-24T18:17:54.000000Z
  Duration: 00:22:43.80, start: 0.000000, bitrate: 8266 kb/s
  Stream #0:0[0x1]: Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 1920x816, 7943 kb/s, 25 fps, 25 tbr, 25k tbn (default)
    Metadata:
      creation_time : 2021-07-24T18:17:54.000000Z
      handler_name : ?Mainconcept Video Media Handler
      vendor_id : [0][0][0][0]
      encoder : AVC Coding
  Stream #0:1[0x2]: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 317 kb/s (default)
    Metadata:
      creation_time : 2021-07-24T18:17:54.000000Z
      handler_name : #Mainconcept MP4 Sound Media Handler
      vendor_id : [0][0][0][0]
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from 'test.mp4':
  Metadata:
    major_brand : mp42
    minor_version : 0
    compatible_brands: mp42mp41
    creation_time : 2021-07-24T18:17:54.000000Z
  Duration: 00:22:43.80, start: 0.000000, bitrate: 8266 kb/s
  Stream #1:0[0x1]: Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 1920x816, 7943 kb/s, 25 fps, 25 tbr, 25k tbn (default)
    Metadata:
      creation_time : 2021-07-24T18:17:54.000000Z
      handler_name : ?Mainconcept Video Media Handler
      vendor_id : [0][0][0][0]
      encoder : AVC Coding
  Stream #1:1[0x2]: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 317 kb/s (default)
    Metadata:
      creation_time : 2021-07-24T18:17:54.000000Z
      handler_name : #Mainconcept MP4 Sound Media Handler
      vendor_id : [0][0][0][0]
Stream mapping:
  Stream #0:0 (h264_cuvid) -> scale_cuda:default (graph 0)
  Stream #1:0 (h264_cuvid) -> scale_cuda:default (graph 0)
  libvmaf_cuda:default (graph 0) -> Stream #0:0 (wrapped_avframe)
  Stream #0:1 -> #0:1 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
  Metadata:
    major_brand : mp42
    minor_version : 0
    compatible_brands: mp42mp41
    encoder : Lavf60.20.100
  Stream #0:0: Video: wrapped_avframe, cuda(tv, bt709, progressive), 1920x832 [SAR 1:1 DAR 30:13], q=2-31, 200 kb/s, 25 fps, 25 tbn
    Metadata:
      encoder : Lavc60.36.100 wrapped_avframe
  Stream #0:1(eng): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s (default)
    Metadata:
      creation_time : 2021-07-24T18:17:54.000000Z
      handler_name : #Mainconcept MP4 Sound Media Handler
      vendor_id : [0][0][0][0]
      encoder : Lavc60.36.100 pcm_s16le
frame= 4421 fps=553 q=-0.0 size=N/A time=00:02:56.84 bitrate=N/A speed=22.1x

[q] command received. Exiting.

[Parsed_libvmaf_cuda_2 @ 0x7fb038005080] VMAF score: 99.265020
[out#0/null @ 0x2237ac0] video:2212kB audio:35328kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
frame= 4718 fps=548 q=-0.0 Lsize=N/A time=00:03:08.41 bitrate=N/A speed=21.9x

tail -n 20 output.json
      "max": 1.000000,
      "mean": 0.999991,
      "harmonic_mean": 0.999991
    },
    "integer_vif_scale3": {
      "min": 0.999983,
      "max": 1.000000,
      "mean": 0.999991,
      "harmonic_mean": 0.999991
    },
    "vmaf": {
      "min": 97.422102,
      "max": 100.000000,
      "mean": 99.265020,
      "harmonic_mean": 99.256884
    }
  },
  "aggregate_metrics": {
  }
}`
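
For anyone scripting this test, the pooled score can be read from the JSON log rather than scraped from stderr. This is a small sketch under one assumption: the `tail` output above cuts off the parent object of `"vmaf"`, and the code assumes it is libvmaf's top-level `"pooled_metrics"` key. The function name `pooled_vmaf_mean` is hypothetical.

```python
import json


def pooled_vmaf_mean(log_text: str) -> float:
    """Extract the pooled mean VMAF score from a libvmaf JSON log.

    Assumes the "vmaf" object lives under a top-level "pooled_metrics"
    key; the thread's `tail` output does not show that parent key.
    """
    log = json.loads(log_text)
    return float(log["pooled_metrics"]["vmaf"]["mean"])


# Usage (the path is the log_path passed to libvmaf_cuda in the command above):
#   with open("output.json") as f:
#       print(pooled_vmaf_mean(f.read()))
```

A missing key raises `KeyError`, which is probably preferable to silently reporting a wrong score if the log layout differs.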

alexheretic commented 7 months ago

I've added an experimental branch throwing in the example args for CUDA accelerated vmaf #178.

This should be easier to test now that vmaf runs are simpler single-process calls (since #177). Since I can't test it myself, please let me know how the args should be changed in the PR, e.g. how -c:v should be determined.

The PR can be installed locally with `cargo install --git https://github.com/alexheretic/ab-av1 --branch cuda-vmaf`
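
On the question of how -c:v could be determined, one possible approach is to map the input's codec name (for instance as reported by ffprobe's `codec_name` stream field) to the matching cuvid decoder. The sketch below is illustrative only and is not what the PR does: `CUVID_DECODERS` and `cuda_input_args` are hypothetical names, the table is not exhaustive, and the fallback of omitting `-codec:v` for unknown codecs was not tested in this thread.

```python
# Hypothetical lookup table: ffprobe codec_name -> ffmpeg cuvid decoder name.
CUVID_DECODERS = {
    "h264": "h264_cuvid",
    "hevc": "hevc_cuvid",
    "av1": "av1_cuvid",
    "vp9": "vp9_cuvid",
    "vp8": "vp8_cuvid",
    "mpeg2video": "mpeg2_cuvid",
}


def cuda_input_args(path: str, codec_name: str) -> list[str]:
    """Build the per-input ffmpeg args used in the CUDA example above.

    If no cuvid decoder is known for `codec_name`, -codec:v is omitted so
    ffmpeg chooses a decoder itself (untested assumption).
    """
    args = ["-hwaccel", "cuda", "-hwaccel_output_format", "cuda"]
    decoder = CUVID_DECODERS.get(codec_name)
    if decoder is not None:
        args += ["-codec:v", decoder]
    return args + ["-i", path]
```

With `cuda_input_args("test.mp4", "h264")` this reproduces the input arguments from the working command posted earlier in the thread.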

sven-pke commented 3 months ago

Here is the output I get using the above PR with a VMAF CUDA ffmpeg build:

D:>ab-av1 vmaf --cuda --reference source.mov --distorted test.mkv
⠙ 00:00:00 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- (vmaf running, eta 0s)
DEBUG: Using ffmpeg -filter_complex [0:v]scale_cuda=format=yuv444p10le,setpts=PTS-STARTPTS[dis];[1:v]scale_cuda=format=yuv444p10le,setpts=PTS-STARTPTS[ref];[dis][ref]libvmaf_cuda=:model=version=vmaf_4k_v0.6.1
Error: ffmpeg vmaf exit code -22
---stderr---
ffmpeg version N-115146-ga71e46383d-gf8715d0300+3 Copyright (c) 2000-2024 the FFmpeg developers
  built with gcc 13.2.0 (Rev6, Built by MSYS2 project)
  configuration: --pkg-config=pkgconf --cc='ccache gcc' --cxx='ccache g++' --ld='ccache g++' --extra-cxxflags=-fpermissive --extra-cflags=-Wno-int-conversion --disable-autodetect --enable-amf --enable-bzlib --enable-cuda --enable-cuvid --enable-d3d11va --enable-dxva2 --enable-iconv --enable-lzma --enable-nvenc --enable-zlib --enable-sdl2 --enable-ffnvcodec --enable-nvdec --enable-cuda-llvm --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libdav1d --enable-libaom --disable-debug --enable-libfdk-aac --enable-fontconfig --enable-libass --enable-libbluray --enable-libfreetype --enable-libmfx --enable-libmysofa --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libwebp --enable-libxml2 --enable-libzimg --enable-libshine --enable-gpl --enable-avisynth --enable-libxvid --enable-libopenmpt --enable-version3 --enable-librav1e --enable-libsrt --enable-libgsm --enable-libvmaf --enable-libsvtav1 --enable-chromaprint --enable-decklink --enable-frei0r --enable-libaribb24 --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfribidi --enable-libgme --enable-libilbc --enable-libsvthevc --enable-libsvtvp9 --enable-libkvazaar --enable-libmodplug --enable-librist --enable-librtmp --enable-librubberband --enable-libtesseract --enable-libxavs --enable-libzmq --enable-libzvbi --enable-openal --enable-libcodec2 --enable-ladspa --enable-libglslang --enable-vulkan --enable-libdavs2 --enable-libxavs2 --enable-libuavs3d --enable-libplacebo --enable-libjxl --enable-opencl --enable-opengl --enable-libnpp --enable-libopenh264 --enable-openssl --extra-cflags=-DLIBTWOLAME_STATIC --extra-cflags=-DCACA_STATIC --extra-cflags=-DMODPLUG_STATIC --extra-cflags=-DCHROMAPRINT_NODLL --extra-cflags=-DZMQ_STATIC --extra-libs=-lpsapi --extra-cflags=-DLIBXML_STATIC --extra-libs=-liconv --disable-w32threads --extra-cflags=-DKVZ_STATIC_LIB --enable-nonfree --extra-cflags='-IC:/PROGRA~1/NVIDIA~2/CUDA/v12.4/include' --extra-ldflags='-LC:/PROGRA~1/NVIDIA~2/CUDA/v12.4/lib/x64' --extra-cflags=-DAL_LIBTYPE_STATIC --extra-cflags='-ID:/media-autobuild_suite-master/local64/include' --extra-cflags='-ID:/media-autobuild_suite-master/local64/include/AL'
  libavutil 59. 17.100 / 59. 17.100
  libavcodec 61. 5.103 / 61. 5.103
  libavformat 61. 3.103 / 61. 3.103
  libavdevice 61. 2.100 / 61. 2.100
  libavfilter 10. 2.101 / 10. 2.101
  libswscale 8. 2.100 / 8. 2.100
  libswresample 5. 2.100 / 5. 2.100
  libpostproc 58. 2.100 / 58. 2.100
[AVFilterGraph @ 000001e293171400] No option name near ':model=version=vmaf_4k_v0.6.1'
[AVFilterGraph @ 000001e293171400] Error parsing a filter description around:
[AVFilterGraph @ 000001e293171400] Error parsing filterchain '[dis][ref]libvmaf_cuda=:model=version=vmaf_4k_v0.6.1' around:
Failed to set value '[0:v]scale_cuda=format=yuv444p10le,setpts=PTS-STARTPTS[dis];[1:v]scale_cuda=format=yuv444p10le,setpts=PTS-STARTPTS[ref];[dis][ref]libvmaf_cuda=:model=version=vmaf_4k_v0.6.1' for option 'filter_complex': Invalid argument
Error parsing global options: Invalid argument

I could be wrong, but I think the filter argument should look like: libvmaf_cuda=model_path=vmaf_4k_v0.6.1
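
Another possible reading of the error: ffmpeg complains about "No option name near ':model=version=vmaf_4k_v0.6.1'", and the generated filter is `libvmaf_cuda=:model=...`, so the option list starts with a colon, as if an empty first option had been joined in. The sketch below shows one way to assemble a filter string that skips empty options. The function name `filter_with_options` is hypothetical and this is not ab-av1's code; whether libvmaf_cuda then accepts `model=version=...` was not confirmed in this thread.

```python
def filter_with_options(name: str, options: list[str]) -> str:
    """Join filter options with ':' while dropping empty entries.

    Skipping empty strings avoids output such as 'libvmaf_cuda=:model=...',
    where the option list begins with a stray colon.
    """
    parts = [opt for opt in options if opt]
    if not parts:
        # No options at all: emit the bare filter name without '='.
        return name
    return name + "=" + ":".join(parts)


# Example: an empty first option no longer produces a leading colon.
print(filter_with_options("libvmaf_cuda", ["", "model=version=vmaf_4k_v0.6.1"]))
```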

allrobot commented 1 month ago

Besides NVIDIA cards, there are also AMD cards and Intel integrated graphics. You might consider buying a second-hand graphics card, which would be cheaper.

Making full use of this hardware to accelerate VMAF calculations would be a good thing.

baconsalad commented 4 days ago

Any intention of adding this to main? I see the cuda branch is quite dated now. There is already a way to encode with av1_nvenc (from #201), so this would be the final step for full hardware acceleration with NVIDIA cards.