Patman86 / SVT-AV1-Mod-by-Patman

Patman's Mod of SVT-AV1 with some modifications
BSD 3-Clause Clear License

AVX512 disabled in Windows binary #6

Closed: gitoss closed this issue 5 months ago

gitoss commented 5 months ago

Forwarding the issue from Staxrip to the suitable repo: https://github.com/staxrip/staxrip/issues/1387

Describe the bug: SvtAv1EncApp.exe seems to be built only for AVX2 and doesn't enable AVX512.

Svt[info]: SVT [version]:       SVT-AV1-PSY Encoder Lib v2.1.0-1+1-8ba1c70f [Mod by Patman]
Svt[info]: [asm level on system : up to avx512]
Svt[info]: [asm level selected : up to avx2]

Expected behavior: The 'asm level selected' should be avx512 if 'asm level on system' is avx512.

How to reproduce the issue: Use a CPU supporting AVX512 (like AMD Zen 4 or Intel Tiger Lake) and run SvtAv1EncApp.exe with either --asm avx512 or --asm max.

Provide information: The SVT encoder has --asm max and should auto-select the best asm level for each system, so as far as I understand it, it's not necessary to build a binary limited to AVX2?

Additional context: Other SvtAv1EncApp Windows binaries available on GitHub seem to be limited to AVX2, too, but there are binaries supporting AVX512 for Linux: https://github.com/gianni-rosato/svt-av1-psy/releases
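
For anyone who wants to check their own binary quickly, here is a minimal sketch that reads the two "asm level" lines from the encoder's startup log and flags the mismatch described above. It assumes the log format is exactly as pasted in this report; the function names are made up for illustration and are not part of SVT-AV1.

```python
import re

# Matches lines such as "Svt[info]: [asm level on system : up to avx512]".
# Assumption: the wording matches the log excerpt quoted in this issue.
ASM_LINE = re.compile(r"\[asm level (on system|selected)\s*:\s*up to (\w+)\]")


def asm_levels(log_text):
    """Return a dict mapping 'on system' / 'selected' to the reported level."""
    return {kind: level for kind, level in ASM_LINE.findall(log_text)}


def avx512_unused(log_text):
    """True when the CPU reports avx512 but the encoder selected a lower level."""
    levels = asm_levels(log_text)
    selected = levels.get("selected")
    return (
        levels.get("on system") == "avx512"
        and selected is not None
        and selected != "avx512"
    )


log = """\
Svt[info]: [asm level on system : up to avx512]
Svt[info]: [asm level selected : up to avx2]
"""
print(asm_levels(log))
print(avx512_unused(log))
```

Feeding it the encoder's captured console output should be enough; it only looks at the two bracketed lines and ignores everything else.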

gitoss commented 5 months ago

Out of interest, I ran media-autobuild_suite myself using -march=znver4 in custom_profile ... and it's the same result.

Svt[info]: SVT [version]: SVT-AV1-PSY Encoder Lib v2.1.0-A-1-g37a5609
Svt[info]: SVT [build] : Clang 18.1.6 64 bit
Svt[info]: LIB Build date: Jun 11 2024 00:17:04
Svt[info]: -------------------------------------------
Svt[info]: [asm level on system : up to avx512]
Svt[info]: [asm level selected : up to avx2]

I'm really not used to compiling things myself anymore, so I have no idea whether enabling AVX512 works with just CFLAGS or needs something in configure or CMake (which seems to have some logic to auto-detect -DEN_AVX512_SUPPORT=1).

The ab-suite.cmake.log shows that CMake at least detects compiler support for the AVX512 flags, but I don't know if this is sufficient.

-- Checking C flag support for: [-mavx512f] - Yes
-- Checking C flag support for: [-mavx512bw] - Yes
-- Checking C flag support for: [-mavx512dq] - Yes
-- Checking C flag support for: [-mavx512vl] - Yes
-- Checking CXX flag support for: [-mavx512f] - Yes
-- Checking CXX flag support for: [-mavx512bw] - Yes
-- Checking CXX flag support for: [-mavx512dq] - Yes
-- Checking CXX flag support for: [-mavx512vl] - Yes

Sorry to be a bother, I know this is tricky if you cannot test it on an AVX512-capable CPU yourself. The original repo seems to have managed it for Linux: https://github.com/gianni-rosato/svt-av1-psy/releases/tag/v2.1.0-A
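
As a rough sketch, the flag checks in the CMake log excerpt above can be verified with a few lines of Python. This assumes the log lines look exactly like the ones pasted here (including the "Yes"/"No" answer); the helper name is hypothetical. Note that passing these checks only means the compiler accepts the flags. As the rest of this thread shows, the resulting binary can still select avx2 only.

```python
import re

# The four AVX512 flags that appear in the pasted ab-suite.cmake.log excerpt.
REQUIRED_FLAGS = ("-mavx512f", "-mavx512bw", "-mavx512dq", "-mavx512vl")

# Assumption: check lines follow the format shown in this thread.
CHECK_LINE = re.compile(
    r"Checking (C|CXX) flag support for: \[(-[\w-]+)\] - (Yes|No)"
)


def unsupported_avx512_flags(cmake_log):
    """Return (language, flag) pairs that were not reported as supported."""
    supported = {
        (lang, flag)
        for lang, flag, answer in CHECK_LINE.findall(cmake_log)
        if answer == "Yes"
    }
    return [
        (lang, flag)
        for lang in ("C", "CXX")
        for flag in REQUIRED_FLAGS
        if (lang, flag) not in supported
    ]


sample = """\
-- Checking C flag support for: [-mavx512f] - Yes
-- Checking CXX flag support for: [-mavx512f] - Yes
"""
print(unsupported_avx512_flags(sample))
```

An empty list means all eight checks passed, which is the case for the log in this comment.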

Patman86 commented 5 months ago

The avx512 support must be specified when compiling the binary. I will publish an update at the weekend.

-march=znver4 does not activate avx512 support. To do this, the command -DENABLE_AVX512=ON must be added to line 1542 of media-suite_compile.sh.
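
For anyone building outside of media-autobuild_suite, a minimal sketch of what the configure step could look like with that option set. Only -DENABLE_AVX512=ON comes from this thread; the directory names, the Release build type and the helper function are assumptions for illustration, and the exact line the suite itself runs may differ.

```python
def cmake_configure_args(source_dir, build_dir, enable_avx512=True):
    """Build the argument list for a CMake configure step.

    -S / -B are the standard CMake options for source and build directories.
    ENABLE_AVX512 is the option named in this thread; the rest is assumed.
    """
    args = ["cmake", "-S", source_dir, "-B", build_dir, "-DCMAKE_BUILD_TYPE=Release"]
    args.append("-DENABLE_AVX512=" + ("ON" if enable_avx512 else "OFF"))
    return args


if __name__ == "__main__":
    # Placeholder paths: point these at your own checkout and build directory.
    args = cmake_configure_args("SVT-AV1", "SVT-AV1/build")
    print(" ".join(args))
    # To actually run the configure step:
    # import subprocess; subprocess.run(args, check=True)
```

After building, the startup log should report "asm level selected : up to avx512" on an AVX512-capable CPU.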

gitoss commented 5 months ago

The avx512 support must be specified when compiling the binary. I will publish an update at the weekend.

Thanks. It would be nice to know how you do it, i.e. what configure/cmake options to use. I've asked the original psy dev, too https://github.com/gianni-rosato/svt-av1-psy/discussions/57

gitoss commented 5 months ago

-march=znver4 does not activate avx512 support. To do this, the command -DENABLE_AVX512=ON must be added to line 1542 of media-suite_compile.sh.

Thanks, that did it.

Svt[info]: SVT [version]: SVT-AV1-PSY Encoder Lib v2.1.0-A-1-g37a5609
Svt[info]: SVT [build]  : Clang 18.1.6 64 bit
Svt[info]: LIB Build date: Jun 12 2024 00:08:50
Svt[info]: -------------------------------------------
Svt[info]: [asm level on system : up to avx512]
Svt[info]: [asm level selected : up to avx512]

Btw, this is different from libjxl, which uses CFLAGS.

JPEG XL encoder v0.10.2 c158d65 [AVX3_DL]

gitoss commented 5 months ago

To do this, the command -DENABLE_AVX512=ON must be added to line 1542 of media-suite_compile.sh.

... or you could create a file like "svt-av1-psy-git_options" in the build directory, containing "-DENABLE_AVX512=ON" - which is the more official-ish way to add cflags as far as I understand it.

Patman86 commented 5 months ago

Not really. The main folder of the repository contains a CMakeLists.txt where all options are defined, and every build process uses it. The AVX512 option is already defined there and is activated via CMake.

https://github.com/Patman86/SVT-AV1-Mod-by-Patman/blob/5f7e5f3666894b73bdce4ec28cbca53012415fff/CMakeLists.txt#L371

In addition, the CMake procedure is called in the media-suite_compile.sh. Cflags are not necessary there.

gitoss commented 5 months ago

In addition, the CMake procedure is called in the media-suite_compile.sh. Cflags are not necessary there.

Right, thanks - I didn't realize the _options.txt files are only for configure CFLAGS, not for CMake -Dsomething options.

Patman86 commented 5 months ago

Updated my releases

gitoss commented 5 months ago

Updated my releases

Thanks. Can you tell what the actual difference is between the gcc & msvc builds, or does it depend on systems & circumstances?

Patman86 commented 5 months ago

The MSVC builds were compiled with Visual Studio and the GCC builds with MSYS (cross-compile). Different compilers

gitoss commented 5 months ago

The MSVC builds were compiled with Visual Studio and the GCC builds with MSYS (cross-compile). Different compilers

That I know :-) ... I was wondering if you figured out what the actual difference / effect is, like encoding speed. For example, the vstudio binary from you is 1 fps faster than my own llvm mediasuite build (w/o lto). I didn't check your gcc build yet, and it wasn't a real benchmark anyway.

Anyway, thanks for the binaries, in the fullness of time I'll figure out which compiler (vstudio, gcc, llvm) works best for my zen4 system.
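
In case it helps with that comparison, here is a small sketch of a repeatable timing harness: it runs each command several times and reports the median wall-clock time. Everything in it is an assumption for illustration (function names, number of runs), and the demo uses a no-op Python command as a stand-in so that it runs anywhere.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        durations.append(time.perf_counter() - start)
    return durations


def compare(builds, runs=3):
    """builds maps a label to a command list; returns label -> median seconds."""
    return {
        label: statistics.median(time_command(cmd, runs))
        for label, cmd in builds.items()
    }


if __name__ == "__main__":
    # Stand-in commands; replace with the encoder binaries to compare.
    demo = {
        "noop-a": [sys.executable, "-c", "pass"],
        "noop-b": [sys.executable, "-c", "pass"],
    }
    for label, seconds in compare(demo).items():
        print(f"{label}: {seconds:.3f} s")
```

For a real comparison, each entry would be something like ["msvc/SvtAv1EncApp.exe", "-i", "clip.y4m", "-b", "out.ivf", "--preset", "8"], where the paths, the clip and the preset are placeholders. Use the same input and settings for every build so that only the compiler differs.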

Patman86 commented 5 months ago

In most cases, the MSVC versions are faster than the cross-compiled versions, which may be due to compatibility with the Windows machine. This is similar with x265. For this reason I would also like to compile x264 with msvc and maybe I can do the same with rav1e.

gitoss commented 5 months ago

In most cases, the MSVC versions are faster than the cross-compiled versions, which may be due to compatibility with the Windows machine. This is similar with x265. For this reason I would also like to compile x264 with msvc and maybe I can do the same with rav1e.

For what it's worth, you can cross-compile w/ lto (either gcc or clang), which results in a significant speedup. https://www.reddit.com/r/AV1/comments/jmwepw/how_to_build_libaomav1_to_be_as_fast_as_possible/

... instructions for llvm https://github.com/m-ab-s/media-autobuild_suite/issues/2669 https://clang.llvm.org/docs/ThinLTO.html

I guess vstudio will still be faster because Microsoft had a lot of time to optimize for the Windows platform. But because I haven't used Visual Studio for ages, I'm stuck with media autobuild suite for now to optimize for my specific -march (in my case, znver4).

Andarwinux commented 5 months ago

vstudio is faster simply because UCRT's libm is faster, and mingw-w64 overrode the UCRT libm implementation with their shit x87 FPU implementation. You might consider building svtav1 with clang-cl -Xclang -O3 -flto=thin, which would combine UCRT's high-performance libm with clang's advanced autovectorization, and also can override NT malloc.

gitoss commented 5 months ago

You might consider building svtav1 with clang-cl -Xclang -O3 -flto=thin, which would combine UCRT's high-performance libm with clang's advanced autovectorization, and also can override NT malloc.

Edit, again: Ok, I now understand your instructions are for using the Visual Studio front-end and the llvm compiler - and with mingw (media autobuild suite) this isn't possible.