Open Cygon opened 5 months ago
I did some further testing:
Raising cpu-used all the way up to 8 and using --lag-in-frames=48 has so far not made aom-av1-lavish move (though I have only waited 45 minutes so far).
Removing --tune=butteraugli while at the same time increasing to --lag-in-frames=96 has it encoding at least 1 frame per minute right now.
So it appears that the "butteraugli" tune is prohibitively slow.
I'm using my distro's libjxl 0.8.1.
My build command for aom-av1-lavish was
cmake -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_CCACHE=OFF -DENABLE_DOCS=OFF -DENABLE_EXAMPLES=ON -DENABLE_NASM=OFF -DENABLE_TESTS=no -DENABLE_TOOLS=ON -DENABLE_WERROR=OFF -DCONFIG_BIG_ENDIAN=0 -DCONFIG_TUNE_BUTTERAUGLI=1 -DENABLE_NEON=OFF -DENABLE_ARM_CRC32=OFF -DENABLE_NEON_DOTPROD=OFF -DENABLE_NEON_I8MM=OFF -DENABLE_SVE=OFF -DENABLE_MMX=ON -DENABLE_SSE=ON -DENABLE_SSE2=ON -DENABLE_SSE3=ON -DENABLE_SSSE3=ON -DENABLE_SSE4_1=ON -DENABLE_SSE4_2=ON -DENABLE_AVX=ON -DENABLE_AVX2=ON -DENABLE_VSX=OFF -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCONFIG_TUNE_BUTTERAUGLI=1 -DCMAKE_C_FLAGS="-march=native -Ofast -pipe -fomit-frame-pointer -g0 -fgraphite-identity -fno-common -flto=12 -fmerge-all-constants -falign-functions=32 -fno-stack-protector -floop-strip-mine -floop-block -ftree-vectorize -floop-interchange -floop-nest-optimize -floop-parallelize-all -fstack-check=no -fno-stack-check -fno-stack-clash-protection" -DCMAKE_C_FLAGS_INIT="-flto=12 -static" /opt/aom-av1-lavish-3b4594d81bed823c41ad95a195cd4b321aebdd07
A bit of GCC ricing, but it's a release build.
Is the enormous performance impact normal for the "butteraugli" tune? Can I do something to reduce this? Newer version of libjxl? Any compile flags for CMake?
I have built aom-av1-lavish from opmox/mainline-merge (commit 3b4594d81bed823c41ad95a195cd4b321aebdd07) and I'm now trying to compare it against vanilla aomenc 3.8.0.
However, while vanilla aomenc 3.8.0 starts processing and is well on the way a few minutes after entering pass 2, aom-av1-lavish just sits there. It reads lag-in-frames frames, then (for at least 6 hours now) doesn't achieve anything.
This is how I launch both versions (for aomenc 3.8.0, set lag-in-frames to 48 and remove the tune options):
I know running with cpu-used 0 is a bit bonkers, but I wanted to see what is possible at maximum settings.
But vanilla aomenc 3.8.0 estimates about 80 hours (< 4 days) of encoding time, about 2 frames per minute, whereas aom-av1-lavish has, after ~6 hours of waiting, not managed to process even one frame and thus gives no estimate.
So I suspect there is a problem when the above combination of parameters is used. Unless the effect of the "butteraugli" tune and/or lag-in-frames at 80 is so drastic that it takes >6 hours to process a single frame.
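To put the numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted in this post (80 hours estimated, about 2 frames per minute, 6 hours without a frame). The function name is made up for illustration, and it assumes the vanilla rate stays constant and ignores the lookahead latency before the first frame comes out, so treat the result as an order-of-magnitude guess rather than a measurement.

```python
# Figures reported in the post for vanilla aomenc 3.8.0 at cpu-used 0.
estimated_hours = 80
frames_per_minute = 2

# Implied clip length, assuming the reported rate holds for the whole encode.
implied_frames = estimated_hours * 60 * frames_per_minute


def slowdown_lower_bound(stalled_hours, baseline_fpm):
    """Hypothetical helper: minimum per-frame slowdown factor if an encoder
    has produced no frame after `stalled_hours`, relative to a baseline
    running at `baseline_fpm` frames per minute."""
    baseline_minutes_per_frame = 1 / baseline_fpm
    return stalled_hours * 60 / baseline_minutes_per_frame


print(implied_frames)
print(slowdown_lower_bound(6, frames_per_minute))
```

Under those assumptions the clip would be roughly 9600 frames long, and a single frame taking more than 6 hours would mean a slowdown of at least 720x compared to vanilla, which seems too large to be explained by a tune option alone.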