As it says on the tin: this turns on Link Time Optimization (LTO) officially (which has been the default on Arch Linux for a while now).
Additionally, added the __restrict keyword to the Degrain_C function, which lets compilers better optimize memory access and register usage. This only impacts 16-bit sources, as 8-bit uses the SSE2 implementation and isn't affected.
All outputs are identical between the master-clang, lto-clang, and lto-restrict-clang libraries. Clang was used as the compiler for all three.
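For readers unfamiliar with __restrict, here is a minimal sketch of the kind of change described above. It is not the actual Degrain_C code: the function name blend_u16, its parameters, and the fixed-point weighting are hypothetical, chosen only to show where the qualifier goes and what it promises the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 16-bit degrain-style kernel; NOT the real Degrain_C.
// __restrict promises the compiler that dst, src and ref never overlap, so it
// can keep loaded values in registers and vectorize the loop without having to
// assume that a store to dst[i] may change src[] or ref[].
void blend_u16(uint16_t *__restrict dst,
               const uint16_t *__restrict src,
               const uint16_t *__restrict ref,
               int w_src, int w_ref, std::size_t n)
{
    // Assumption for this sketch: weights are 8-bit fixed point and sum to 256.
    assert(w_src + w_ref == 256);

    for (std::size_t i = 0; i < n; ++i) {
        // uint16_t operands promote to int, so the products cannot overflow here.
        dst[i] = static_cast<uint16_t>((src[i] * w_src + ref[i] * w_ref + 128) >> 8);
    }
}
```

The qualifier changes no results as long as the buffers really do not overlap, which is consistent with the identical outputs reported above. Passing overlapping buffers to such a function would be undefined behavior.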
Tests were run with blocksize 8 and 16, on 4k, 1080p, and 540p content, with Degrain6 and Analyse (no Recalculate).
-o 1 == blocksize 8
-o 2 == blocksize 16
Results:
1080p and 540p:
vspipe -p -e 3000 -o 1 --arg mvversion=master-clang --arg src=test-1080p.dgi tester.vpy /dev/null
vspipe -p -e 3000 -o 1 --arg mvversion=lto-clang --arg src=test-1080p.dgi tester.vpy /dev/null
vspipe -p -e 3000 -o 1 --arg mvversion=lto-restrict-clang --arg src=test-1080p.dgi tester.vpy /dev/null
vspipe -p -e 3000 -o 2 --arg mvversion=master-clang --arg src=test-1080p.dgi tester.vpy /dev/null
vspipe -p -e 3000 -o 2 --arg mvversion=lto-clang --arg src=test-1080p.dgi tester.vpy /dev/null
vspipe -p -e 3000 -o 2 --arg mvversion=lto-restrict-clang --arg src=test-1080p.dgi tester.vpy /dev/null
vspipe -p -e 3000 -o 1 --arg mvversion=master-clang --arg src=test-540p.dgi tester.vpy /dev/null
vspipe -p -e 3000 -o 1 --arg mvversion=lto-clang --arg src=test-540p.dgi tester.vpy /dev/null
vspipe -p -e 3000 -o 1 --arg mvversion=lto-restrict-clang --arg src=test-540p.dgi tester.vpy /dev/null
vspipe -p -e 3000 -o 2 --arg mvversion=master-clang --arg src=test-540p.dgi tester.vpy /dev/null
vspipe -p -e 3000 -o 2 --arg mvversion=lto-clang --arg src=test-540p.dgi tester.vpy /dev/null
vspipe -p -e 3000 -o 2 --arg mvversion=lto-restrict-clang --arg src=test-540p.dgi tester.vpy /dev/null
4k:
vspipe -p -e 500 -o 1 --arg mvversion=master-clang --arg src=test-4k.dgi tester.vpy /dev/null
vspipe -p -e 500 -o 1 --arg mvversion=lto-clang --arg src=test-4k.dgi tester.vpy /dev/null
vspipe -p -e 500 -o 1 --arg mvversion=lto-restrict-clang --arg src=test-4k.dgi tester.vpy /dev/null
vspipe -p -e 500 -o 2 --arg mvversion=master-clang --arg src=test-4k.dgi tester.vpy /dev/null
vspipe -p -e 500 -o 2 --arg mvversion=lto-clang --arg src=test-4k.dgi tester.vpy /dev/null
vspipe -p -e 500 -o 2 --arg mvversion=lto-restrict-clang --arg src=test-4k.dgi tester.vpy /dev/null
Note that lto-restrict-clang is always the fastest of the three: __restrict alone sometimes accounts for 3-4%, while enabling LTO can yield a 6-11% improvement on its own.