Add AVX512 accelerated 1D/3D LUTS

markreidvfx commented 8 months ago

ocioperf.exe --transform tests/data/files/clf/lut1d_32f_example.clf

Line by Line Average, lut dim 65, 3840x2160 image, Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz

ocioperf.exe --transform tests/data/files/clf/lut3d_preview_tier_test.clf

Line by Line Average, lut dim 33x33x33, 3840x2160 image, Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz

I've only been able to test on one machine with AVX512. Not exactly the performance gains I was hoping for. I'm still new to the instruction set, so maybe there are some more optimizations we could do. There are quite a few AVX512 extensions. I've limited this implementation to just the AVX512F (Foundation) instructions. That basically means any AVX512-capable CPU should be able to run it.
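To illustrate what "AVX512F only" can look like in practice, here is a minimal sketch of a 1D LUT lookup with linear interpolation that uses nothing beyond Foundation intrinsics, plus a runtime check that falls back to scalar code. This is not the code from this pull request: the names `lut1d_scalar`, `lut1d_avx512f`, and `lut1d_apply` are hypothetical, the input is assumed to be a single float channel clamped to [0, 1], and the dispatch assumes GCC or Clang on x86.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_X86_DISPATCH 1
#else
#define SKETCH_X86_DISPATCH 0
#endif

// Scalar reference: clamp to [0, 1], then interpolate between neighbouring entries.
static void lut1d_scalar(const float* lut, int dim, const float* in, float* out, std::size_t n)
{
    assert(dim >= 2);
    const float scale = float(dim - 1);
    for (std::size_t i = 0; i < n; ++i)
    {
        const float x = std::min(std::max(in[i], 0.0f), 1.0f) * scale;
        const int lo = int(x);                      // truncation == floor, since x >= 0
        const int hi = std::min(lo + 1, dim - 1);
        const float t = x - float(lo);
        out[i] = lut[lo] + t * (lut[hi] - lut[lo]);
    }
}

#if SKETCH_X86_DISPATCH
// Hypothetical AVX512F kernel: 16 floats per iteration, Foundation intrinsics only.
// The target attribute lets this compile without passing -mavx512f globally.
__attribute__((target("avx512f")))
static void lut1d_avx512f(const float* lut, int dim, const float* in, float* out, std::size_t n)
{
    assert(dim >= 2);
    const __m512 zero = _mm512_setzero_ps();
    const __m512 one = _mm512_set1_ps(1.0f);
    const __m512 scale = _mm512_set1_ps(float(dim - 1));
    const __m512i max_idx = _mm512_set1_epi32(dim - 1);
    const __m512i one_i = _mm512_set1_epi32(1);

    std::size_t i = 0;
    for (; i + 16 <= n; i += 16)
    {
        __m512 x = _mm512_loadu_ps(in + i);
        x = _mm512_mul_ps(_mm512_min_ps(_mm512_max_ps(x, zero), one), scale);
        const __m512i lo = _mm512_cvttps_epi32(x);
        const __m512i hi = _mm512_min_epi32(_mm512_add_epi32(lo, one_i), max_idx);
        const __m512 t = _mm512_sub_ps(x, _mm512_cvtepi32_ps(lo));
        const __m512 a = _mm512_i32gather_ps(lo, lut, 4);   // 4 = sizeof(float)
        const __m512 b = _mm512_i32gather_ps(hi, lut, 4);
        _mm512_storeu_ps(out + i, _mm512_add_ps(a, _mm512_mul_ps(t, _mm512_sub_ps(b, a))));
    }
    lut1d_scalar(lut, dim, in + i, out + i, n - i);         // remaining < 16 values
}
#endif

// Runtime dispatch: use the AVX512F kernel only when the CPU (and OS) report support.
static void lut1d_apply(const float* lut, int dim, const float* in, float* out, std::size_t n)
{
#if SKETCH_X86_DISPATCH
    if (__builtin_cpu_supports("avx512f"))
    {
        lut1d_avx512f(lut, dim, in, out, n);
        return;
    }
#endif
    lut1d_scalar(lut, dim, in, out, n);
}
```

Calling `lut1d_apply(lut.data(), 65, in.data(), out.data(), in.size())` on a machine without AVX512 simply runs the scalar loop, which is also what happens on CI runners that lack the extension. The two paths may differ in the last bit if the compiler fuses the multiply and add in one of them.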

GitHub Actions used to have more Intel CPUs with AVX512 available. Lately I've been getting only AMD EPYC CPUs without AVX512 for CI. I don't think there is any way to request a specific CPU. This is very frustrating and will make this more difficult to maintain and test.

markreidvfx commented 8 months ago

I'd like to clarify the F16C option in relation to this. I guess if AVX512 is supported then we should assume F16C is always supported too, right? I have a few comments below related to this.

Yes, the half float conversion instructions are all part of the AVX512F (Foundation) extension.
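As a small illustration of that point, the sketch below converts half floats to 32-bit floats with `_mm512_cvtph_ps`, which needs only the `avx512f` target and no separate F16C flag. It is not taken from this pull request: `half_to_float_scalar`, `half_to_float_avx512f`, and `half_to_float` are hypothetical names, and the runtime dispatch assumes GCC or Clang on x86.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_X86_DISPATCH 1
#else
#define SKETCH_X86_DISPATCH 0
#endif

// Portable fallback: expand IEEE 754 binary16 bits into binary32 bits.
static float half_to_float_scalar(std::uint16_t h)
{
    const std::uint32_t sign = std::uint32_t(h & 0x8000u) << 16;
    const std::uint32_t exp = (h >> 10) & 0x1Fu;
    std::uint32_t man = h & 0x3FFu;
    std::uint32_t bits;
    if (exp == 0)
    {
        if (man == 0)
        {
            bits = sign;                              // signed zero
        }
        else
        {
            int e = -1;                               // subnormal: renormalize the mantissa
            do { man <<= 1; ++e; } while ((man & 0x400u) == 0);
            man &= 0x3FFu;
            bits = sign | (std::uint32_t(112 - e) << 23) | (man << 13);
        }
    }
    else if (exp == 31)
    {
        bits = sign | 0x7F800000u | (man << 13);      // infinity or NaN
    }
    else
    {
        bits = sign | ((exp + 112u) << 23) | (man << 13);   // rebias 15 -> 127
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

#if SKETCH_X86_DISPATCH
// Hypothetical kernel: 16 halves per iteration using the AVX512F conversion.
__attribute__((target("avx512f")))
static void half_to_float_avx512f(const std::uint16_t* in, float* out, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16)
    {
        const __m256i h = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(in + i));
        _mm512_storeu_ps(out + i, _mm512_cvtph_ps(h));
    }
    for (; i < n; ++i) out[i] = half_to_float_scalar(in[i]);   // tail
}
#endif

// Runtime dispatch between the AVX512F kernel and the scalar fallback.
static void half_to_float(const std::uint16_t* in, float* out, std::size_t n)
{
    assert(in != nullptr && out != nullptr);
#if SKETCH_X86_DISPATCH
    if (__builtin_cpu_supports("avx512f"))
    {
        half_to_float_avx512f(in, out, n);
        return;
    }
#endif
    for (std::size_t i = 0; i < n; ++i) out[i] = half_to_float_scalar(in[i]);
}
```

Half-to-float conversion is exact, so for non-NaN inputs both paths should produce identical values.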

The overlap between AVX, AVX2, and F16C support has never been entirely clear to me. I think AVX2 pretty much guarantees F16C, but I think it's best to still check for F16C alongside those extensions.
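A minimal sketch of checking each extension on its own, rather than inferring one from another, might look like the following. It reads the CPUID feature bits directly and confirms via XGETBV that the OS saves the wider registers. The `SimdSupport` struct and the `detect_simd` and `can_use_half_float_path` helpers are hypothetical names for illustration, not OpenColorIO's detection code, and the sketch assumes GCC or Clang on x86 (other targets report everything as unsupported).

```cpp
#include <cassert>
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

struct SimdSupport
{
    bool avx = false;
    bool avx2 = false;
    bool f16c = false;
    bool avx512f = false;
};

#if SKETCH_X86
// XCR0 tells us which register states the OS saves and restores.
static std::uint64_t read_xcr0()
{
    std::uint32_t eax, edx;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return (std::uint64_t(edx) << 32) | eax;
}
#endif

static SimdSupport detect_simd()
{
    SimdSupport s;
#if SKETCH_X86
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return s;
    if (((ecx >> 27) & 1u) == 0) return s;            // OSXSAVE missing: XGETBV unusable

    const std::uint64_t xcr0 = read_xcr0();
    const bool ymm_ok = (xcr0 & 0x06u) == 0x06u;      // XMM + YMM state enabled
    const bool zmm_ok = (xcr0 & 0xE6u) == 0xE6u;      // plus opmask + ZMM state

    s.avx = ymm_ok && ((ecx >> 28) & 1u);             // CPUID.1:ECX bit 28
    s.f16c = s.avx && ((ecx >> 29) & 1u);             // CPUID.1:ECX bit 29, its own bit

    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
    {
        s.avx2 = s.avx && ((ebx >> 5) & 1u);          // CPUID.7.0:EBX bit 5
        s.avx512f = s.avx && zmm_ok && ((ebx >> 16) & 1u);   // CPUID.7.0:EBX bit 16
    }
#endif
    assert(!s.avx512f || s.avx);                      // wider features were gated on AVX above
    return s;
}

// AVX512F carries its own half-float conversions; the AVX2 path needs F16C checked separately.
static bool can_use_half_float_path(const SimdSupport& s)
{
    return s.avx512f || (s.avx2 && s.f16c);
}
```

F16C has its own CPUID bit, separate from AVX and AVX2, which is why the sketch tests it explicitly instead of assuming it follows from AVX2.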

markreidvfx commented 8 months ago

I did a bit more perf testing of this with my old lut3d_perf tool.

It also turns out that GitHub runners on private repos are different than the public repo ones. The private ones can have AVX512.

I was able to test this pull request on Windows with AVX512 by setting up a private fork. I kinda used up all my free minutes for the month doing it, but all the tests pass 😆

markreidvfx commented 6 months ago

@remia I added your suggestion to all the SIMD tests. I also rebased on top of the current main.

AcademySoftwareFoundation / OpenColorIO

Add AVX512 accelerated 1D/3D LUTS #1932