Several OCIO ops currently have a code path that uses Intel SSE intrinsics (SIMD instructions) for better performance. However, SSE intrinsics are not available on ARM processors such as the Apple M1, so the slower plain C++ path is used there instead. This task is to take advantage of the equivalent SIMD intrinsics available on ARM processors, known as Neon.
A number of open source projects help translate SSE intrinsics into Neon intrinsics. The one we propose using is:
https://github.com/DLTcollab/sse2neon
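As a rough illustration of how a translation header like this could slot into an existing SSE code path, here is a minimal sketch. It is not taken from the OCIO source: the `scaleOffset` function and the `SKETCH_USE_SSE` macro are hypothetical names, and the sketch assumes `sse2neon.h` is on the include path when building for ARM64. If the header is not found, it falls back to the plain C++ loop.

```cpp
#include <cstddef>

// Pick an implementation of the SSE intrinsics at compile time.
// SKETCH_USE_SSE is a hypothetical macro used only in this sketch.
#if defined(__SSE2__) || defined(_M_X64)
  #include <emmintrin.h>          // native SSE/SSE2 on x86
  #define SKETCH_USE_SSE 1
#elif (defined(__aarch64__) || defined(_M_ARM64)) && defined(__has_include)
  #if __has_include("sse2neon.h")
    #include "sse2neon.h"         // maps _mm_* intrinsics onto Neon
    #define SKETCH_USE_SSE 1
  #endif
#endif
#ifndef SKETCH_USE_SSE
  #define SKETCH_USE_SSE 0        // no SIMD path: scalar loop only
#endif

// Hypothetical op: out[i] = in[i] * scale + offset.
void scaleOffset(const float* in, float* out, std::size_t n,
                 float scale, float offset)
{
    std::size_t i = 0;
#if SKETCH_USE_SSE
    const __m128 s = _mm_set1_ps(scale);
    const __m128 o = _mm_set1_ps(offset);
    // Process four floats per iteration using unaligned loads/stores.
    for (; i + 4 <= n; i += 4)
    {
        __m128 v = _mm_loadu_ps(in + i);
        v = _mm_add_ps(_mm_mul_ps(v, s), o);
        _mm_storeu_ps(out + i, v);
    }
#endif
    // Scalar tail (and the whole loop when no SIMD path is available).
    for (; i < n; ++i)
    {
        out[i] = in[i] * scale + offset;
    }
}
```

The point of this layout is that the body written with `_mm_*` intrinsics stays unchanged; only the include selection differs per architecture.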
Note that the Neon path may differ from the SSE path in precision and in NaN handling; these differences will need to be investigated.
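One known example of such a NaN difference, offered as an illustration and not necessarily the case the authors have in mind: the SSE `_mm_min_ps` instruction returns its second operand whenever either input is NaN, while the AArch64 Neon minimum (`vminq_f32`) returns NaN if either input is NaN. The sketch below emulates both behaviors in portable scalar C++ so the difference can be seen on any machine; `sseStyleMin` and `neonStyleMin` are hypothetical helper names. Whether sse2neon reproduces the SSE behavior exactly appears to depend on compile-time options (its README describes flags such as `SSE2NEON_PRECISE_MINMAX`), which should be checked against the version in use.

```cpp
#include <cmath>
#include <limits>

// Emulates SSE MINPS per lane: (a < b) ? a : b.
// Any comparison with NaN is false, so the second operand is returned.
inline float sseStyleMin(float a, float b)
{
    return (a < b) ? a : b;
}

// Emulates the AArch64 Neon FMIN per lane: NaN in, NaN out.
inline float neonStyleMin(float a, float b)
{
    if (std::isnan(a) || std::isnan(b))
    {
        return std::numeric_limits<float>::quiet_NaN();
    }
    return (a < b) ? a : b;
}
```

With a NaN in the first operand and `1.0f` in the second, the SSE-style minimum yields `1.0f` while the Neon-style minimum yields NaN, so an op that relies on min/max to clamp away NaNs could behave differently after a naive port.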