Closed cedrik-fuoco-adsk closed 1 year ago
It seems this PR fixes the universal build from Intel based macOS, so it could fix the failing Python wheels nightly build, will be good to check, once this is merged.
Turning off SSE2NEON with -DOCIO_USE_SSE2NEON=OFF
fails to compile. Looks like clang is trying to compile SSE code as arm64.
In file included from /Users/mark/Dev/OpenColorIO/src/OpenColorIO/ops/cdl/CDLOpCPU.cpp:12:
/Users/mark/Dev/OpenColorIO/src/OpenColorIO/SSE.h:45:23: error: unknown type name '__m128'
static inline __m128 _mm_max_ps(__m128 a, __m128 b)
On apple arm64 i'm getting 5 test_cpu_exec failed tests
[ 153/1049] [CPUProcessor / optimizations ] - FAILED
[ 417/1049] [FileFormatCTF / bake_1d_3d ] - FAILED
[ 427/1049] [FileFormatHDL / bake_1d_shaper ] - FAILED
[ 462/1049] [FileFormatResolveCube / bake_1d_shaper ] - FAILED
[ 471/1049] [FileFormatSpi1D / bake_1d_shaper ] - FAILED
I don't think they are related to this pull request as I get the same ones from the current main branch. I haven't dug into it but they look kind of like rounding errors.
All the cpu test pass under rosetta.
Thanks @remia, @markreidvfx and @michdolan for your comments.
Everything should be covered by the new commits.
do we need to add / update the macOS build variants in CI to cover all the new configurations?
At least, I think adding an entry for OCIO_USE_SSE2NEON=OFF in the macOS build variants would be good. @remia, Is there a limit to the number of tasks in a particular workflow?
@cedrik-fuoco-adsk I think only 20 jobs can run in parallel on a workflow but nothing prevent us from having more than 20, it will just take longer. Another possibility is also to add new variants to a nightly build if these are rare build settings.
Thanks for your help with this @markreidvfx !
We updated the install instructions accordingly. Going to merge this as the expected tests passed before we updated the documentation.
This is a PR that is replacing the PR about ARM Neon here: https://github.com/AcademySoftwareFoundation/OpenColorIO/pull/1775.
The new PR contains all the previous code changes that were already accepted in the previous PR and contains new commits that were needed to integrate it with the code from the PR Add AVX2/AVX/SSE2 SIMD accelerated 1D/3D LUTS (merged in main already).
Therefore, please ignore the first commit named "Merging the previous ARM Neon branch".