AcademySoftwareFoundation / OpenColorIO

A color management framework for visual effects and animation.
https://opencolorio.org
BSD 3-Clause "New" or "Revised" License
1.79k stars 454 forks source link

Adsk contrib - Add support for neon intrinsic integration #1828

Closed cedrik-fuoco-adsk closed 1 year ago

cedrik-fuoco-adsk commented 1 year ago

This is a PR that is replacing the PR about ARM Neon here: https://github.com/AcademySoftwareFoundation/OpenColorIO/pull/1775.

The new PR contains all the previous code changes that were already accepted in the previous PR and contains new commits that were needed to integrate it with the code from the PR Add AVX2/AVX/SSE2 SIMD accelerated 1D/3D LUTS (merged in main already).

Therefore, please ignore the first commit named "Merging the previous ARM Neon branch".

remia commented 1 year ago

It seems this PR fixes the universal build from Intel based macOS, so it could fix the failing Python wheels nightly build, will be good to check, once this is merged.

markreidvfx commented 1 year ago

Turning off SSE2NEON with -DOCIO_USE_SSE2NEON=OFF fails to compile. Looks like clang is trying to compile SSE code as arm64.

In file included from /Users/mark/Dev/OpenColorIO/src/OpenColorIO/ops/cdl/CDLOpCPU.cpp:12:
/Users/mark/Dev/OpenColorIO/src/OpenColorIO/SSE.h:45:23: error: unknown type name '__m128'
        static inline __m128 _mm_max_ps(__m128 a, __m128 b)

On apple arm64 i'm getting 5 test_cpu_exec failed tests

[ 153/1049] [CPUProcessor / optimizations                                ] - FAILED
[ 417/1049] [FileFormatCTF / bake_1d_3d                                  ] - FAILED
[ 427/1049] [FileFormatHDL / bake_1d_shaper                              ] - FAILED
[ 462/1049] [FileFormatResolveCube / bake_1d_shaper                      ] - FAILED
[ 471/1049] [FileFormatSpi1D / bake_1d_shaper                            ] - FAILED

I don't think they are related to this pull request as I get the same ones from the current main branch. I haven't dug into it but they look kind of like rounding errors.

All the cpu test pass under rosetta.

cedrik-fuoco-adsk commented 1 year ago

Thanks @remia, @markreidvfx and @michdolan for your comments.

Everything should be covered by the new commits.

do we need to add / update the macOS build variants in CI to cover all the new configurations?

At least, I think adding an entry for OCIO_USE_SSE2NEON=OFF in the macOS build variants would be good. @remia, Is there a limit to the number of tasks in a particular workflow?

remia commented 1 year ago

@cedrik-fuoco-adsk I think only 20 jobs can run in parallel on a workflow but nothing prevent us from having more than 20, it will just take longer. Another possibility is also to add new variants to a nightly build if these are rare build settings.

doug-walker commented 1 year ago

Thanks for your help with this @markreidvfx !

We updated the install instructions accordingly. Going to merge this as the expected tests passed before we updated the documentation.