AcademySoftwareFoundation / OpenImageIO

Reading, writing, and processing images in a wide variety of file formats, using a format-agnostic API, aimed at VFX applications.
https://openimageio.readthedocs.org
Apache License 2.0

[HELP] Enabling SIMD / instruction set selection #3268

Open gregcotten opened 2 years ago

gregcotten commented 2 years ago

Hi there, if I compile OIIO without explicitly setting the SIMD architectures I want to use, does SIMD get skipped altogether? What SIMD architectures should I use for Intel + Apple Silicon Macs?

lgritz commented 2 years ago

All Intel x86_64 CPUs have at least sse2, so that will always be enabled on an Intel Mac. Almost certainly any Mac you're running also has sse4.2 capabilities, and most of the newer ones also support avx2 and f16c, so you might be able to enable those depending on your hardware. If you run oiiotool --help, at the very bottom it will tell you what your OIIO was built for, and also what it detects from the hardware at runtime (which may include features that OIIO isn't using). For example, my 2019 model MacBook Pro 20 says:

OIIO 2.4.0spi built for C++14/201402 sse2,sse3,ssse3,sse41,sse42
Running on 16 cores 32.0GB
    sse2,sse3,ssse3,sse41,sse42,avx,avx2,fma,f16c,popcnt,rdrand

so I could also turn on avx2, fma, f16c.
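To make the "built for" versus "detected at runtime" distinction concrete, here is a minimal sketch of the same comparison in plain C++. It is not OIIO code: the function names `compiled_features`, `runtime_features`, and `not_yet_enabled` are hypothetical, the compile-time side relies on the compiler's predefined macros, and the runtime side assumes GCC or Clang on x86 (where `__builtin_cpu_supports` is available). On other platforms the runtime list simply comes back empty.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Features the compiler was allowed to use; fixed when the code is built.
// These macros are predefined by GCC/Clang according to the -m<feature> flags.
std::vector<std::string> compiled_features() {
    std::vector<std::string> f;
#if defined(__SSE2__)
    f.push_back("sse2");
#endif
#if defined(__SSE4_2__)
    f.push_back("sse42");
#endif
#if defined(__AVX__)
    f.push_back("avx");
#endif
#if defined(__AVX2__)
    f.push_back("avx2");
#endif
#if defined(__FMA__)
    f.push_back("fma");
#endif
#if defined(__ARM_NEON)
    f.push_back("neon");
#endif
    return f;
}

// Features the CPU reports when the program runs.
// Assumption: only implemented for x86 with GCC/Clang in this sketch.
std::vector<std::string> runtime_features() {
    std::vector<std::string> f;
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))   f.push_back("sse2");
    if (__builtin_cpu_supports("sse4.2")) f.push_back("sse42");
    if (__builtin_cpu_supports("avx"))    f.push_back("avx");
    if (__builtin_cpu_supports("avx2"))   f.push_back("avx2");
    if (__builtin_cpu_supports("fma"))    f.push_back("fma");
#endif
    return f;
}

// Features the hardware has but the build did not enable:
// the candidates you "could also turn on".
std::vector<std::string> not_yet_enabled() {
    const std::vector<std::string> built = compiled_features();
    std::vector<std::string> extra;
    for (const std::string& name : runtime_features()) {
        if (std::find(built.begin(), built.end(), name) == built.end())
            extra.push_back(name);
    }
    return extra;
}
```

Printing the result of `not_yet_enabled()` from a small `main` gives roughly the difference between the two lines of the oiiotool output, limited to the handful of features this sketch checks.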

As for Apple Silicon... that's trickier. We currently don't do any auto-detection (mostly because I don't have easy access to one of those machines yet). I think you could enable the ARM SIMD with the right compiler flags, see https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html

lgritz commented 2 years ago

I should add: if somebody has access to an ARM-based machine and would like to help make this work as smoothly for OIIO as it does on Intel, these contributions would be extremely welcome. It mainly consists of:

  1. Getting cpu feature detection working for ARM, see platform.h where things like cpu_has_avx2() are defined, and the few places where they are used.

  2. Getting build flags right, see compiler.cmake where it says "SIMD and machine architecture options". Actually, this literal code may not need to be changed, since it's just splitting the USE_SIMD cmake variable and passing the parts as -m<feature> flags to the compiler. But we may need to catalog and document the feature names that should be used for ARM chips.

  3. Running full tests and making sure everything is ok when these features are enabled.

  4. Scouring our simd.h for simd wrapper functions that don't have NEON implementations, and adding the right intrinsics. This can be done piece by piece over time, as long as none of what's already there is actually wrong.
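For anyone picking up item 4, the general shape of a wrapper with a NEON path might look like the sketch below. This is not taken from OIIO's simd.h: `Float4` and `add4` are hypothetical names used only for illustration, and the real wrappers are organized differently. It only shows the usual pattern of one function with an SSE branch, a NEON branch, and a scalar fallback selected by the compiler's predefined macros.

```cpp
#if defined(__SSE2__)
#  include <emmintrin.h>   // SSE/SSE2 intrinsics (x86)
#elif defined(__ARM_NEON)
#  include <arm_neon.h>    // NEON intrinsics (ARM)
#endif

// Hypothetical 4-wide float type for illustration; not OIIO's own class.
struct Float4 {
    float v[4];
};

// Element-wise add with one implementation per instruction set.
// Compilers that define neither macro (for example MSVC, which does not
// define __SSE2__ on x64) take the scalar fallback.
inline Float4 add4(const Float4& a, const Float4& b) {
    Float4 r;
#if defined(__SSE2__)
    _mm_storeu_ps(r.v, _mm_add_ps(_mm_loadu_ps(a.v), _mm_loadu_ps(b.v)));
#elif defined(__ARM_NEON)
    vst1q_f32(r.v, vaddq_f32(vld1q_f32(a.v), vld1q_f32(b.v)));
#else
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] + b.v[i];
#endif
    return r;
}
```

Because every branch has to produce the same result, a wrapper like this can be checked against the scalar path on each platform, which fits the "piece by piece" approach: a function without a NEON branch still works through the fallback, just more slowly.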

SergMariaDB commented 2 years ago

There are some OpenCL implementations for SIMD from Intel. Here is a generic way to build such a pattern, for example: https://github.com/ohhmm/generator/blob/13be2aadd932cc1fb9830a7b4a664557dbaf26e6/generator/generator.cpp#L38