ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License

Compile arm64 code from x86 OSX as a host #994

Closed ilya-lavrenov closed 1 year ago

ilya-lavrenov commented 1 year ago

Output of 'strings libarm_compute.so | grep arm_compute_version': https://github.com/ARM-software/ComputeLibrary/tree/v22.08

Platform: Apple with x86 processor

Operating System: OSX 12

Problem description:

We are trying to build arm64 binaries on an Apple x86 host (see https://github.com/openvinotoolkit/openvino_contrib/pull/416). CMake handles this scenario correctly and provides variables such as:

            -DCMAKE_SYSTEM_NAME=Darwin
            -DCMAKE_SYSTEM_PROCESSOR=arm64
            -DCMAKE_OSX_ARCHITECTURES=arm64

This allows us to compile generic code for arm64, but for Arm Compute Library we get:

error: unknown target CPU 'armv8.6-a' 
note: valid target CPU values are: nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, icelake-server, tigerlake, sapphirerapids, alderlake, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, znver3, x86-64, x86-64-v2, x86-64-v3, x86-64-v4 

Can we fix this?

morgolock commented 1 year ago

Hi @ilya-lavrenov

From the error you shared it looks like the build system is getting confused and calling the host compiler instead of the aarch64 compiler.

We support native aarch64 builds on macos but we have not tested cross-compilation on macos, so there may be a problem there.

Could you please share the complete error?

ilya-lavrenov commented 1 year ago

The log below is the only info I have:

2022-09-04T21:38:14.8309740Z create_version_file(["/Users/runner/work/1/_w/build/build-modules/arm_plugin/thirdparty/src/core/arm_compute_version.embed"], [])
2022-09-04T21:38:14.8318320Z ccache clang++ -o /Users/runner/work/1/_w/build/build-modules/arm_plugin/thirdparty/src/common/cpuinfo/CpuInfo.o -c -DARCH_ARM -Wextra -Wdisabled-optimization -Wformat=2 -Winit-self -Wstrict-overflow=2 -Wswitch-default -Woverloaded-virtual -Wformat-security -Wctor-dtor-privacy -Wsign-promo -Weffc++ -Wno-overlength-strings -Wall -std=c++14 -pedantic -Wno-vla-extension -march=armv8.6-a -DENABLE_FP16_KERNELS -DENABLE_FP32_KERNELS -DENABLE_QASYMM8_KERNELS -DENABLE_QASYMM8_SIGNED_KERNELS -DENABLE_QSYMM16_KERNELS -DENABLE_INTEGER_KERNELS -DENABLE_NCHW_KERNELS -O3 -fPIC -fsigned-char -ffunction-sections -fdata-sections -fdiagnostics-show-option -Wundef -Wreturn-type -Wunused-variable -Wswitch -Wno-error=deprecated-declarations -Wno-undef -Wno-error=return-stack-address -D_GLIBCXX_USE_NANOSLEEP -DARM_COMPUTE_CPP_SCHEDULER=1 -DENABLE_NEON -DARM_COMPUTE_ENABLE_NEON -DARM_COMPUTE_ENABLE_I8MM -DARM_COMPUTE_ENABLE_BF16 -DARM_COMPUTE_ENABLE_FP16 -DARM_COMPUTE_ENABLE_SVEF32MM -DARM_COMPUTE_GRAPH_ENABLED -DARM_COMPUTE_CPU_ENABLED -DARM_COMPUTE_VERSION_MAJOR=28 -DARM_COMPUTE_VERSION_MINOR=0 -DARM_COMPUTE_VERSION_PATCH=0 -Iinclude -I. 
-I/Users/runner/work/1/_w/build/build-modules/arm_plugin/thirdparty/src/core -Isrc/core -I/Users/runner/work/1/_w/build/build-modules/arm_plugin/thirdparty/src/core/NEON/kernels/convolution/common -Isrc/core/NEON/kernels/convolution/common -I/Users/runner/work/1/_w/build/build-modules/arm_plugin/thirdparty/src/core/NEON/kernels/convolution/winograd -Isrc/core/NEON/kernels/convolution/winograd -I/Users/runner/work/1/_w/build/build-modules/arm_plugin/thirdparty/src/core/NEON/kernels/arm_conv/depthwise -Isrc/core/NEON/kernels/arm_conv/depthwise -I/Users/runner/work/1/_w/build/build-modules/arm_plugin/thirdparty/src/core/NEON/kernels/arm_conv/pooling -Isrc/core/NEON/kernels/arm_conv/pooling -I/Users/runner/work/1/_w/build/build-modules/arm_plugin/thirdparty/src/core/NEON/kernels/arm_conv -Isrc/core/NEON/kernels/arm_conv -I/Users/runner/work/1/_w/build/build-modules/arm_plugin/thirdparty/src/core/NEON/kernels/assembly -Isrc/core/NEON/kernels/assembly -I/Users/runner/work/1/_w/build/build-modules/arm_plugin/thirdparty/arm_compute/core/NEON/kernels/assembly -Iarm_compute/core/NEON/kernels/assembly -I/Users/runner/work/1/_w/build/build-modules/arm_plugin/thirdparty/src/cpu/kernels/assembly -Isrc/cpu/kernels/assembly src/common/cpuinfo/CpuInfo.cpp
2022-09-04T21:38:14.8326460Z error: unknown target CPU 'armv8.6-a'
2022-09-04T21:38:14.8329760Z note: valid target CPU values are: nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, icelake-server, tigerlake, sapphirerapids, alderlake, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, znver3, x86-64, x86-64-v2, x86-64-v3, x86-64-v4
2022-09-04T21:38:14.8332610Z scons: *** [/Users/runner/work/1/_w/build/build-modules/arm_plugin/thirdparty/src/common/cpuinfo/CpuInfo.o] Error 1
2022-09-04T21:38:14.8333940Z scons: building terminated because of errors.

morgolock commented 1 year ago

Hi @ilya-lavrenov

The version of clang++ that you are using does not support arch=armv8.6-a. This looks like a problem in the toolchain rather than in Arm Compute Library.

2022-09-04T21:38:14.1657400Z cd /Users/runner/work/1/openvino_contrib/modules/arm_plugin/thirdparty/ComputeLibrary && /usr/local/Cellar/cmake/3.24.1/bin/cmake -E env /usr/local/bin/scons neon=1 opencl=0 cppthreads=1 embed_kernels=0 examples=0 Werror=0 data_layout_support=nchw build_dir=/Users/runner/work/1/_w/build/build-modules/arm_plugin/thirdparty arch=armv8.6-a os=macos compiler_cache=ccache extra_cxx_flags=-fPIC\ \ -fsigned-char\ -ffunction-sections\ -fdata-sections\ -fdiagnostics-show-option\ -Wundef\ -Wreturn-type\ -Wunused-variable\ -Wswitch\ -Wno-error=deprecated-declarations\ -Wno-undef\ -Wno-error=return-stack-address
2022-09-04T21:38:14.1759910Z scons: Reading SConscript files ...
2022-09-04T21:38:14.1861310Z Using compilers:
2022-09-04T21:38:14.1936420Z CC ccache clang
2022-09-04T21:38:14.2037460Z CXX ccache clang++

In the command above, scons is not pointing to a cross-compiler; instead it is trying to use the native host compiler, which does not support arm64 targets. The CPU targets listed in the error message are not ARM targets:

2022-09-04T21:38:14.8329760Z note: valid target CPU values are: nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, icelake-server, tigerlake, sapphirerapids, alderlake, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, znver3, x86-64, x86-64-v2, x86-64-v3, x86-64-v4
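A quick way to check this kind of diagnosis against a build log is to look at the CPU names in clang's note. The sketch below is only illustrative: the function name and the small set of x86 marker names (picked from the log above) are assumptions, not part of scons or Arm Compute Library.

```python
# Hypothetical helper: decide whether clang's "valid target CPU values" note
# lists x86 CPUs, which suggests the compiler invocation was targeting x86.
X86_MARKERS = {"nocona", "core2", "haswell", "skylake", "x86-64"}  # sample taken from the log above


def note_lists_x86_targets(note_line: str) -> bool:
    """Return True if the note's CPU list contains well-known x86 CPU names."""
    marker = "valid target CPU values are:"
    if marker not in note_line:
        return False
    cpu_list = note_line.split(marker, 1)[1]
    cpus = {name.strip() for name in cpu_list.split(",")}
    return bool(cpus & X86_MARKERS)


note = "note: valid target CPU values are: nocona, core2, penryn, haswell, znver3, x86-64"
print(note_lists_x86_targets(note))
```

A True result only says which architecture the compiler was targeting for that invocation; it does not by itself say whether the same compiler could target arm64 with different options.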

To fix this you should update the PATH variable when you call scons to point to the correct clang++ version which supports arm64.

Something like this:

PATH=$PATH:../../toolchains/android-ndk-r23-beta5/toolchains/llvm/prebuilt/linux-x86_64/bin/ scons arch=armv8.6-a-sve neon=1 opencl=0 embed_kernels=1 extra_cxx_flags="-fPIC" benchmark_tests=0 validation_tests=0 ....

Hope this helps.

ilya-lavrenov commented 1 year ago

@morgolock why does the same compiler version work for code built with cmake, but not when we build ARM Compute via scons? :) Don't you find it strange?

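The compiler on OSX 11 and higher supports both arm64 and x86_64 instructions via the -arch option. This is mapped on CMAKE_OSX_ARCHITECTURES:

if it contains -DCMAKE_OSX_ARCHITECTURES=x86_64, -arch x86_64 is added and an x86 binary is compiled
if it contains -DCMAKE_OSX_ARCHITECTURES=arm64, -arch arm64 is added and an arm64 binary is compiled
if it contains both, -DCMAKE_OSX_ARCHITECTURES=arm64;x86_64, then a universal binary is compiled which contains both instruction sets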

So we expect that the ARM Compute build system can reuse this capability and compile for arm64 with the universal clang on an x86 host (see https://developer.apple.com/documentation/apple-silicon/building-a-universal-macos-binary). In general, the default clang is already able to cross-compile, since Apple has used the same compiler for many years to build for iPhone, which is ARM-based.
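CMake turns each entry of CMAKE_OSX_ARCHITECTURES into an -arch argument for the compiler. A minimal sketch of that translation is shown here; it is a simplified illustration rather than CMake's actual implementation, and the function name is hypothetical.

```python
from typing import List


def arch_flags(cmake_osx_architectures: str) -> List[str]:
    """Turn a CMAKE_OSX_ARCHITECTURES value into clang -arch arguments."""
    flags: List[str] = []
    # CMake lists are semicolon-separated, e.g. "arm64;x86_64".
    for arch in cmake_osx_architectures.split(";"):
        arch = arch.strip()
        if arch:
            flags += ["-arch", arch]
    return flags


print(arch_flags("arm64"))         # ['-arch', 'arm64']
print(arch_flags("arm64;x86_64"))  # ['-arch', 'arm64', '-arch', 'x86_64']
```

With two entries, the compiler receives both -arch arguments, which is how a universal binary is requested.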

Thanks, Ilya.

morgolock commented 1 year ago

Hi @ilya-lavrenov

Yes, I'll look into the scons files because we have not tested cross-compilation on macos.

The compiler on OSX 11 and higher supports both arm64 and x86_64 instructions via the -arch option. This is mapped on CMAKE_OSX_ARCHITECTURES:

if it contains -DCMAKE_OSX_ARCHITECTURES=x86_64, -arch x86_64 is added and an x86 binary is compiled
if it contains -DCMAKE_OSX_ARCHITECTURES=arm64, -arch arm64 is added and an arm64 binary is compiled
if it contains both, -DCMAKE_OSX_ARCHITECTURES=arm64;x86_64, then a universal binary is compiled which contains both instruction sets

I however don't see how this could possibly work because these cmake variables won't affect scons at all.

To cross-compile ACL you will need to change the build command to point to a toolchain that supports arm64 targets. Simply add the PATH variable to this command and make it point to the correct toolchain:

cd /Users/runner/work/1/openvino_contrib/modules/arm_plugin/thirdparty/ComputeLibrary && /usr/local/Cellar/cmake/3.24.1/bin/cmake -E env /usr/local/bin/scons neon=1 opencl=0 cppthreads=1 embed_kernels=0 examples=0 Werror=0 data_layout_support=nchw build_dir=/Users/runner/work/1/_w/build/build-modules/arm_plugin/thirdparty arch=armv8.6-a os=macos compiler_cache=ccache extra_cxx_flags=-fPIC\ \ -fsigned-char\ -ffunction-sections\ -fdata-sections\ -fdiagnostics-show-option\ -Wundef\ -Wreturn-type\ -Wunused-variable\ -Wswitch\ -Wno-error=deprecated-declarations\ -Wno-undef\ -Wno-error=return-stack-address

Would this potential solution work for you?

Hope this helps.

ilya-lavrenov commented 1 year ago

I however don't see how this could possibly work because these cmake variables won't affect scons at all.

For cmake these variables are needed because it can build x86, arm64 or universal2 code. For scons, users specify targets like x86-64, x86-64-v2 or armv8.6-a, so here is a mapping:

-DCMAKE_OSX_ARCHITECTURES=x86_64 => arch=x86-64
-DCMAKE_OSX_ARCHITECTURES=arm64 => arch=armv8.6-a
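As a sketch, a wrapper that drives scons from CMake could encode this mapping in a small lookup table. The table below copies the two pairs from this comment as written; whether ACL's scons accepts those exact arch values is not verified here, and the names CMAKE_TO_SCONS_ARCH and scons_args_for are hypothetical.

```python
from typing import List

# Mapping as proposed in this comment; not verified against ACL's SConstruct.
CMAKE_TO_SCONS_ARCH = {
    "x86_64": "x86-64",
    "arm64": "armv8.6-a",
}


def scons_args_for(cmake_arch: str) -> List[str]:
    """Build the arch-related scons arguments for one CMake architecture."""
    try:
        scons_arch = CMAKE_TO_SCONS_ARCH[cmake_arch]
    except KeyError:
        raise ValueError(f"no scons arch known for {cmake_arch!r}") from None
    # arch=... and os=macos are the options used in the build command in this thread.
    return [f"arch={scons_arch}", "os=macos"]


print(scons_args_for("arm64"))  # ['arch=armv8.6-a', 'os=macos']
```

Raising on an unknown architecture, instead of silently falling back to the host, makes a missing mapping visible at configure time.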

What actually concerns me is how scons detects the values like:

x86-64, x86-64-v2, x86-64-v3, x86-64-v4

? Why has it decided that x86-64 is acceptable? I don't know whether ARM Compute can actually be used for x86-64-v4. In any case, on OSX 11 with Xcode 12.1 or higher, scons should understand that we can compile additional architectures for os=macos, since the compiler allows it.

make it point to the correct toolchain:

Maybe we don't understand each other, but I have already written that clang from Xcode 12.2 is already both a host compiler and a cross-compiler. It seems that scons does not know about this and still complains about an invalid armv8.6-a. You need to teach scons that on an x86 host it is allowed to use the armv8.6-a target and map it to the proper clang compile options.

morgolock commented 1 year ago

Hi @ilya-lavrenov

My understanding is that you have two toolchains on that host:

The CMake environment won't affect ACL's scons build. You have to manually override the PATH variable to point to the arm64 cross-compiler on the host machine when calling scons.

For example, I have many cross-compilers on my Linux system and I choose which one to use by just updating the PATH variable depending on which system I'm targeting.

PATH=$TOOLCHAINS/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/bin/:$PATH scons arch=armv8a neon=1 opencl=1 validation_tests=1 examples=0 benchmark_examples=0 opencl=0 standalone=1 debug=1 arch=armv8.2-a -j18

Notice the PATH=$TOOLCHAINS/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/bin/:$PATH before the scons command; this is required so that scons finds the correct cross-compiler.
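The effect of putting the toolchain directory first on PATH can be sketched in Python, for example when scons is launched from a script. This is a minimal illustration under stated assumptions: env_with_toolchain and the compiler name demo-cross-g++ are hypothetical, and a temporary directory stands in for a real toolchain.

```python
import os
import shutil
import stat
import tempfile


def env_with_toolchain(toolchain_bin, base_env=None):
    """Return a copy of the environment with toolchain_bin first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PATH")
    # Prepending means executables in toolchain_bin are found before any others.
    env["PATH"] = toolchain_bin + os.pathsep + existing if existing else toolchain_bin
    return env


# Demonstration with a throwaway directory standing in for a real toolchain.
with tempfile.TemporaryDirectory() as fake_toolchain:
    fake_compiler = os.path.join(fake_toolchain, "demo-cross-g++")  # hypothetical name
    with open(fake_compiler, "w") as f:
        f.write("#!/bin/sh\nexit 0\n")
    os.chmod(fake_compiler, os.stat(fake_compiler).st_mode | stat.S_IXUSR)

    env = env_with_toolchain(fake_toolchain)
    # shutil.which searches the given PATH in order, as a shell would.
    resolved = shutil.which("demo-cross-g++", path=env["PATH"])
    print(resolved == fake_compiler)
```

The returned dictionary could be passed as the env argument of subprocess.run when invoking scons, which has the same effect as the PATH=... prefix on the command line.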

I don't have an x86-64 macos system to try this but it should work as in Linux.

Hope this helps.

ilya-lavrenov commented 1 year ago

Worked around this via cross-compilation: https://github.com/openvinotoolkit/openvino_contrib/pull/438