apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0
1.19k stars 434 forks source link

[VL] Build gluten on aarch64 with VCPKG #7664

Open z-anderson opened 1 week ago

z-anderson commented 1 week ago

Problem description

I'm trying to build gluten and velox on aarch64 architecture. I'm running the script ./dev/builddeps-veloxbe.sh --enable_vcpkg=ON --spark_version=3.5 . I get the error below. I see that these options -mavx2 -mfma -mavx -mf16c -mlzcnt -mbmi2 are only for x86_64, so they won't work on aarch64. Do you know how I can build gluten for aarch64 please? (In version 1.1.x.x., we were able to build for both aarch64 and x86_64).

I see vcpkg init.sh has VCPKG_TRIPLET=x64-linux-avx https://github.com/apache/incubator-gluten/blob/c82af60a635fb52bf1314cd831f5915dd37d910d/dev/vcpkg/init.sh#L10. It makes sense that the x64 and the avx parts of this triplet will cause errors running on aarch64. I see it got changed in https://github.com/apache/incubator-gluten/commit/f00e4536bc7ccc56a2eee5800084319130cbf806#diff-4a8a09cebf7ed92cb7321136b9ef223c4653757270e77e942fd6fe2dca30793e - do you recommend that we should cherry-pick that commit (and a few more commits that it depends on)?

CMake Error at /usr/local/share/cmake-3.28/Modules/CMakeTestCCompiler.cmake:67 (message):
  The C compiler

    "/usr/bin/cc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: '/home/circleci/project/dev/vcpkg/.vcpkg/buildtrees/detect_compiler/x64-linux-avx-rel/CMakeFiles/CMakeScratch/TryCompile-9obttD'

    Run Build Command(s): /usr/bin/ninja -v cmTC_32844
    [1/2] /usr/bin/cc   -fPIC -mavx2 -mfma -mavx -mf16c -mlzcnt -mbmi2 -o CMakeFiles/cmTC_32844.dir/testCCompiler.c.o -c /home/circleci/project/dev/vcpkg/.vcpkg/buildtrees/detect_compiler/x64-linux-avx-rel/CMakeFiles/CMakeScratch/TryCompile-9obttD/testCCompiler.c
    FAILED: CMakeFiles/cmTC_32844.dir/testCCompiler.c.o
    /usr/bin/cc   -fPIC -mavx2 -mfma -mavx -mf16c -mlzcnt -mbmi2 -o CMakeFiles/cmTC_32844.dir/testCCompiler.c.o -c /home/circleci/project/dev/vcpkg/.vcpkg/buildtrees/detect_compiler/x64-linux-avx-rel/CMakeFiles/CMakeScratch/TryCompile-9obttD/testCCompiler.c
    cc: error: unrecognized command line option ‘-mavx2’
    cc: error: unrecognized command line option ‘-mfma’
    cc: error: unrecognized command line option ‘-mavx’
    cc: error: unrecognized command line option ‘-mf16c’
    cc: error: unrecognized command line option ‘-mlzcnt’
    cc: error: unrecognized command line option ‘-mbmi2’
    ninja: build stopped: subcommand failed.

  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:11 (enable_language)

System information

Velox System Info v0.0.2 Commit: 69a6bb9602c97050cc3459a09c846b84182ecb9c CMake Version: 3.28.3 System: Linux-5.15.0-1070-aws Arch: aarch64 CPU Name: Model name: Neoverse-N1 C++ Compiler: /usr/bin/c++ C++ Compiler Version: 9.4.0 C Compiler: /usr/bin/cc C Compiler Version: 9.4.0 CMake Prefix Path: /usr/local;/usr;/;/usr/local;/usr/local;/usr/X11R6;/usr/pkg;/opt

\nThe results will be copied to your clipboard if xclip is installed.

CMake log

No response

PHILO-HE commented 1 week ago

do you recommend that we should cherry-pick that commit (and a few more commits that it depends on)?

@z-anderson, I guess that commit will not fix this issue.

With VCPKG_TRIPLET=x64-linux-avx, vcpkg will use a custom triplet file we created, see https://github.com/apache/incubator-gluten/blob/main/dev/vcpkg/triplets/x64-linux-avx.cmake. You can see some flags are set specifically for x86_64.

To support arm64, I recommend you to create another triplet file with proper flags set, e.g. set(VCPKG_TARGET_ARCHITECTURE arm64)(see reference link). Then, in Gluten script, set VCPKG_TRIPLET to this new custom triple instead of x64-linux-avx when target architecture is arm64 (we may make target architecture a configurable build option).

z-anderson commented 1 week ago

Thank you!

PHILO-HE commented 1 week ago

@z-anderson, could you create a pr to fix it?

FelixYBW commented 1 week ago

@z-anderson would you like to create the PR and maintain it? We currently don't have any ARM CI tests actually. There are several guys submitting PRs on ARM. Maybe you all can work together and contribute.

z-anderson commented 6 days ago

Sounds good. Yes, I'll put up a PR with my changes to https://github.com/apache/incubator-gluten/blob/main/dev/vcpkg/init.sh for this.

z-anderson commented 6 days ago

I put up a draft https://github.com/apache/incubator-gluten/pull/7687 . This PR gets past that error, but I'm still working on getting it to build completely (at least for my set up).

z-anderson commented 4 days ago

I don't have permission to create a new github runner (with ARM). Could someone else create the new runner, please? Then I'll use it in https://github.com/apache/incubator-gluten/blob/main/.github/workflows/build_bundle_package.yml

PHILO-HE commented 4 days ago

@z-anderson, could you first validate your fix in your arm64 environment? Currently, our CI resources are near to the limitation. We can add new jobs on arm in the future when some resources are freed up.

z-anderson commented 4 days ago

In my arm64 environment, it actually finds the community triplet (so it doesn't use the new triplet file). Using the community triplet does correctly get it to start building for arm64 instead of amd.

FelixYBW commented 3 days ago

I don't have permission to create a new github runner (with ARM). Could someone else create the new runner, please? Then I'll use it in https://github.com/apache/incubator-gluten/blob/main/.github/workflows/build_bundle_package.yml

You may track the UT internally firstly. We may plan to add the ARM runner in future.

z-anderson commented 3 days ago

So will we continue the PR, but not do CI tests with an ARM runner for now?

PHILO-HE commented 3 days ago

So will we continue the PR, but not do CI tests with an ARM runner for now?

@z-anderson, yes, we can land this pr as long as it is verified on your side.