abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Support for arm64 wheels and CPU Features #1342

Open gaby opened 6 months ago

gaby commented 6 months ago

@abetlen Thank you for the new efforts to start publishing wheels for CUDA, etc.

I noticed that the METAL wheels only work on the darwin platform; when using Docker on macOS, the platform inside the container is linux/arm64, not darwin.

I have a repo where I was building arm64/wheels that could probably be integrated into your workflows: https://github.com/gaby/arm64-wheels

TLDR

    steps:
      - name: Checkout abetlen/llama-cpp-python
        uses: actions/checkout@v4
        with:
          repository: 'abetlen/llama-cpp-python'
          ref: '${{ matrix.version }}'
          submodules: 'recursive'

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
        with:
          platforms: linux/arm64

      - name: Build wheels
        uses: pypa/cibuildwheel@v2.16.5
        env:
          CIBW_SKIP: "*musllinux* pp*"
          CIBW_REPAIR_WHEEL_COMMAND: ""
          CIBW_ARCHS: "aarch64"
          CIBW_BUILD: "cp311-*"
        with:
          output-dir: wheelhouse/

      - name: Upload wheels as artifacts
        uses: actions/upload-artifact@v4
        with:
          name: wheels-${{ matrix.version }}
          path: wheelhouse/*.whl

This would need to be expanded to support other Python versions and PyPy.
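As a rough illustration of that expansion (not the actual workflow change), the `CIBW_BUILD` identifiers for CPython 3.8 through 3.12 could be generated programmatically rather than hard-coding `cp311-*`:

```python
# Hypothetical helper: enumerate cibuildwheel build identifiers for a
# range of CPython minor versions ("cp38-*" ... "cp312-*"). The matrix
# in the workflow would then iterate over this list.
def cibw_build_specs(minors=range(8, 13)):
    return [f"cp3{m}-*" for m in minors]

print(cibw_build_specs())
# → ['cp38-*', 'cp39-*', 'cp310-*', 'cp311-*', 'cp312-*']
```

PyPy would need its own `pp*` identifiers, which the example above deliberately leaves out (the workflow currently skips them via `CIBW_SKIP`).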

I also noticed the CPU wheels don't differentiate between AVX, AVX2, and AVX-512. Are there plans to add support for those variants?
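For context, the feature variants in question can be detected on Linux by parsing `/proc/cpuinfo`. The sketch below is illustrative only (not part of llama-cpp-python); the flag names `avx`, `avx2`, and `avx512f` are the ones the Linux kernel actually reports:

```python
# Sketch: detect x86 SIMD support by parsing /proc/cpuinfo (Linux only).
def simd_features(cpuinfo_text):
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return {f for f in ("avx", "avx2", "avx512f") if f in flags}

# On a real Linux host:
# with open("/proc/cpuinfo") as f:
#     print(simd_features(f.read()))
```

An installer (or install script) could use a check like this to pick the matching wheel variant, if per-feature wheels were ever published.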

abetlen commented 6 months ago

Hey @gaby, thank you! I'll add support for that soon. Do you mind giving me a hand testing when the PR is ready?

Regarding the CPU wheels, I'm conflicted, because it really blows up the power set of builds that have to be run for each release. My thinking for the wheels is to build something that works reasonably well for most people; if you want it to run as fast as possible, you should build from source.

My current position is that I'm willing to expand the number of builds if we also implement some optimizations each time to mitigate the combinatorial explosion.

Some thoughts I have for long term solutions

gaby commented 6 months ago

Yeah, I can definitely test the arm64/linux wheels on a Raspberry PI.

I was using the wheels from https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels for the longest time, but yes, it does blow up the number of CI builds/jobs that get created. I do like your idea of having ggml check the CPU flags to determine which features to use.
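The runtime-flag-check idea amounts to shipping one build that selects the best available code path at startup instead of one wheel per feature combination. A toy dispatch sketch (function names are illustrative only, not ggml's actual mechanism):

```python
# Toy sketch of runtime dispatch: given the set of CPU features detected
# at startup, pick the most capable kernel once, so a single binary can
# serve machines with and without AVX-512/AVX2/AVX.
def pick_kernel(features):
    if "avx512f" in features:
        return "matmul_avx512"
    if "avx2" in features:
        return "matmul_avx2"
    if "avx" in features:
        return "matmul_avx"
    return "matmul_generic"
```

This keeps the CI matrix small at the cost of a slightly larger binary that carries all code paths.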

abetlen commented 6 months ago

@gaby sorry, I was just re-reviewing this. So, currently, the wheels that end in _arm64.whl don't work inside Docker on macOS, and we should replace them with wheels built using the cibuildwheel process from your repo?

gaby commented 6 months ago

@abetlen If you install the package on macOS directly, the platform is darwin/arm64, which you already have wheels for. If you install the package on macOS inside Docker, the platform inside the container is linux/arm64. This is because Docker on macOS runs containers inside a Linux VM (using QEMU for emulation).

The linux/arm64 platform would also benefit Raspberry Pi users, especially on the Pi 4/Pi 5.
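The distinction is easy to see from Python's standard library. The exact strings vary by OS version, but the general shape is: darwin/arm64 natively on Apple Silicon, linux/aarch64 inside a container on the same machine, which is why pip needs a separate linux/arm64 wheel there:

```python
import platform
import sysconfig

# Natively on Apple Silicon this prints something like:
#   Darwin arm64 macosx-<ver>-arm64
# Inside a Linux container on the same Mac it prints something like:
#   Linux aarch64 linux-aarch64
# pip selects wheels based on these platform tags, so the darwin wheels
# are invisible inside the container.
print(platform.system(), platform.machine(), sysconfig.get_platform())
```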

abetlen commented 6 months ago

@gaby thank you, I've added your provided code to the release workflow for python versions 3.8-3.12, can you let me know if it works correctly?

gaby commented 6 months ago

@abetlen I don't see any arm64 wheels here https://abetlen.github.io/llama-cpp-python/whl/cpu/llama-cpp-python/

Running pip install confirms that no matching wheel is found, so pip falls back to building from source:

Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [24 lines of output]
      *** scikit-build-core 0.9.2 using CMake 3.29.2 (wheel)
      *** Configuring CMake...
      loading initial cache file /tmp/tmpwa68rmyp/build/CMakeInit.txt
      -- The C compiler identification is unknown
      -- The CXX compiler identification is unknown
      CMake Error at CMakeLists.txt:3 (project):
        No CMAKE_C_COMPILER could be found.

        Tell CMake where to find the compiler by setting either the environment
        variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
        the compiler, or to the compiler name if it is in the PATH.

      CMake Error at CMakeLists.txt:3 (project):
        No CMAKE_CXX_COMPILER could be found.

        Tell CMake where to find the compiler by setting either the environment
        variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
        to the compiler, or to the compiler name if it is in the PATH.

      -- Configuring incomplete, errors occurred!

      *** CMake configuration failed
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

[notice] A new release of pip is available: 23.0.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
root@eb7b81f37400:/# uname -m
aarch64

I think it's related to this line in the CI https://github.com/abetlen/llama-cpp-python/blob/main/.github/workflows/build-and-release.yaml#L70
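For what it's worth, the log itself is consistent with that: since no wheel matched, pip tried a source build, and CMake's "No CMAKE_C_COMPILER could be found" just means no C toolchain is on PATH in that container. A minimal preflight check along those lines (illustrative, not part of the project):

```python
import shutil

# When no wheel matches, pip builds llama-cpp-python from source, which
# needs a C/C++ toolchain. Mirror CMake's compiler lookup by scanning
# PATH for common compiler names.
def find_c_compiler():
    for name in ("cc", "gcc", "clang"):
        path = shutil.which(name)
        if path:
            return path
    return None

print(find_c_compiler())  # None in the failing container above
```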

abetlen commented 6 months ago

@gaby looks like they were built and uploaded as artifacts but not added to the release(?)

I'll take a look later but this is the last workflow run file if you can spot what I'm doing wrong.

Smartappli commented 6 months ago

> @gaby looks like they were built and uploaded as artifacts but not added to the release(?)
>
> I'll take a look later but this is the last workflow run file if you can spot what I'm doing wrong.

@gaby @abetlen Fixed here: https://github.com/abetlen/llama-cpp-python/pull/1392/files

Smartappli commented 6 months ago

@abetlen Test: https://github.com/Smartappli/llama-cpp-python/releases/tag/test2

abetlen commented 6 months ago

@Smartappli wow thank you so much!

asterbini commented 3 months ago

It would be nice to have updated arm64 builds, as the latest conda package lacks support for many model types.