ggerganov / llama.cpp

LLM inference in C/C++
MIT License
64.77k stars · 9.28k forks

How to utilize GPU on Android to accelerate inference? #8705

Open ElaineWu66 opened 1 month ago

ElaineWu66 commented 1 month ago

Discussed in https://github.com/ggerganov/llama.cpp/discussions/8704

Originally posted by **ElaineWu66**, July 26, 2024: I am trying to compile and run the llama.cpp demo on my Android device (Qualcomm Adreno) with Linux and Termux. Any suggestions on how to utilize the GPU? I followed the tutorial at https://github.com/JackZeng0208/llama.cpp-android-tutorial, but since the OpenCL backend is broken and has been removed, it no longer works. Thanks!!!
ngxson commented 1 month ago

The android docs can be found here (please let me know if it's still up-to-date): https://github.com/ggerganov/llama.cpp/blob/master/docs/android.md

A while ago I saw someone build android+vulkan that kind of worked but was buggy. I couldn't test it myself, but you can give it a try: https://github.com/ggerganov/llama.cpp/issues?q=is%3Aissue+android+vulkan+

ElaineWu66 commented 1 month ago

> The android docs can be found here (please let me know if it's still up-to-date): https://github.com/ggerganov/llama.cpp/blob/master/docs/android.md
>
> A while ago I saw someone build android+vulkan that kind of worked but was buggy. I couldn't test it myself, but you can give it a try: https://github.com/ggerganov/llama.cpp/issues?q=is%3Aissue+android+vulkan+

I was able to compile and run on the Android CPU by following the android docs instructions. I just want to know how I can utilize the GPU.

I've seen the posts saying they built android+vulkan and it was buggy, but there are no detailed instructions. I'm very new to Vulkan; is there any step-by-step tutorial that I can follow? Really appreciate it, thanks a ton!!!

ngxson commented 1 month ago

Unfortunately, I'm not working on Vulkan or Android, so I can't help much. It would be nice if someone could share how to do that. You can probably follow this thread for some clues: https://github.com/ggerganov/llama.cpp/issues/5186

FranzKafkaYu commented 1 month ago

I am also trying to accelerate inference by enabling the GPU/Vulkan backend on Android, but I encountered a build error like this:

franzkafka95@franzkafka95:~/Desktop/llama/llama.cpp/build-android$ make -j8
[  1%] Built target build_info
[  1%] Built target sha256
[  2%] Built target xxhash
[  3%] Built target sha1
[  4%] Built target vulkan-shaders-gen
[  5%] Generate vulkan shaders
/bin/sh: 1: vulkan-shaders-gen: not found
make[2]: *** [ggml/src/CMakeFiles/ggml.dir/build.make:123: ggml/src/ggml-vulkan-shaders.hpp] Error 127
make[1]: *** [CMakeFiles/Makefile2:1617: ggml/src/CMakeFiles/ggml.dir/all] Error 2
make: *** [Makefile:146: all] Error 2  

build configuration:

cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=latest -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod  -DGGML_VULKAN=1 ..  

I am using Ubuntu 22.04, CMake 3.22.0, and Android NDK r25. I have installed the Vulkan SDK, including these libraries:

libvulkan-dev/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed]
libvulkan1/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed,automatic]
lunarg-vulkan-layers/jammy,now 1.3.290.0~rc2-1lunarg22.04-1 amd64 [installed,automatic]
mesa-vulkan-drivers/jammy-updates,now 23.2.1-1ubuntu3.1~22.04.2 amd64 [installed,automatic]
vulkan-extensionlayer/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed,automatic]
vulkan-headers/jammy,jammy,now 1.3.290.0~rc1-1lunarg22.04-1 all [installed,automatic]
vulkan-profiles/jammy,now 1.3.290.0~rc2-1lunarg22.04-1 amd64 [installed,automatic]
vulkan-sdk/jammy,jammy,now 1.3.290.0~rc1-1lunarg22.04-1 all [installed]
vulkan-tools/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed,automatic]
vulkan-utility-libraries-dev/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed,automatic]
vulkan-utility-libraries/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed,automatic]
vulkan-validationlayers/jammy,now 1.3.290.0~rc2-1lunarg22.04-1 amd64 [installed,automatic]
vulkancapsviewer/jammy,now 3.41~rc1-1lunarg22.04-1 amd64 [installed,automatic]  

Can anyone help me fix this compile error?

FranzKafkaYu commented 1 month ago

I guess this problem occurs because the compiled vulkan-shaders-gen binary is built for the ARM target architecture, so when CMake (or a build script) invokes it on the x86_64 host to generate the *.hpp files, it cannot be executed.

I tried moving an x86_64 vulkan-shaders-gen binary into the build-android/bin/ directory, but I still got this error.
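For what it's worth, a workaround that is sometimes suggested for this class of error is a two-stage build: first build llama.cpp natively on the host so that vulkan-shaders-gen is an x86_64 binary, put that binary on `PATH`, and only then run the Android cross-compile. The commands below are a sketch under those assumptions (the directory names `build-host` and `build-android` are illustrative, and `$NDK` is assumed to point at your NDK root as in the configuration above); they are not an officially documented procedure.

```shell
# Stage 1: native host build, so vulkan-shaders-gen runs on the build machine.
cmake -B build-host -DGGML_VULKAN=1
cmake --build build-host --config Release --target vulkan-shaders-gen

# Make the host binary visible to the cross-build.
export PATH="$PWD/build-host/bin:$PATH"

# Stage 2: cross-compile for Android; the shader generator is now found.
cmake -B build-android \
  -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=latest \
  -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod \
  -DGGML_VULKAN=1
cmake --build build-android --config Release
```

This is the same idea the Dockerfile further down in this thread implements: keep a host-arch copy of the generator around and let the Android build pick it up from `PATH`.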

egeoz commented 1 month ago

As things stand right now, Vulkan on Android for llama.cpp only works (and is fairly performant even with Llama 3 8B) on Exynos devices with RDNA GPUs. For the rest, your best bet is to use MLC-LLM/Mediapipe/Executorch based solutions.

FranzKafkaYu commented 1 month ago

> As things stand right now, Vulkan on Android for llama.cpp only works (and is fairly performant even with Llama 3 8B) on Exynos devices with RDNA GPUs. For the rest, your best bet is to use MLC-LLM/Mediapipe/Executorch based solutions.

Refer to the issue; Qualcomm Adreno GPUs can work.

ElaineWu66 commented 1 month ago

[screenshot of the build error]

I encountered this issue when compiling with `make GGML_VULKAN=1` in Termux on my Android device. It seems I do not have root permission to create the output directory.

I guess the only way out is to cross-compile on my laptop? Any suggestions on how to do that? (I'm quite new to Android and I really need a followable tutorial to deal with the compilation parameters.)

Or any suggestions on how I can compile with Termux on the Android device?

Big thanks!!!
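One note on the permission error: Termux apps can only write inside their own home and prefix, so a build attempted outside `$HOME` will fail without root. A minimal CPU-only sketch of an in-Termux build, assuming the standard Termux packages (`clang`, `cmake`, `git`) and keeping everything under the home directory, would be:

```shell
# Inside Termux: install toolchain and build under $HOME (no root needed).
pkg install clang cmake git
cd ~
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

This avoids the output-directory permission problem, but it builds the CPU backend only; getting Vulkan working on-device is a separate matter, as the rest of this thread shows.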

egeoz commented 1 month ago

> As things stand right now, Vulkan on Android for llama.cpp only works (and is fairly performant even with Llama 3 8B) on Exynos devices with RDNA GPUs. For the rest, your best bet is to use MLC-LLM/Mediapipe/Executorch based solutions.
>
> Refer to the issue; Qualcomm Adreno GPUs can work.

Well, of course it can work; it is just a matter of development. However, in its current state, you have to manually disable feature checks and contend with 1 GB of VRAM, which either means a model as smart as a parakeet or splitting layers between GPU and CPU, which will probably make inference slower than pure CPU.

FranzKafkaYu commented 1 month ago

> > As things stand right now, Vulkan on Android for llama.cpp only works (and is fairly performant even with Llama 3 8B) on Exynos devices with RDNA GPUs. For the rest, your best bet is to use MLC-LLM/Mediapipe/Executorch based solutions.
> >
> > Refer to the issue; Qualcomm Adreno GPUs can work.
>
> Well, of course it can work; it is just a matter of development. However, in its current state, you have to manually disable feature checks and contend with 1 GB of VRAM, which either means a model as smart as a parakeet or splitting layers between GPU and CPU, which will probably make inference slower than pure CPU.

Thank you very much. I am new to AI and appreciate the other frameworks you recommended. I will try to understand and learn more about them. As for this issue, I still hope that a developer can investigate this problem.

jsamol commented 1 month ago

I found myself looking for an answer to the same question and struggling with the same issues. Eventually, after gathering solutions from many other issues, I prepared this Dockerfile that helps me cross-compile the library with Vulkan support, maybe someone will find it useful:

FROM ubuntu:24.04

### Prepare environment ###

# Install essential tools
RUN apt-get update -qqy && \
    apt-get install -qqy build-essential cmake make default-jre wget unzip git && \
    apt-get clean && \
    apt-get autoremove -y

# Set env vars
ENV ANDROID_TARGET_PLATFORM=android-33
ENV NDK_VERSION=27.0.12077973
ENV VULKAN_VERSION=1.3.292

ENV ANDROID_HOME=/usr/lib/android-sdk
ENV ANDROID_NDK_HOME=${ANDROID_HOME}/ndk/${NDK_VERSION}

### Install Android NDK ###

# Download command line tools
RUN wget -q https://dl.google.com/android/repository/commandlinetools-linux-11076708_latest.zip -O android-commandlinetools.zip && \
    unzip -q android-commandlinetools.zip -d ${ANDROID_HOME} && \
    mv ${ANDROID_HOME}/cmdline-tools ${ANDROID_HOME}/latest && \
    mkdir ${ANDROID_HOME}/cmdline-tools && \
    mv ${ANDROID_HOME}/latest ${ANDROID_HOME}/cmdline-tools && \
    rm android-commandlinetools.zip

ENV PATH=${PATH}:${ANDROID_HOME}/cmdline-tools/latest/bin

# Accept licenses
RUN yes | sdkmanager --licenses
# Download NDK
RUN sdkmanager "ndk;${NDK_VERSION}"

### Install Vulkan SDK ###

RUN wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | tee /etc/apt/trusted.gpg.d/lunarg.asc
RUN wget -qO /etc/apt/sources.list.d/lunarg-vulkan-noble.list http://packages.lunarg.com/vulkan/lunarg-vulkan-noble.list

RUN apt-get update -qqy && \
    apt-get install -qqy vulkan-sdk

# Replace outdated Vulkan headers in Android NDK
RUN wget -q https://github.com/KhronosGroup/Vulkan-Headers/archive/refs/tags/v${VULKAN_VERSION}.zip -O vulkan-headers.zip && \
    unzip -q vulkan-headers.zip -d . && \
    rm -r ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/vk_video && \
    rm -r ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/vulkan && \
    mv ./Vulkan-Headers-${VULKAN_VERSION}/include/* ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include && \
    rm -r ./Vulkan-Headers-${VULKAN_VERSION} && \
    rm vulkan-headers.zip

### Build ###

RUN mkdir proj
COPY . proj
WORKDIR /proj

# Compile for host
RUN cmake -B build -DGGML_VULKAN=1
RUN cmake --build build --config Release

# Use host vulkan-shaders-gen
RUN mkdir bin
RUN mv build/bin/vulkan-shaders-gen bin/
RUN rm -rf build
ENV PATH=${PATH}:/proj/bin

# Compile for target (arm64-v8a)
RUN cmake -B build-android/arm64-v8a -DGGML_VULKAN=1 \
    -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=${ANDROID_TARGET_PLATFORM} \
    -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod
RUN cmake --build build-android/arm64-v8a --config Release

Unfortunately, when I then copy the binaries to my device and try to follow the next steps described in the guide, I run into `CANNOT LINK EXECUTABLE "./llama-cli": library "libllama.so" not found`. However, the libraries themselves seem to work when I instead copy the output shared and static libraries and use them inside the llama.android example.
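A possible explanation for that error (offered as a guess, not a confirmed diagnosis): Android's dynamic linker does not search the current directory, so an executable sitting next to libllama.so still cannot find it. Pointing `LD_LIBRARY_PATH` at the directory holding the copied libraries before launching usually resolves this; the path and model name below are illustrative:

```shell
# On the device (Termux or adb shell), in the directory with the copied files:
cd /path/to/copied/binaries          # illustrative path
export LD_LIBRARY_PATH="$PWD:$LD_LIBRARY_PATH"
./llama-cli -m model.gguf -p "Hello"
```

Alternatively, configuring the cross-build with `-DBUILD_SHARED_LIBS=OFF` produces a statically linked llama-cli that does not need libllama.so at runtime at all.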

FranzKafkaYu commented 1 month ago

May I ask one question: why do we need to replace the outdated Vulkan headers in the Android NDK?

jsamol commented 1 month ago

It appears that the NDK version I've been using ships with an incomplete set of Vulkan headers, missing some that llama.cpp depends on. Without updating them in the NDK, compilation fails with a missing-headers error.