ggerganov / llama.cpp

LLM inference in C/C++
MIT License
64.97k stars 9.32k forks

illegal instructions error on Android #402

Closed: aicoat closed this issue 1 year ago

aicoat commented 1 year ago

First, thanks for the wonderful work so far!

I managed to compile it on Linux and Windows, but I have a problem on Android. I have an A52 with 6 GB of RAM, but I get an "illegal instructions" error.

I compiled the source using WSL2 with NDK r25 without any errors. I moved the llama folder from the SD card to the "home" directory in Termux so that the binary could be executed, and I converted the original model using the newer source code to avoid the "too old" error message, but in the end I get this error.

I believe it is because AVX, AVX2, and other instruction sets are enabled in my build, which ARM processors can't handle, but I can't figure out how to change that to get it working on my Android device. Thanks in advance <3

[Screenshot: Termux]

gjmulder commented 1 year ago

Please review and use our issue template to provide more details so we can try and better understand your problem and attempt to answer you.

himanshu09010 commented 1 year ago

the issue happens when you don't have enough RAM

dalnk commented 1 year ago

your a52 might just not allow termux to allocate that much ram. weird given you have 6GB

aicoat commented 1 year ago

@dalnk @himanshu09010 that is sad since i heard people are running it on a raspberry pi with 4gb of ram :(

dniku commented 1 year ago

I believe I have encountered the same problem. I have tried to compile the binary directly on my phone (Samsung S22 Ultra with Snapdragon CPU, model SM-S908E). I followed these steps in Termux:

pkg up
pkg install wget git make clang

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
# put the model into models/llama-7B/ggml-model.bin
./main

This fails with Illegal instruction.

Screenshot_20230329_231358_Termux

Running gdb ./main (and then at the gdb prompt first r and then disassemble), I see that this is because of the cnth x14 instruction in ggml_new_tensor_impl().

Screenshot_20230329_231518_Termux

The cnth instruction is documented here. For some reason, the CPU in my phone doesn't support it.

dalnk commented 1 year ago

Did you compile with the NDK instructions first?

@aicoat on a Raspberry Pi it's possible to enable swap, which I haven't gotten working on Android yet. If I ever get my hands on an A52 or similar I'll try to recreate your issue.

dniku commented 1 year ago

@dalnk the "illegal instruction" error arises when the source is compiled on the device itself. I'm not sure the NDK is relevant in this case.

...however, building the binary on Linux with NDK doesn't work either due to a different error: https://github.com/ggerganov/llama.cpp/issues/495#issuecomment-1487653827

DGdev91 commented 1 year ago

I just compiled it directly on a Xiaomi Mi 11T Pro (Snapdragon 888), and it's working using the GPT4All model. Pretty slow, but it works.

Maybe the last commits related to ARM_NEON fixed the issue indirectly?

dniku commented 1 year ago

Just tried recompiling the project for the latest commit (git pull && make clean && make). Still no luck. The offending instruction is still cnth.

dalnk commented 1 year ago

I'm curious if this works for you when compiling locally @dniku

last time I tried it, I had success without having to compile it on NDK

https://github.com/antimatter15/alpaca.cpp

dniku commented 1 year ago

@dalnk compiling alpaca.cpp directly on Android produces the exact same error caused by the same instruction.

[Screenshot: 1000008201.jpg]

ghost commented 1 year ago

It looks like a Snapdragon issue. I have 2 phones, one with an SD865 (6 GB RAM) and another with an SD 8 Gen 2 (12 GB RAM). On both of them I can compile llama.cpp/alpaca.cpp and run them; however, on the 8 Gen 2 I get an illegal instruction error right after I start the script. On the SD865 I don't get this error, but because it runs out of RAM I can't really use it.

dniku commented 1 year ago

@GH228 I agree — according to this website, my CPU is SM8450 Snapdragon 8 Gen 1, and I think this error shows up rarely enough to assume that it only affects a few Snapdragon CPUs.

ghost commented 1 year ago

Are you also getting an illegal instruction error when you run it? There were some significant changes in the 8 Gen 1, so this basically concerns the 8 Gen 1, 8+ Gen 1, and 8 Gen 2. Maybe it happens because the X3 and X2 cores don't support 32-bit ARM instructions.

dniku commented 1 year ago

@GH228

Are you also getting illegal instruction error when you run it?

Yes. Running ./main on my device throws an Illegal instruction error.

dniku commented 1 year ago

I have found a solution. It might not be the best one, but it seems to work.

First, apply the following patch:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 1a434f0..bd98b97 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -186,7 +186,7 @@ if (${CMAKE_SYSTEM_PROCESSOR} MATCHES "arm" OR ${CMAKE_SYSTEM_PROCESSOR} MATCHES
         # TODO: arm msvc?
     else()
         if (${CMAKE_SYSTEM_PROCESSOR} MATCHES "aarch64")
-            add_compile_options(-mcpu=native)
+            add_compile_options(-mcpu=generic)
         endif()
         # TODO: armv6,7,8 version specific flags
     endif()

Next,

mkdir build && \
cd build && \
cmake .. && \
make main && \
mv bin/main .. && \
cd .. && \
rm -r build

After this, examples/chat.sh works for me.

Funny enough, none of {cortex-x2,cortex-a710,cortex-a510} work. All three throw Illegal instruction errors.

By the way, this solution was suggested by ChatGPT:

1000008237.jpg

ghost commented 1 year ago

How do I apply this "patch"? Is it a script file or something? How can I run it?

dniku commented 1 year ago

@GH228 the easiest way is to edit CMakeLists.txt manually and replace -mcpu=native with -mcpu=generic here.

ghost commented 1 year ago

Yes, I have found a way to patch using your script. It seems that it works only with llama.cpp, not alpaca.cpp. But the AI literally displays some random text, ignoring the prompt. The prompt was "what is the difference between arm v7 and arm v8". The answer is:

what is the difference between arm v7 and arm v8? I think it's important to understand how the internet works so we can do that. I think we need to look at the facts, not just emotion. Let's take a look at the facts of internet history. The internet as a whole has changed dramatically in recent years due to the introduction of streaming media and more recently because of Bitcoin and Blockchain technology. These technologies have become so popular that they are now becoming part of our daily lives. We need to understand how the internet works so we can do that. I think it's important to understand how the internet works

llama_print_timings: load time = 34516.57 ms
llama_print_timings: sample time = 181.29 ms / 128 runs ( 1.42 ms per run)
llama_print_timings: prompt eval time = 36397.24 ms / 13 tokens ( 2799.79 ms per token)
llama_print_timings: eval time = 105441.63 ms / 127 runs ( 830.25 ms per run)
llama_print_timings: total time = 142971.74 ms

I just asked the same model the same question, but running on my PC. The response is:

what is the difference between arm v7 and arm v8? The ARMv8 architecture introduces 64-bit support, allowing for more efficient memory management as well as increased performance when running applications that require large amounts of data. It also includes a new instruction set called AArch64 which is optimized to take advantage of the larger address space and improved branch prediction capabilities provided by ARMv8.

I just compiled alpaca.cpp, replacing all -mcpu flags, and yes, it works. I'm pretty sure it is a model issue, but I didn't find any other models that would run with llama.cpp besides that one. So, if you know how to quantize to 4 bits from source, feel free to do so. I would recommend using alpaca.cpp.

pjlegato commented 1 year ago

The CMakeLists.txt seems to select CPU instruction set extensions manually (https://github.com/ggerganov/llama.cpp/blob/master/CMakeLists.txt#L54-L58 and https://github.com/ggerganov/llama.cpp/blob/master/CMakeLists.txt#L193-L219). This will cause the output binary to throw "illegal instruction" errors when run on a CPU that does not support the selected instruction sets.

Unless deliberately cross-compiling to generate a binary that will run on some other CPU (which is rare), the extension instruction sets should not be specified manually. Setting -march=native will automatically select the best instruction sets available on the current CPU at compile time.

Even if you are cross-compiling, it's generally better to do -march=opteron (or whatever CPU you're targeting) rather than manually selecting instruction sets.

More info at https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

dniku commented 1 year ago

@pjlegato

Setting -march=native will automatically select the best instruction sets available on the current CPU at compile time.

That's theoretically true, but this whole thread is dedicated to the fact that when compiling the project on the Android device itself -mcpu=native (not -march=native though) causes Illegal instruction errors, while -mcpu=generic doesn't.

dfyz commented 1 year ago

@dniku

Oh my god, this proved to be a deep rabbit hole. I was able to reproduce this with a Galaxy Z Flip4 (which uses Snapdragon 8+ Gen 1) and started investigating.

First off, the problematic instructions on your gdb screenshots (cnth, as you mentioned, but also rdvl) are part of SVE. So a more targeted workaround for this problem is replacing -mcpu=native with -mcpu=native+nosve, which prevents the compiler from trying to automatically vectorize loops using SVE. Fun fact: in this instance, clang decided to use SVE for this loop from ggml_new_tensor_impl() in order to speed up copying [checks notes] 2 (two) integers.
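
One way to confirm that a flag such as -mcpu=native+nosve actually took effect is to check the ACLE feature macro __ARM_FEATURE_SVE, which the compiler defines when SVE code generation is enabled for the target. The following is only a minimal sketch, assuming a clang or GCC toolchain. The helper names built_with_sve and report_sve_build are hypothetical and are not part of llama.cpp.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from llama.cpp): returns 1 when this translation
   unit was compiled with SVE code generation enabled, 0 otherwise.
   __ARM_FEATURE_SVE is the ACLE macro the compiler defines in that case. */
int built_with_sve(void)
{
#if defined(__ARM_FEATURE_SVE)
    return 1;
#else
    return 0;
#endif
}

/* Prints the build-time setting to the given stream. */
void report_sve_build(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "compiled with SVE: %s\n", built_with_sve() ? "yes" : "no");
}
```

Calling report_sve_build(stdout) from a file built with the same flags as main should report "no" once +nosve (or -mcpu=generic) is in effect. Note that this reflects the compile flags only, not what the CPU itself supports.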

A much more interesting question is why this happens in the first place. As far as I understand, clang uses this function to parse the first 32 lines of /proc/cpuinfo and determine the CPU model. More precisely, it uses the CPU implementer and CPU part fields. In our case, the implementer is 0x41 (ARM Ltd.) and the parts are 0xd46 (cortex-a510), 0xd47 (cortex-a710), and 0xd48 (cortex-x2). This is somewhat strange (my older Samsung has 0x53 -- Samsung Electronics Co., Ltd. in the CPU implementer, which makes more sense), but apparently the newer Snapdragon Kryo CPUs are indeed based on ARM cores.
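
The lookup described above can be sketched roughly as follows. This is not clang's actual code, just an illustration of reading the CPU implementer and CPU part fields from /proc/cpuinfo-style text. The function names cpuinfo_hex_field and arm_core_name are hypothetical, and the table only contains the three part numbers quoted in this comment.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: finds the first line that starts with `key` and parses
   the hexadecimal value after the ':'. Returns 1 on success, 0 otherwise. */
int cpuinfo_hex_field(const char *text, const char *key, unsigned long *out)
{
    assert(text != NULL && key != NULL && out != NULL);
    size_t key_len = strlen(key);
    const char *line = text;
    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        if (strncmp(line, key, key_len) == 0) {
            const char *p = strchr(line, ':');
            /* The colon must belong to this line, not a later one. */
            if (p == NULL || (eol != NULL && p > eol)) return 0;
            p++;
            while (*p == ' ' || *p == '\t') p++;
            if (!isxdigit((unsigned char)*p)) return 0;
            *out = strtoul(p, NULL, 16); /* base 16 accepts a "0x" prefix */
            return 1;
        }
        if (eol == NULL) break;
        line = eol + 1;
    }
    return 0;
}

/* Maps (implementer, part) to a core name for the values quoted above. */
const char *arm_core_name(unsigned long implementer, unsigned long part)
{
    if (implementer != 0x41) return "unknown"; /* 0x41 = ARM Ltd. */
    switch (part) {
    case 0xd46: return "cortex-a510";
    case 0xd47: return "cortex-a710";
    case 0xd48: return "cortex-x2";
    default:    return "unknown";
    }
}
```

Fed the contents of /proc/cpuinfo from one of the affected phones, this would resolve to one of the three Cortex names, which is consistent with -mcpu=native ending up on an ARMv9 target.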

My initial guess was that only some of the cores implemented SVE, but it turned out that all 3 Cortex models are ARMv9, where both SVE and its extension SVE2 are mandatory. Every description of Snapdragon 8 I can find on the Internet proudly declares that it is ARMv9-A and supports SVE(2). However, there are no indications of that in the Features field of /proc/cpuinfo. I then tried reading the SVE-related bits directly from ID_AA64PFR0_EL1 with a simple C program:

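#include <stdio.h>
#include <inttypes.h>

int main(void) {
        uint64_t features = 0;
        /* Read AArch64 Processor Feature Register 0. */
        asm("mrs %0, ID_AA64PFR0_EL1" : "=r"(features));
        /* The SVE field is bits [35:32]; a nonzero value means SVE is implemented. */
        printf("features: %" PRIu64 "\n", (features >> 32) & 0xF);
        return 0;
}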

It also prints 0 every time I run it. I don't believe it is possible for an OS to influence reads from this register, so it appears to be a genuine issue with the CPU. This comment also seems to confirm this.
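
The check of the Features field mentioned earlier can also be done programmatically. This is a minimal sketch, assuming the caller has already read the Features line from /proc/cpuinfo. The function name features_has_flag is hypothetical. It matches whole tokens only, so that sve2 or svei8mm does not count as sve.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `flag` appears as a whole
   whitespace-separated token in a /proc/cpuinfo "Features" line. */
int features_has_flag(const char *features, const char *flag)
{
    assert(features != NULL && flag != NULL);
    size_t n = strlen(flag);
    if (n == 0) return 0;
    const char *p = features;
    while ((p = strstr(p, flag)) != NULL) {
        /* A token starts at the beginning, after whitespace, or after ':'. */
        int starts = (p == features) || isspace((unsigned char)p[-1]) || p[-1] == ':';
        /* A token ends at the end of the string or before whitespace. */
        int ends = (p[n] == '\0') || isspace((unsigned char)p[n]);
        if (starts && ends) return 1;
        p++;
    }
    return 0;
}
```

On the devices discussed in this thread, the expectation from the description above is that looking up "sve" this way returns 0.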

So, to summarize: Qualcomm cores pretend to be ARMv9 cores while not being compliant ARMv9 cores, and clang believes them.

dfyz commented 1 year ago

@aicoat

Did you manage to resolve your problem? The A52 seems to use an older Kryo core based on Cortex-A76, so the problem you were seeing might have been unrelated to the SVE one that @dniku and @GH228 stumbled upon.

If you still are seeing Illegal instruction, it would be great to run main under gdb (just as @dniku did) to see what the problematic instruction is.

Topping1 commented 1 year ago

@dniku I had the same problem compiling the code on my phone, a Xiaomi Mi 9 SE (Snapdragon 712 with 6 GB of RAM). Your edit to CMakeLists.txt fixed the issue.

yifeifang commented 1 year ago

-mcpu=generic worked for me

thuningxu commented 1 year ago

-mcpu=native+nosve works for me (Samsung S23 Ultra, Snapdragon 8 Gen 2)

jo-elimu commented 5 months ago

i compiled the source using wsl2 with ndk r25 without any errors.

@aicoat How did you compile the source code on Android? Would you mind sharing the commands you used?