ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

Illegal Instruction on Android 13 via Termux #967

Open · womesiete opened this issue 1 year ago

womesiete commented 1 year ago

I built whisper.cpp from source on a Galaxy S23 ultra (Android 13) in Termux, and when I run the following command...

whisper.cpp/main -f whisper.cpp/samples/jfk.wav -m whisper.cpp/models/ggml-base.en.bin -t 4

... it fails with the output below...

whisper_init_from_file_no_state: loading model from 'whisper.cpp/models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2
whisper_model_load: mem required  = 310.00 MB (+ 6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     = 140.66 MB
Illegal instruction

I also get illegal instruction when attempting to generate quantized models.

The exact same process worked as expected on my old Galaxy S8 (Android 9.0). Any idea what might cause the difference?

The commands that I used to build were...

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
models/download-ggml-model.sh base.en
make
make samples
leuc commented 1 year ago

Having the same issue; it seems to fail at the cntw instruction, which is part of ARM SVE. Presumably -mcpu=native makes clang emit SVE code for these cores while the kernel does not expose SVE (note that sve is absent from the lscpu flags below), so the first SVE instruction raises SIGILL.

(gdb) bt
#0  0x00000063ab4b8954 in ggml_new_tensor_impl ()
#1  0x00000063ab4b8cd0 in ggml_new_tensor_2d ()
#2  0x00000063ab4db824 in whisper_model_load(whisper_model_loader*, whisper_context&) ()
#3  0x00000063ab4da3fc in whisper_init_no_state ()
#4  0x00000063ab4da25c in whisper_init_from_file_no_state ()
#5  0x00000063ab4e02d0 in whisper_init_from_file ()
#6  0x00000063ab4925d8 in main ()
 > 0x63ab4b8954 <ggml_new_tensor_impl+308> cntw    x14
   0x63ab4b8958 <ggml_new_tensor_impl+312> mov     w13, w2
   0x63ab4b895c <ggml_new_tensor_impl+316> mov     w15, #0xc                       // #12
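
A quick way to confirm whether the kernel exposes SVE at all (a sketch using standard Linux interfaces; on aarch64 the CPU features appear in /proc/cpuinfo as well as in lscpu):

$ grep -ow sve /proc/cpuinfo | sort -u

No output means the kernel does not report SVE, so any SVE instruction such as cntw will trap with an illegal-instruction signal.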

$ lscpu
Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 8
  On-line CPU(s) list:  0-7
Vendor ID:              ARM
  Model name:           Cortex-A510
    Model:              2
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s):          1
    Stepping:           r0p2
    CPU(s) scaling MHz: 62%
    CPU max MHz:        1785.6000
    CPU min MHz:        307.2000
    BogoMIPS:           38.40
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 bti
  Model name:           Cortex-A710
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 3
    Socket(s):          1
    Stepping:           r2p0
    CPU(s) scaling MHz: 100%
    CPU max MHz:        2496.0000
    CPU min MHz:        633.6000
    BogoMIPS:           38.40
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 bti
$ make
I whisper.cpp build info:
I UNAME_S:  Linux
I UNAME_P:  unknown
I UNAME_M:  aarch64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -pthread -mcpu=native
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -mcpu=native
I LDFLAGS:
I CC:       clang version 16.0.5
I CXX:      clang version 16.0.5

Are there any compile flags I could try?

leuc commented 1 year ago

Related https://github.com/ggerganov/llama.cpp/issues/402

Disabling SVE solved it for me. Replace -mcpu=native with:

-mcpu=native+nosve
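
If you build with plain make, one way to apply this (a sketch; it assumes the Makefile in your checkout hard-codes -mcpu=native for aarch64, as the build info above suggests):

sed -i 's/-mcpu=native/-mcpu=native+nosve/g' Makefile
make clean && make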
./main -m models/ggml-tiny.en.bin -f samples/jfk.wav
whisper_init_from_file_no_state: loading model from 'models/ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: type          = 1
whisper_model_load: mem required  =  201.00 MB (+    3.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =   73.58 MB
whisper_model_load: model size    =   73.54 MB
whisper_init_state: kv self size  =    2.62 MB
whisper_init_state: kv cross size =    8.79 MB

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 | COREML = 0 |

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:08.000]   And so my fellow Americans ask not what your country can do for you
[00:00:08.000 --> 00:00:11.000]   ask what you can do for your country.

whisper_print_timings:     load time =   183.77 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =   144.26 ms
whisper_print_timings:   sample time =    13.37 ms /    27 runs (    0.50 ms per run)
whisper_print_timings:   encode time =   715.86 ms /     1 runs (  715.86 ms per run)
whisper_print_timings:   decode time =    75.16 ms /    27 runs (    2.78 ms per run)
whisper_print_timings:    total time =  1169.91 ms
womesiete commented 1 year ago

Thanks so much for finding that. For anyone else, the steps to resolve this issue are as follows...

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
models/download-ggml-model.sh base.en

# Add this line to the end of CMakeLists.txt
add_compile_options(-mcpu=native+nosve)

mkdir build && cd build
cmake ..
make main  && mv bin/main ..
make quantize && mv bin/quantize ..
cd ..

./quantize models/ggml-base.en.bin models/ggml-base.en-q5_0.bin q5_0
./main -f samples/jfk.wav -m models/ggml-base.en-q5_0.bin -t 8
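
Alternatively, instead of editing CMakeLists.txt, the flag can be passed on the cmake command line via the standard cache variables (a sketch; depending on the checkout, the build may still append its own -mcpu=native, so verify the compile commands if the crash persists):

cmake .. -DCMAKE_C_FLAGS=-mcpu=native+nosve -DCMAKE_CXX_FLAGS=-mcpu=native+nosve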

On my S23 Ultra (12 GB), the JFK sample with the base.en model and 8 threads finishes in 3.2-3.5 seconds; the small.en model finishes in 10.8-12.9 seconds. With 4 threads the difference on base.en isn't noticeable, but small.en seems to lose about 0.5 seconds. I imagine the gap will widen with longer audio samples, but that needs more testing.
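
To compare thread counts yourself, a simple shell loop works (a sketch; adjust the model and sample paths to your setup):

for t in 4 8; do time ./main -f samples/jfk.wav -m models/ggml-base.en-q5_0.bin -t $t; done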

aseok commented 1 year ago

Facing an error when trying to build with CLBlast and OpenBLAS:

cmake .. -DWHISPER_CLBLAST=ON -DWHISPER_OPENBLAS=ON

error: relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_command_queue*, std::__ndk1::default_delete<_cl_command_queue>, std::__ndk1::allocator<_cl_command_queue*>>'; recompile with -fPIC

defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
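
That message means the static libclblast.a shipped by Termux was not built as position-independent code. One possible workaround (a sketch, untested here; it assumes a standard CMake build of upstream CLBlast into the Termux prefix):

git clone https://github.com/CNugteren/CLBlast
cd CLBlast && mkdir build && cd build
cmake .. -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DCMAKE_INSTALL_PREFIX=$PREFIX
make install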

womesiete commented 1 year ago

Of note, the stream example fails to build with these steps. I get...

make: *** No rule to make target 'stream'. Stop.
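
The stream target is only generated when SDL2 support is enabled. A sketch of what enabling it might look like under Termux (assuming the CMake option is named WHISPER_SDL2 in this checkout and that the missing SDL2 dependency is the cause):

pkg install sdl2
cmake .. -DWHISPER_SDL2=ON
make stream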