womesiete opened this issue 1 year ago
Having the same issue; it seems to fault while executing the cntw instruction:
(gdb) bt
#0 0x00000063ab4b8954 in ggml_new_tensor_impl ()
#1 0x00000063ab4b8cd0 in ggml_new_tensor_2d ()
#2 0x00000063ab4db824 in whisper_model_load(whisper_model_loader*, whisper_context&) ()
#3 0x00000063ab4da3fc in whisper_init_no_state ()
#4 0x00000063ab4da25c in whisper_init_from_file_no_state ()
#5 0x00000063ab4e02d0 in whisper_init_from_file ()
#6 0x00000063ab4925d8 in main ()
> 0x63ab4b8954 <ggml_new_tensor_impl+308>    cntw   x14
  0x63ab4b8958 <ggml_new_tensor_impl+312>    mov    w13, w2
  0x63ab4b895c <ggml_new_tensor_impl+316>    mov    w15, #0xc   // #12
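For context, cntw is an SVE instruction (it returns the number of 32-bit elements per SVE vector), so the fault suggests the compiler emitted SVE code for a CPU or kernel that doesn't support it; note that neither cluster in the lscpu output below lists sve among its flags. A quick sanity check of what the kernel reports (a one-liner sketch; it prints a Features line only if SVE is exposed to userspace):

grep -m1 -w sve /proc/cpuinfo || echo "kernel does not report sve"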
$ lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: ARM
Model name: Cortex-A510
Model: 2
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: r0p2
CPU(s) scaling MHz: 62%
CPU max MHz: 1785.6000
CPU min MHz: 307.2000
BogoMIPS: 38.40
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 bti
Model name: Cortex-A710
Model: 0
Thread(s) per core: 1
Core(s) per socket: 3
Socket(s): 1
Stepping: r2p0
CPU(s) scaling MHz: 100%
CPU max MHz: 2496.0000
CPU min MHz: 633.6000
BogoMIPS: 38.40
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 bti
$ make
I whisper.cpp build info:
I UNAME_S: Linux
I UNAME_P: unknown
I UNAME_M: aarch64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mcpu=native
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -mcpu=native
I LDFLAGS:
I CC: clang version 16.0.5
I CXX: clang version 16.0.5
Any compile flags I could try?
Related https://github.com/ggerganov/llama.cpp/issues/402
Disabling SVE solved it for me:
-mcpu=native+nosve
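To confirm what the compiler assumes before and after adding +nosve, clang can dump its predefined macros; __ARM_FEATURE_SVE is the ACLE feature macro that gates SVE code generation, so if SVE codegen is the culprit it should show up with plain -mcpu=native and vanish with +nosve:

# Compare the compiler's view of the target with and without +nosve:
clang -dM -E -x c /dev/null -mcpu=native       | grep SVE
clang -dM -E -x c /dev/null -mcpu=native+nosve | grep SVE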
./main -m models/ggml-tiny.en.bin -f samples/jfk.wav
whisper_init_from_file_no_state: loading model from 'models/ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: type = 1
whisper_model_load: mem required = 201.00 MB (+ 3.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx = 73.58 MB
whisper_model_load: model size = 73.54 MB
whisper_init_state: kv self size = 2.62 MB
whisper_init_state: kv cross size = 8.79 MB
system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 | COREML = 0 |
main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:08.000] And so my fellow Americans ask not what your country can do for you
[00:00:08.000 --> 00:00:11.000] ask what you can do for your country.
whisper_print_timings: load time = 183.77 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 144.26 ms
whisper_print_timings: sample time = 13.37 ms / 27 runs ( 0.50 ms per run)
whisper_print_timings: encode time = 715.86 ms / 1 runs ( 715.86 ms per run)
whisper_print_timings: decode time = 75.16 ms / 27 runs ( 2.78 ms per run)
whisper_print_timings: total time = 1169.91 ms
Thanks so much for finding that. For anyone else, the steps to resolve this issue are as follows...
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
models/download-ggml-model.sh base.en
# Add this line to the end of CMakeLists.txt:
add_compile_options(-mcpu=native+nosve)
mkdir build && cd build
cmake ..
make main && mv bin/main ..
make quantize && mv bin/quantize ..
cd ..
./quantize models/ggml-base.en.bin models/ggml-base.en-q5_0.bin q5_0
./main -f samples/jfk.wav -m models/ggml-base.en-q5_0.bin -t 8
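For anyone who would rather not edit CMakeLists.txt, the same flag can likely be injected through CMake's standard CMAKE_C_FLAGS/CMAKE_CXX_FLAGS cache variables instead (untested here, but these are ordinary CMake mechanisms, not whisper.cpp-specific options):

cmake -DCMAKE_C_FLAGS=-mcpu=native+nosve -DCMAKE_CXX_FLAGS=-mcpu=native+nosve ..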
On my S23 Ultra (12 GB), running the JFK sample with the base.en model and 8 threads finishes in 3.2-3.5 seconds; the small.en model finishes in 10.8-12.9 seconds. With 4 threads the difference on base.en isn't noticeable, but small.en seems to lose about 0.5 seconds. I imagine the gap will widen with longer audio samples, but that needs more testing.
I'm facing an error when trying to build with CLBlast and OpenBLAS:
cmake .. -DWHISPER_CLBLAST=ON -DWHISPER_OPENBLAS=ON
error: relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol 'vtable for std::__ndk1::__shared_ptr_pointer<_cl_command_queue*, std::__ndk1::default_delete<_cl_command_queue>, std::__ndk1::allocator<_cl_command_queue*>>'; recompile with -fPIC
defined in /data/data/com.termux/files/usr/lib/libclblast.a(clblast.cpp.o)
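That relocation error usually means the static libclblast.a was compiled without position-independent code and is being linked into a shared object. If rebuilding CLBlast from source is an option, enabling CMake's standard CMAKE_POSITION_INDEPENDENT_CODE variable should produce a PIC static library (a sketch, assuming a CMake-based CLBlast checkout with its own build directory):

# In the CLBlast build directory: rebuild the static library as PIC
cmake -DCMAKE_POSITION_INDEPENDENT_CODE=ON ..
make && make install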
Of note, the stream example generates an error when I attempt to build it this way; I get...
make: *** No rule to make target 'stream'. Stop.
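The missing stream target is likely unrelated to the crash: as far as I know, whisper.cpp's CMake build only generates the stream example when SDL2 support is enabled, so something along these lines should make the target appear (assuming SDL2 is installed on the system):

cmake -DWHISPER_SDL2=ON ..
make stream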
I built whisper.cpp from source on a Galaxy S23 Ultra (Android 13) in Termux, and when I run the following command...
whisper.cpp/main -f whisper.cpp/samples/jfk.wav -m whisper.cpp/models/ggml-base.en.bin -t 4
... it fails with the output below...
I also get illegal instruction when attempting to generate quantized models.
The exact same process worked as expected on my old Galaxy S8 (Android 9.0). Any idea what might cause the difference?
The commands that I used to build were...