Mizstik closed this issue 9 months ago.
Sorry, can I check which model you tried this with?
A bunch of models I happened to have on the phone at the time. If I remember correctly:
mistral-7b-instruct-v0.2.Q3_K_L.gguf guanaco-3b-uncensored-v2.ggmlv1.q4_0.bin orca_mini_v3_7b.ggmlv3.q3_K_M.bin
They all worked after I reverted to v1.52.
I have the very same problem. I am running it on Android (aarch64) in Termux, and the last version that works on my phone is 1.53. I have tried a bunch of models, downloading the .zip manually, and building it with make, but I always get the same log when I run it:
Welcome to KoboldCpp - Version 1.54
Warning: OpenBLAS library file not found. Non-BLAS library will be used.
Initializing dynamic library: koboldcpp_default.so
Namespace(model=None, model_param='model1-7b.Q5_K_M.gguf', port=5001, port_param=5001, host='', launch=False, lora=None, config=None, threads=3, blasthreads=3, highpriority=False, contextsize=2048, blasbatchsize=512, ropeconfig=[0.0, 10000.0], smartcontext=False, noshift=False, bantokens=None, forceversion=0, nommap=False, usemlock=False, noavx2=False, debugmode=0, skiplauncher=False, hordeconfig=None, noblas=False, useclblast=None, usecublas=None, gpulayers=0, tensor_split=None, onready='', multiuser=0, remotetunnel=False, foreground=False, preloadstory='', quiet=False, ssl=None)
Loading model: /data/data/com.termux/files/home/koboldcpp-1.54/model1-7b.Q5_K_M.gguf [Threads: 3, BlasThreads: 3, SmartContext: False, ContextShift: True]
The reported GGUF Arch is: llama
Identified as LLAMA model: (ver 6)
Attempting to Load...
Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 |
Illegal instruction
I wonder if it's a problem with a Termux update. Can you still build the older version?
I wasn't updating Termux (or any package) at all when I discovered this issue. It's typical to freeze Termux updates because we don't want the F-Droid version to be overwritten by the Play Store version. I haven't run pkg update in forever either and only git pulled koboldcpp. (Though after I discovered the issue, I went around to update everything to see if any updated packages would fix the issue but no dice.)
So after all that, I tried downgrading to v1.52 and that works. The other guy above said that v1.53 also works. v1.54 and above don't work.
Also note that they all built fine. They fail at runtime, during model load.
I wonder if it's a problem with a Termux update. Can you still build the older version?
Yes, 1.53 still builds and runs just fine.
I will take a look
I will take a look
Tell me if you need me to test anything.
Hi everyone, I just did a clean install on the latest experimental for guanaco and phi; both seem to be working. How did you set up the install? Can you try these steps:
termux-change-repo and choose Mirror by BFSU
pkg install wget git python (plus any other missing packages)
apt install openssl (if needed)
git clone -b concedo_experimental https://github.com/LostRuins/koboldcpp.git
cd koboldcpp
make
wget https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q2_K.gguf
python koboldcpp.py --model phi-2.Q2_K.gguf
Open http://localhost:5001 on your mobile browser.
Tell me:
I'm wondering if someone somewhere pushed a bad package for something. It would be good to describe how you originally set things up vs. the above steps, if there are any differences.
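For reference, here are the same steps collected into a single run, as a rough sketch (the mirror choice in termux-change-repo is interactive, so it stays a manual first step; the branch and model URL are the ones listed above):

# Sketch of the clean-install test above as one sequence.
# Run termux-change-repo first and pick Mirror by BFSU manually.
pkg install wget git python        # plus any other missing packages
apt install openssl                # if needed
git clone -b concedo_experimental https://github.com/LostRuins/koboldcpp.git
cd koboldcpp
make
wget https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q2_K.gguf
python koboldcpp.py --model phi-2.Q2_K.gguf
# then open http://localhost:5001 in the phone's browser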
Unfortunately I am getting the same error.
For my original setups I tried cloning the repo and manually downloading the zip from Releases, but both failed to load the model.
This is the log after following your last post:
~/koboldcpp $ python koboldcpp.py --model phi-2.Q2_K.gguf
Welcome to KoboldCpp - Version 1.56
Warning: OpenBLAS library file not found. Non-BLAS library will be used.
Initializing dynamic library: koboldcpp_default.so
Namespace(model='phi-2.Q2_K.gguf', model_param='phi-2.Q2_K.gguf', port=5001, port_param=5001, host='', launch=False, lora=None, config=None, threads=3, blasthreads=3, highpriority=False, contextsize=2048, blasbatchsize=512, ropeconfig=[0.0, 10000.0], smartcontext=False, noshift=False, bantokens=None, forceversion=0, nommap=False, usemlock=False, noavx2=False, debugmode=0, skiplauncher=False, hordeconfig=None, noblas=False, useclblast=None, usecublas=None, gpulayers=0, tensor_split=None, onready='', multiuser=0, remotetunnel=False, foreground=False, preloadstory='', quiet=False, ssl=None)
Loading model: /data/data/com.termux/files/home/koboldcpp/phi-2.Q2_K.gguf [Threads: 3, BlasThreads: 3, SmartContext: False, ContextShift: True]
The reported GGUF Arch is: phi2
Identified as LLAMA model: (ver 6)
Attempting to Load...
Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 |
Illegal instruction
~/koboldcpp $
Still no go. Here's my Termux log from start to finish. I uninstalled Termux and then reinstalled it fresh from F-Droid. The log starts from changing the repo and includes the compile output, the wget, and the koboldcpp run, up until the last line, which is Illegal instruction.
@LostRuins Which phone are you using? If I have the same or a similar phone, I can go try.
I was testing on a Samsung Galaxy S9 Plus
Hmm I don't have that one but I do have a phone also with SD845 (Poco F1). I'll give it a try.
Mine is a ROG Phone 6 with aarch64:
Android 12
Qualcomm® Snapdragon® 8+ Gen 1 Mobile Platform
Qualcomm® Adreno™ 730
LPDDR5 16GB
Also ROG Phone 6 here but I have Android 13 & December security update.
Also ROG Phone 6 here but I have Android 13 & December security update.
I do not want to derail the thread, but are you in the ROG beta program? I have the latest security update, but mine didn't get Android 13 even when manually checking for it.
Also ROG Phone 6 here but I have Android 13 & December security update.
I do not want to derail the thread, but are you in the ROG beta program? I have the latest security update, but mine didn't get Android 13 even when manually checking for it.
I don't think I'm in the beta program. I don't remember applying at least. Perhaps it's a staggered deployment.
I also got Illegal instruction on the Poco F1 but curiously it got further into the loading process before failing.
~/koboldcpp $ python koboldcpp.py --model phi-2.Q2_K.gguf
***
Welcome to KoboldCpp - Version 1.56
Warning: OpenBLAS library file not found. Non-BLAS library will be used.
Initializing dynamic library: koboldcpp_default.so
==========
Namespace(model='phi-2.Q2_K.gguf', model_param='phi-2.Q2_K.gguf', port=5001, port_param=5001, host='', launch=False, lora=None, config=None, threads=3, blasthreads=3, highpriority=False, contextsize=2048, blasbatchsize=512, ropeconfig=[0.0, 10000.0], smartcontext=False, noshift=False, bantokens=None, forceversion=0, nommap=False, usemlock=False, noavx2=False, debugmode=0, skiplauncher=False, hordeconfig=None, noblas=False, useclblast=None, usecublas=None, gpulayers=0, tensor_split=None, onready='', multiuser=0, remotetunnel=False, foreground=False, preloadstory='', quiet=False, ssl=None)
==========
Loading model: /data/data/com.termux/files/home/koboldcpp/phi-2.Q2_K.gguf
[Threads: 3, BlasThreads: 3, SmartContext: False, ContextShift: True]
The reported GGUF Arch is: phi2
---
Identified as LLAMA model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 |
llama_model_loader: loaded meta data with 20 key-value pairs and 325 tensors from /data/data/com.termux/files/home/koboldcpp/phi-2.Q2_K.gguf (version GGUF V3 (latest))
llm_load_vocab: mismatch in special tokens definition ( 910/51200 vs 944/51200 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = phi2
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 51200
llm_load_print_meta: n_merges = 50000
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2560
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 32
llm_load_print_meta: n_embd_head_k = 80
llm_load_print_meta: n_embd_head_v = 80
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 2560
llm_load_print_meta: n_embd_v_gqa = 2560
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 10240
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 3B
llm_load_print_meta: model ftype = all F32
llm_load_print_meta: model params = 2.78 B
llm_load_print_meta: model size = 1.09 GiB (3.37 BPW)
llm_load_print_meta: general.name = Phi2
llm_load_print_meta: BOS token = 50256 '<|endoftext|>'
llm_load_print_meta: EOS token = 50256 '<|endoftext|>'
llm_load_print_meta: UNK token = 50256 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_tensors: ggml ctx size = 0.12 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/33 layers to GPU
llm_load_tensors: CPU buffer size = 1117.52 MiB
..........................................................................................
Automatic RoPE Scaling: Using (scale:1.000, base:10000.0).
llama_new_context_with_model: n_ctx = 2128
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 665.00 MiB
llama_new_context_with_model: KV self size = 665.00 MiB, K (f16): 332.50 MiB, V (f16): 332.50 MiB
llama_new_context_with_model: graph splits (measure): 1
llama_new_context_with_model: CPU compute buffer size = 177.16 MiB
Illegal instruction
~/koboldcpp $
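(Side note: if it helps to pinpoint where the SIGILL happens, one option is to re-run the failing load under gdb. This is only a sketch; it assumes gdb installs cleanly from the Termux repos and that the crash still reproduces under the debugger.)

pkg install gdb
gdb -q --args python koboldcpp.py --model phi-2.Q2_K.gguf
# inside gdb: type "run"; when SIGILL is raised,
#   bt          shows the native backtrace into koboldcpp_default.so
#   x/i $pc     shows the exact faulting instruction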
Once again I tried reverting to v1.53 and now it works:
[-Wunused-command-line-argument]
Your OS does not appear to be Windows. For faster speeds, install and link a BLAS library. Set LLAMA_OPENBLAS=1 to compile with OpenBLAS support or LLAMA_CLBLAST=1 to compile with ClBlast support. This is just a reminder, not an error.
~/koboldcpp $ python koboldcpp.py --model phi-2.Q2_K.gguf
***
Welcome to KoboldCpp - Version 1.53
Warning: OpenBLAS library file not found. Non-BLAS library will be used.
Initializing dynamic library: koboldcpp_default.so
==========
Namespace(model='phi-2.Q2_K.gguf', model_param='phi-2.Q2_K.gguf', port=5001, port_param=5001, host='', launch=False, lora=None, config=None, threads=3, blasthreads=3, highpriority=False, contextsize=2048, blasbatchsize=512, ropeconfig=[0.0, 10000.0], smartcontext=False, noshift=False, bantokens=None, forceversion=0, nommap=False, usemlock=False, noavx2=False, debugmode=0, skiplauncher=False, hordeconfig=None, noblas=False, useclblast=None, usecublas=None, gpulayers=0, tensor_split=None, onready='', multiuser=0, remotetunnel=False, foreground=False, preloadstory='', quiet=False, ssl=None)
==========
Loading model: /data/data/com.termux/files/home/koboldcpp/phi-2.Q2_K.gguf [Threads: 3, BlasThreads: 3, SmartContext: False, ContextShift: True]
The reported GGUF Arch is: phi2
---
Identified as LLAMA model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 |
llama_model_loader: loaded meta data with 20 key-value pairs and 325 tensors from /data/data/com.termux/files/home/koboldcpp/phi-2.Q2_K.gguf (version GGUF V3 (latest))
llm_load_vocab: mismatch in special tokens definition ( 910/51200 vs 944/51200 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = phi2
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 51200
llm_load_print_meta: n_merges = 50000
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2560
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 32
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 10240
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 3B
llm_load_print_meta: model ftype = unknown, may not work
llm_load_print_meta: model params = 2.78 B
llm_load_print_meta: model size = 1.09 GiB (3.37 BPW)
llm_load_print_meta: general.name = Phi2
llm_load_print_meta: BOS token = 50256 '<|endoftext|>'
llm_load_print_meta: EOS token = 50256 '<|endoftext|>'
llm_load_print_meta: UNK token = 50256 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_tensors: ggml ctx size = 0.12 MiB
llm_load_tensors: system memory used = 1117.64 MiB
..........................................................................................
Automatic RoPE Scaling: Using (scale:1.000, base:10000.0).
llama_new_context_with_model: n_ctx = 2128
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size = 665.00 MiB, K (f16): 332.50 MiB, V (f16): 332.50 MiB
llama_build_graph: non-view tensors processed: 774/774
llama_new_context_with_model: compute buffer total size = 170.35 MiB
Load Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold API on port 5001 at http://localhost:5001/api/
Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/
======
Please connect to custom endpoint at http://localhost:5001
Okay, I noticed one interesting change: the value detected for FP16_VA was previously "0" when it worked, and it seems to be "1" when it fails for you. @Dravoss, do you observe the same situation as well (the older working version shows FP16_VA=0 and the newer crashing version shows FP16_VA=1)?
I want to know at which version it broke, and whether the FP16_VA change happened at the same version too.
Edit: I switched to a newer device and have managed to repro the same issue.
I want to know at which version it broke, and whether the FP16_VA change happened at the same version too.
Edit: I switched to a newer device and have managed to repro the same issue.
FP16_VA = 1 for both the 1.54 and experimental versions, and FP16_VA = 0 in the working 1.53 version.
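(Aside, in case it helps interpret that flag: if FP16_VA mirrors ggml's compile-time ARM FP16 macro, as I believe it does, then the flip from 0 to 1 between versions points at a build-flag change rather than at the hardware. A quick, hedged way to check what the CPU itself advertises in Termux:)

# list only the relevant aarch64 feature flags the kernel reports;
# fphp/asimdhp indicate FP16 support, sve indicates the Scalable Vector Extension
# (assumes the standard Linux /proc/cpuinfo layout; no output means the flag is absent)
grep -m1 Features /proc/cpuinfo | tr ' ' '\n' | grep -E 'fphp|asimdhp|sve'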
So 1.53 works, and 1.54 doesn't?
So 1.53 works, and 1.54 doesn't?
Yes
Thanks. I am slowly trying to find the offending commit.
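(In case it's useful, a bisect sketch between the last working and first broken tags; the tag names follow the v1.5x releases mentioned above, and the repro step is the same phi-2 load.)

git bisect start
git bisect bad v1.54        # first version reported to crash
git bisect good v1.53       # last version reported to work
# at each step git checks out a candidate commit; rebuild and re-test:
make clean && make
python koboldcpp.py --model phi-2.Q2_K.gguf    # loads OK -> git bisect good, SIGILL -> git bisect bad
git bisect reset            # when finished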
Okay I think I found the issue. Can you try to pull the latest commit from my concedo_experimental branch? It is working on my new device now.
Okay I think I found the issue. Can you try to pull the latest commit from my concedo_experimental branch? It is working on my new device now.
It seems to work now. Thank you for your hard work. I will try other models just to make sure, and if you ever need help testing koboldcpp on Android, let me know.
Confirmed working from here as well. Thank you!
A bit of a side note here, but after digging into the rabbit hole over at https://github.com/ggerganov/llama.cpp/issues/402 and some experimentation, I found that
CFLAGS += -march=armv9-a+nosve
CXXFLAGS += -march=armv9-a+nosve
works for SD8 Gen 1 with FP16_VA = 1. Performance compared to generic is about +28% on phi-2. (~6.39 T/s vs. ~5 T/s)
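In case anyone else wants to try it, here is a sketch of one way to apply the change (it simply appends the two lines to the end of the Makefile and rebuilds; it assumes a late += is still picked up by the compile rules, so double-check against your checkout, or edit the existing CFLAGS/CXXFLAGS lines directly):

cd ~/koboldcpp
# append the armv9-without-SVE flags and rebuild from scratch
printf '\nCFLAGS += -march=armv9-a+nosve\nCXXFLAGS += -march=armv9-a+nosve\n' >> Makefile
make clean && make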
In v1.54 and in the latest version, although it compiles successfully (no errors), koboldcpp fails at the step where it's about to load the model (after showing System Info) with "Illegal instruction" and then exits to the shell.
I did git checkout v1.52, make clean, then make, and with this it still runs fine. I didn't try 1.53, sorry.
Something happened after 1.52 that broke Android (aarch64) compatibility, at least in Termux, and possibly on other ARM-based SBCs. My particular phone is a ROG Phone 6 (SD8 Gen 1, 16 GB RAM).