ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Bug: Unreadable output from android example project #9555

Open xunuohope1107 opened 3 days ago

xunuohope1107 commented 3 days ago

What happened?

I built and ran the Android example project under examples/llama.android, but the output shown in the Android UI is very hard to understand. I tried many prompts, such as "hello" and "why sky is blue?", on several real devices as well as on virtual devices. Instead of sentences, the output is a random combination of words or programming code.

Name and Version

b3785, android arm64-v8a

What operating system are you seeing the problem on?

Android arm64-v8a

Relevant log output

Here is the log from my console (using "hello" as the prompt):
2024-09-20 09:28:35.402 15885-5371  llama-android.cpp       com.example.llama                    I  n_len = 64, n_ctx = 2048, n_kv_req = 64
2024-09-20 09:28:35.402 15885-5371  llama-android.cpp       com.example.llama                    I  hello
2024-09-20 09:28:35.402 15885-5371  llama-android.cpp       com.example.llama                    I   
2024-09-20 09:28:35.402 15885-15885 ViewRootImplExtImpl     com.example.llama                    D  MotionEvent MotionEvent { action=ACTION_UP, actionButton=0, id[0]=0, x[0]=121.26953, y[0]=1821.7549, toolType[0]=TOOL_TYPE_FINGER, buttonState=0, classification=NONE, metaState=0, flags=0x0, edgeFlags=0x0, pointerCount=1, historySize=0, eventTime=101595713, downTime=101595640, deviceId=6, source=0x1002, displayId=0, eventId=224023115 } handled by client, just return
2024-09-20 09:28:35.441 15885-15885 Quality                 com.example.llama                    I  Skipped: false 3 cost 30.282692 refreshRate 8290371 bit true processName com.example.llama
2024-09-20 09:28:35.834 15885-5371  llama-android.cpp       com.example.llama                    I  cached: llo, new_token_chars: `llo`, id: 18798
2024-09-20 09:28:36.017 15885-5371  llama-android.cpp       com.example.llama                    I  cached:  world, new_token_chars: ` world`, id: 995
2024-09-20 09:28:36.172 15885-5371  llama-android.cpp       com.example.llama                    I  cached: ", new_token_chars: `"`, id: 1
2024-09-20 09:28:36.296 15885-5371  llama-android.cpp       com.example.llama                    I  cached: 
                                                                                                    , new_token_chars: `
                                                                                                    `, id: 198
2024-09-20 09:28:36.432 15885-5371  llama-android.cpp       com.example.llama                    I  cached: 
                                                                                                    , new_token_chars: `
                                                                                                    `, id: 198
2024-09-20 09:28:36.573 15885-5371  llama-android.cpp       com.example.llama                    I  cached: def, new_token_chars: `def`, id: 4299
2024-09-20 09:28:36.718 15885-5371  llama-android.cpp       com.example.llama                    I  cached:  remove, new_token_chars: ` remove`, id: 4781
2024-09-20 09:28:36.849 15885-5371  llama-android.cpp       com.example.llama                    I  cached: _, new_token_chars: `_`, id: 62
2024-09-20 09:28:36.982 15885-5371  llama-android.cpp       com.example.llama                    I  cached: v, new_token_chars: `v`, id: 85
2024-09-20 09:28:37.103 15885-5371  llama-android.cpp       com.example.llama                    I  cached: ow, new_token_chars: `ow`, id: 322
2024-09-20 09:28:37.229 15885-5371  llama-android.cpp       com.example.llama                    I  cached: els, new_token_chars: `els`, id: 1424
2024-09-20 09:28:37.348 15885-5371  llama-android.cpp       com.example.llama                    I  cached: (, new_token_chars: `(`, id: 7
2024-09-20 09:28:37.477 15885-5371  llama-android.cpp       com.example.llama                    I  cached: s, new_token_chars: `s`, id: 82
2024-09-20 09:28:37.598 15885-5371  llama-android.cpp       com.example.llama                    I  cached: ):, new_token_chars: `):`, id: 2599
2024-09-20 09:28:37.726 15885-5371  llama-android.cpp       com.example.llama                    I  cached: 
                                                                                                    , new_token_chars: `
                                                                                                    `, id: 198
2024-09-20 09:28:37.870 15885-5371  llama-android.cpp       com.example.llama                    I  cached:     , new_token_chars: `    `, id: 50284
2024-09-20 09:28:37.991 15885-5371  llama-android.cpp       com.example.llama                    I  cached: v, new_token_chars: `v`, id: 85
2024-09-20 09:28:38.120 15885-5371  llama-android.cpp       com.example.llama                    I  cached: ow, new_token_chars: `ow`, id: 322
2024-09-20 09:28:38.248 15885-5371  llama-android.cpp       com.example.llama                    I  cached: els, new_token_chars: `els`, id: 1424
2024-09-20 09:28:38.381 15885-5371  llama-android.cpp       com.example.llama                    I  cached:  =, new_token_chars: ` =`, id: 796
2024-09-20 09:28:38.512 15885-5371  llama-android.cpp       com.example.llama                    I  cached:  ", new_token_chars: ` "`, id: 366
2024-09-20 09:28:38.636 15885-5371  llama-android.cpp       com.example.llama                    I  cached: ae, new_token_chars: `ae`, id: 3609
2024-09-20 09:28:38.892 15885-5371  llama-android.cpp       com.example.llama                    I  cached: i, new_token_chars: `i`, id: 72
2024-09-20 09:28:39.029 15885-5371  llama-android.cpp       com.example.llama                    I  cached: ou, new_token_chars: `ou`, id: 280
2024-09-20 09:28:39.141 15885-5371  llama-android.cpp       com.example.llama                    I  cached: AE, new_token_chars: `AE`, id: 14242
2024-09-20 09:28:39.276 15885-5371  llama-android.cpp       com.example.llama                    I  cached: I, new_token_chars: `I`, id: 40
2024-09-20 09:28:39.393 15885-5371  llama-android.cpp       com.example.llama                    I  cached: OU, new_token_chars: `OU`, id: 2606
2024-09-20 09:28:39.535 15885-5371  llama-android.cpp       com.example.llama                    I  cached: ", new_token_chars: `"`, id: 1
2024-09-20 09:28:39.685 15885-5371  llama-android.cpp       com.example.llama                    I  cached: 
                                                                                                    , new_token_chars: `
                                                                                                    `, id: 198
2024-09-20 09:28:39.815 15885-5371  llama-android.cpp       com.example.llama                    I  cached:     , new_token_chars: `    `, id: 50284
2024-09-20 09:28:39.950 15885-5371  llama-android.cpp       com.example.llama                    I  cached: return, new_token_chars: `return`, id: 7783
2024-09-20 09:28:40.083 15885-5371  llama-android.cpp       com.example.llama                    I  cached:  ', new_token_chars: ` '`, id: 705
2024-09-20 09:28:40.227 15885-5371  llama-android.cpp       com.example.llama                    I  cached: '., new_token_chars: `'.`, id: 4458
2024-09-20 09:28:40.357 15885-5371  llama-android.cpp       com.example.llama                    I  cached: join, new_token_chars: `join`, id: 22179
2024-09-20 09:28:40.496 15885-5371  llama-android.cpp       com.example.llama                    I  cached: ([, new_token_chars: `([`, id: 26933
2024-09-20 09:28:40.611 15885-5371  llama-android.cpp       com.example.llama                    I  cached: c, new_token_chars: `c`, id: 66
2024-09-20 09:28:40.734 15885-5371  llama-android.cpp       com.example.llama                    I  cached:  for, new_token_chars: ` for`, id: 329
2024-09-20 09:28:40.853 15885-5371  llama-android.cpp       com.example.llama                    I  cached:  c, new_token_chars: ` c`, id: 269
2024-09-20 09:28:40.984 15885-5371  llama-android.cpp       com.example.llama                    I  cached:  in, new_token_chars: ` in`, id: 287
2024-09-20 09:28:41.271 15885-5371  llama-android.cpp       com.example.llama                    I  cached:  s, new_token_chars: ` s`, id: 264
2024-09-20 09:28:41.465 15885-5371  llama-android.cpp       com.example.llama                    I  cached:  if, new_token_chars: ` if`, id: 611
2024-09-20 09:28:41.604 15885-5371  llama-android.cpp       com.example.llama                    I  cached:  c, new_token_chars: ` c`, id: 269
2024-09-20 09:28:41.788 15885-5371  llama-android.cpp       com.example.llama                    I  cached:  not, new_token_chars: ` not`, id: 407
2024-09-20 09:28:41.913 15885-5371  llama-android.cpp       com.example.llama                    I  cached:  in, new_token_chars: ` in`, id: 287
2024-09-20 09:28:42.050 15885-5371  llama-android.cpp       com.example.llama                    I  cached:  vow, new_token_chars: ` vow`, id: 23268
2024-09-20 09:28:42.178 15885-5371  llama-android.cpp       com.example.llama                    I  cached: els, new_token_chars: `els`, id: 1424
2024-09-20 09:28:42.308 15885-5371  llama-android.cpp       com.example.llama                    I  cached: ]), new_token_chars: `])`, id: 12962
2024-09-20 09:28:42.435 15885-5371  llama-android.cpp       com.example.llama                    I  cached: 
                                                                                                    , new_token_chars: `
                                                                                                    `, id: 198
2024-09-20 09:28:42.563 15885-5371  llama-android.cpp       com.example.llama                    I  cached: 
                                                                                                    , new_token_chars: `
                                                                                                    `, id: 198
2024-09-20 09:28:42.695 15885-5371  llama-android.cpp       com.example.llama                    I  cached: print, new_token_chars: `print`, id: 4798
2024-09-20 09:28:42.828 15885-5371  llama-android.cpp       com.example.llama                    I  cached: (, new_token_chars: `(`, id: 7
2024-09-20 09:28:42.964 15885-5371  llama-android.cpp       com.example.llama                    I  cached: remove, new_token_chars: `remove`, id: 28956
2024-09-20 09:28:43.116 15885-5371  llama-android.cpp       com.example.llama                    I  cached: _, new_token_chars: `_`, id: 62
2024-09-20 09:28:43.249 15885-5371  llama-android.cpp       com.example.llama                    I  cached: v, new_token_chars: `v`, id: 85
2024-09-20 09:28:43.390 15885-5371  llama-android.cpp       com.example.llama                    I  cached: ow, new_token_chars: `ow`, id: 322
2024-09-20 09:28:43.511 15885-5371  llama-android.cpp       com.example.llama                    I  cached: els, new_token_chars: `els`, id: 1424
2024-09-20 09:28:43.642 15885-5371  llama-android.cpp       com.example.llama                    I  cached: (, new_token_chars: `(`, id: 7
2024-09-20 09:28:43.776 15885-5371  llama-android.cpp       com.example.llama                    I  cached: s, new_token_chars: `s`, id: 82
2024-09-20 09:28:43.922 15885-5371  llama-android.cpp       com.example.llama                    I  cached: )), new_token_chars: `))`, id: 4008
2024-09-20 09:28:44.098 15885-5371  llama-android.cpp       com.example.llama                    I  cached: 
                                                                                                    , new_token_chars: `
                                                                                                    `, id: 198
2024-09-20 09:28:44.282 15885-5371  llama
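The paired `cached:` / `new_token_chars:` values in the log above suggest that the example accumulates each token's text before passing it to the UI. A usual reason for such a buffer is that a single token can end partway through a multi-byte UTF-8 character. A minimal sketch of that idea, assuming this is what the buffering does (`utf8_complete` and `TokenBuffer` are hypothetical names, not the code in `llama-android.cpp`):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `s` does not end in a truncated UTF-8 sequence.
// An empty string counts as complete. Invalid lead bytes are treated as
// complete so that they get flushed instead of being buffered forever.
bool utf8_complete(const std::string& s) {
    std::size_t i = s.size();
    std::size_t cont = 0;  // trailing continuation bytes (10xxxxxx)
    while (i > 0 && cont < 3 &&
           (static_cast<unsigned char>(s[i - 1]) & 0xC0) == 0x80) {
        --i;
        ++cont;
    }
    if (i == 0) return cont == 0;  // only continuation bytes: wait for more
    unsigned char lead = static_cast<unsigned char>(s[i - 1]);
    std::size_t need;  // continuation bytes this lead byte expects
    if (lead < 0x80) need = 0;
    else if ((lead & 0xE0) == 0xC0) need = 1;
    else if ((lead & 0xF0) == 0xE0) need = 2;
    else if ((lead & 0xF8) == 0xF0) need = 3;
    else return true;  // invalid lead byte: do not hold it back
    return cont >= need;
}

// Hypothetical accumulator: token pieces are appended, and the buffer is only
// handed out (and cleared) once it ends on a whole character.
struct TokenBuffer {
    std::string cached;

    // Returns the text to display, or "" while a character is still incomplete.
    std::string push(const std::string& piece) {
        cached += piece;
        if (!utf8_complete(cached)) return "";
        std::string out;
        out.swap(cached);  // leaves `cached` empty
        return out;
    }
};
```

In the log above, every `cached:` value is identical to its `new_token_chars` value, so if the buffering works like this sketch, nothing was held back between tokens and the odd text is what the model actually produced rather than a decoding artifact.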

Screenshot: [Screenshot 2024-09-20 at 09 37 20]
Flyfish233 commented 2 days ago

Try loading other models. I can't reproduce this on b3787 with qwen2-1_5b-instruct-q5_0.gguf.
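Instruct models such as qwen2-1_5b-instruct are fine-tuned on a chat format. For Qwen2 that format is ChatML, which matches the `EOS token = 151645 '<|im_end|>'` line in the model-loading log later in this thread. If the app sends the bare prompt text to the model, the model may treat it as text to continue rather than a question to answer; the first log above reads that way (` world`, then Python code). Whether the Android example applies a chat template at this version is an assumption to verify, and since this comment could not reproduce the problem on b3787 with this model, it is one thing to rule out rather than a confirmed cause. A minimal sketch of wrapping a prompt by hand (`ChatMessage` and `format_chatml` are hypothetical names, not part of llama.cpp):

```cpp
#include <string>
#include <vector>

// Hypothetical message type for illustration.
struct ChatMessage {
    std::string role;     // "system", "user" or "assistant"
    std::string content;
};

// Builds a ChatML-style prompt: each message becomes
// "<|im_start|>role\ncontent<|im_end|>\n", and an open assistant turn is
// appended so the model answers instead of continuing the user's text.
std::string format_chatml(const std::vector<ChatMessage>& messages) {
    std::string prompt;
    for (const auto& m : messages) {
        prompt += "<|im_start|>" + m.role + "\n" + m.content + "<|im_end|>\n";
    }
    prompt += "<|im_start|>assistant\n";
    return prompt;
}
```

The markers such as `<|im_start|>` only become single special tokens if the string is tokenized with special-token parsing enabled (the `parse_special` argument of `llama_tokenize` in recent llama.cpp versions); otherwise they are split into ordinary text. llama.cpp also provides `llama_chat_apply_template`, which formats messages using the template stored in the GGUF file; check its signature in the `llama.h` of the build you use.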

xunuohope1107 commented 2 days ago


I tried qwen2-1_5b-instruct-q5_0.gguf on b3788 (llama.android) but still got nonsensical output. I tested on a Xiaomi 14 (16 GB RAM), a OnePlus 12R (16 GB RAM), and a Pixel 4a. Here is the console log:

2024-09-20 17:56:05.691 13226-13226 ExtensionsLoader com.example.llama D Opened libSchedAssistExtImpl.so 2024-09-20 17:56:05.693 13226-14930 AdrenoGLES-0 com.example.llama I QUALCOMM build : 1a285a84ae, I2991b7e11e Build Date : 06/04/23 OpenGL ES Shader Compiler Version: E031.41.03.36 Local Branch : Remote Branch : Remote Branch : Reconstruct Branch : 2024-09-20 17:56:05.693 13226-14930 AdrenoGLES-0 com.example.llama I Build Config : S P 14.1.4 AArch64 2024-09-20 17:56:05.693 13226-14930 AdrenoGLES-0 com.example.llama I Driver Path : /vendor/lib64/egl/libGLESv2_adreno.so 2024-09-20 17:56:05.693 13226-14930 AdrenoGLES-0 com.example.llama I Driver Version : 0676.32 2024-09-20 17:56:05.695 13226-14930 AdrenoGLES-0 com.example.llama I PFP: 0x01740158, ME: 0x00000000 2024-09-20 17:56:05.699 13226-14930 AdrenoUtils com.example.llama I : Reading chip ID through GSL 2024-09-20 17:56:05.703 13226-14930 BufferQueueProducer com.example.llama D VRI[MainActivity]#0(BLAST Consumer)0 connect: api=1 producerControlledByApp=true 2024-09-20 17:56:05.703 13226-14930 OpenGLRenderer com.example.llama E Unable to match the desired swap behavior. 2024-09-20 17:56:05.705 13226-13226 SurfaceControl com.example.llama I setExtendedRangeBrightness sc=Surface(name=com.example.llama/com.example.llama.MainActivity)/@0x764580f,currentBufferRatio=1.0,desiredRatio=1.0 2024-09-20 17:56:05.716 13226-14930 BLASTBufferQueue com.example.llama D VRI[MainActivity]#0 acquireNextBufferLocked size=1080x2376 mFrameNumber=1 applyTransaction=true mTimestamp=132046035059522(auto) mPendingTransactions.size=0 graphicBufferId=56805237456900 transform=0 2024-09-20 17:56:05.716 13226-14930 VRI[MainActivity] com.example.llama D Received frameCommittedCallback lastAttemptedDrawFrameNum=1 didProduceBuffer=true syncBuffer=false 2024-09-20 17:56:05.716 13226-13226 VRI[MainActivity] com.example.llama D draw finished. 
2024-09-20 17:56:05.716 13226-13226 VRI[MainActivity] com.example.llama D reportDrawFinished 2024-09-20 17:56:05.716 13226-13226 ViewRootImplExtImpl com.example.llama D setMaxDequeuedBufferCount: 2 2024-09-20 17:56:05.724 13226-13226 VRI[MainActivity] com.example.llama D onFocusEvent true 2024-09-20 17:56:05.724 13226-13226 Choreographer com.example.llama I Skipped 30 frames! The application may be doing too much work on its main thread. 2024-09-20 17:56:05.724 13226-13226 Quality com.example.llama I Skipped: false 30 cost 250.18631 refreshRate 8289370 bit true processName com.example.llama 2024-09-20 17:56:05.740 13226-13226 Quality com.example.llama I Skipped: true 1 cost 8.331621 refreshRate 8289194 bit true processName com.example.llama 2024-09-20 17:56:06.156 13226-13226 Quality com.example.llama I Skipped: false 1 cost 10.590919 refreshRate 8289053 bit true processName com.example.llama 2024-09-20 17:56:06.742 13226-14945 OplusScrollToTopManager com.example.llama D com.example.llama/com.example.llama.MainActivity,This DecorView@e676581[MainActivity] change focus to true 2024-09-20 17:56:08.669 13226-13226 AutofillManager com.example.llama V requestHideFillUi(null): anchor = null 2024-09-20 17:56:08.737 13226-13226 OplusInput...erInternal com.example.llama D get inputMethodManager extension: com.android.internal.view.IInputMethodManager$Stub$Proxy@89407a1 2024-09-20 17:56:08.741 13226-14967 LLamaAndroid com.example.llama D Dedicated thread for native code: Llm-RunLoop 2024-09-20 17:56:08.742 13226-13226 ViewRootImplExtImpl com.example.llama D MotionEvent MotionEvent { action=ACTION_UP, actionButton=0, id[0]=0, x[0]=491.54004, y[0]=2261.8496, toolType[0]=TOOL_TYPE_FINGER, buttonState=0, classification=NONE, metaState=0, flags=0x0, edgeFlags=0x0, pointerCount=1, historySize=0, eventTime=132049051, downTime=132048977, deviceId=6, source=0x1002, displayId=0, eventId=640003685 } handled by client, just return 2024-09-20 17:56:08.747 13226-14967 LLamaAndroid 
com.example.llama D AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 1 | SVE = 0 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | 2024-09-20 17:56:08.748 13226-14967 llama-android.cpp com.example.llama I Loading model from /storage/emulated/0/Android/data/com.example.llama/files/qwen2-1_5b-instruct-q5_0.gguf 2024-09-20 17:56:08.753 13226-13226 Quality com.example.llama I Skipped: true 1 cost 12.096126 refreshRate 8333333 bit true processName com.example.llama 2024-09-20 17:56:08.779 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: loaded meta data with 26 key-value pairs and 338 tensors from /storage/emulated/0/Android/data/com.example.llama/files/qwen2-1_5b-instruct-q5_0.gguf (version GGUF V3 (latest)) 2024-09-20 17:56:08.779 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
2024-09-20 17:56:08.779 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 0: general.architecture str = qwen2 2024-09-20 17:56:08.779 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 1: general.name str = qwen2-1_5b-instruct 2024-09-20 17:56:08.779 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 2: qwen2.block_count u32 = 28 2024-09-20 17:56:08.779 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 3: qwen2.context_length u32 = 32768 2024-09-20 17:56:08.779 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 4: qwen2.embedding_length u32 = 1536 2024-09-20 17:56:08.779 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 5: qwen2.feed_forward_length u32 = 8960 2024-09-20 17:56:08.779 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 6: qwen2.attention.head_count u32 = 12 2024-09-20 17:56:08.779 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 7: qwen2.attention.head_count_kv u32 = 2 2024-09-20 17:56:08.780 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 8: qwen2.rope.freq_base f32 = 1000000.000000 2024-09-20 17:56:08.780 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 9: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 2024-09-20 17:56:08.780 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 10: general.file_type u32 = 8 2024-09-20 17:56:08.780 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 11: tokenizer.ggml.model str = gpt2 2024-09-20 17:56:08.780 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 12: tokenizer.ggml.pre str = qwen2 2024-09-20 17:56:08.804 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "", "&", "'", ... 
2024-09-20 17:56:08.809 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 2024-09-20 17:56:08.826 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... 2024-09-20 17:56:08.826 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 151645 2024-09-20 17:56:08.826 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 17: tokenizer.ggml.padding_token_id u32 = 151643 2024-09-20 17:56:08.826 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 151643 2024-09-20 17:56:08.826 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 19: tokenizer.chat_template str = { 45358000342027376826582226793079494978536804014395455830443385929404386720687094409997545952899419186858395500209268036995739998909139943970409346798798085432955252350389385345239973876717475932972674074899806483906560.000000or message in messages }{ 0f lo... 2024-09-20 17:56:08.826 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 20: tokenizer.ggml.add_bos_token bool = false 2024-09-20 17:56:08.826 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 21: general.quantization_version u32 = 2 2024-09-20 17:56:08.826 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 22: quantize.imatrix.file str = ../Qwen2/gguf/qwen2-1_5b-imatrix/imat... 
2024-09-20 17:56:08.826 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 23: quantize.imatrix.dataset str = ../sft_2406.txt 2024-09-20 17:56:08.826 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 24: quantize.imatrix.entries_count i32 = 196 2024-09-20 17:56:08.826 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - kv 25: quantize.imatrix.chunks_count i32 = 1937 2024-09-20 17:56:08.826 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - type f32: 141 tensors 2024-09-20 17:56:08.826 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - type q5_0: 193 tensors 2024-09-20 17:56:08.826 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - type q5_1: 3 tensors 2024-09-20 17:56:08.826 13226-14967 llama-android.cpp com.example.llama I llama_model_loader: - type q6_K: 1 tensors 2024-09-20 17:56:08.936 13226-14967 llama-android.cpp com.example.llama I llm_load_vocab: special tokens cache size = 293 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_vocab: token to piece cache size = 0.9338 MB 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: format = GGUF V3 (latest) 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: arch = qwen2 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: vocab type = BPE 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_vocab = 151936 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_merges = 151387 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: vocab_only = 0 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_ctx_train = 32768 2024-09-20 17:56:08.975 13226-14967 
llama-android.cpp com.example.llama I llm_load_print_meta: n_embd = 1536 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_layer = 28 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_head = 12 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_head_kv = 2 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_rot = 128 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_swa = 0 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_embd_head_k = 128 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_embd_head_v = 128 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_gqa = 6 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_embd_k_gqa = 256 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_embd_v_gqa = 256 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: f_norm_eps = 0.0e+00 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: f_norm_rms_eps = 1.0e-06 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: f_clamp_kqv = 0.0e+00 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: f_max_alibi_bias = 0.0e+00 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: f_logit_scale = 0.0e+00 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_ff = 8960 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_expert = 0 2024-09-20 
17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_expert_used = 0 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: causal attn = 1 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: pooling type = 0 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: rope type = 2 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: rope scaling = linear 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: freq_base_train = 1000000.0 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: freq_scale_train = 1 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: n_ctx_orig_yarn = 32768 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: rope_finetuned = unknown 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: ssm_d_conv = 0 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: ssm_d_inner = 0 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: ssm_d_state = 0 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: ssm_dt_rank = 0 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: ssm_dt_b_c_rms = 0 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: model type = ?B 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: model ftype = Q5_0 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: model params = 1.54 B 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama 
I llm_load_print_meta: model size = 1.02 GiB (5.68 BPW) 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: general.name = qwen2-1_5b-instruct 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: BOS token = 151643 '<|endoftext|>' 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: EOS token = 151645 '<|im_end|>' 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: PAD token = 151643 '<|endoftext|>' 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: LF token = 148848 'ÄĬ' 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: EOT token = 151645 '<|im_end|>' 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_print_meta: max token length = 256 2024-09-20 17:56:08.975 13226-14967 llama-android.cpp com.example.llama I llm_load_tensors: ggml ctx size = 0.15 MiB 2024-09-20 17:56:09.133 13226-14967 llama-android.cpp com.example.llama I llm_load_tensors: CPU buffer size = 1044.62 MiB 2024-09-20 17:56:09.143 13226-14967 llama-android.cpp com.example.llama I Using 6 threads 2024-09-20 17:56:09.143 13226-14967 llama-android.cpp com.example.llama I llama_new_context_with_model: n_ctx = 2048 2024-09-20 17:56:09.143 13226-14967 llama-android.cpp com.example.llama I llama_new_context_with_model: n_batch = 2048 2024-09-20 17:56:09.143 13226-14967 llama-android.cpp com.example.llama I llama_new_context_with_model: n_ubatch = 512 2024-09-20 17:56:09.143 13226-14967 llama-android.cpp com.example.llama I llama_new_context_with_model: flash_attn = 0 2024-09-20 17:56:09.143 13226-14967 llama-android.cpp com.example.llama I llama_new_context_with_model: freq_base = 1000000.0 2024-09-20 17:56:09.143 13226-14967 llama-android.cpp com.example.llama I llama_new_context_with_model: freq_scale = 1 2024-09-20 
17:56:09.152 13226-14967 llama-android.cpp com.example.llama I llama_kv_cache_init: CPU KV buffer size = 56.00 MiB
2024-09-20 17:56:09.152 13226-14967 llama-android.cpp com.example.llama I llama_new_context_with_model: KV self size = 56.00 MiB, K (f16): 28.00 MiB, V (f16): 28.00 MiB
2024-09-20 17:56:09.152 13226-14967 llama-android.cpp com.example.llama I llama_new_context_with_model: CPU output buffer size = 0.58 MiB
2024-09-20 17:56:09.153 13226-14967 llama-android.cpp com.example.llama I llama_new_context_with_model: CPU compute buffer size = 299.75 MiB
2024-09-20 17:56:09.153 13226-14967 llama-android.cpp com.example.llama I llama_new_context_with_model: graph nodes = 986
2024-09-20 17:56:09.153 13226-14967 llama-android.cpp com.example.llama I llama_new_context_with_model: graph splits = 1
2024-09-20 17:56:09.153 13226-14967 LLamaAndroid com.example.llama I Loaded model /storage/emulated/0/Android/data/com.example.llama/files/qwen2-1_5b-instruct-q5_0.gguf
2024-09-20 17:56:09.175 13226-13226 Quality com.example.llama I Skipped: false 1 cost 11.889272 refreshRate 8288722 bit true processName com.example.llama
2024-09-20 17:56:10.171 13226-13226 AutofillManager com.example.llama V requestHideFillUi(null): anchor = null
2024-09-20 17:56:10.264 13226-13226 Compose Focus com.example.llama D Owner FocusChanged(true)
2024-09-20 17:56:10.270 13226-13226 ViewRootImplExtImpl com.example.llama D MotionEvent MotionEvent { action=ACTION_UP, actionButton=0, id[0]=0, x[0]=528.70605, y[0]=1516.375, toolType[0]=TOOL_TYPE_FINGER, buttonState=0, classification=NONE, metaState=0, flags=0x0, edgeFlags=0x0, pointerCount=1, historySize=0, eventTime=132050570, downTime=132050480, deviceId=6, source=0x1002, displayId=0, eventId=574615136 } handled by client, just return
2024-09-20 17:56:10.278 13226-13226 Quality com.example.llama I Skipped: false 2 cost 21.315496 refreshRate 8287714 bit true processName com.example.llama
2024-09-20 17:56:10.322 13226-13226 ImeTracker com.example.llama I com.example.llama:24cce9e6: onRequestShow at ORIGIN_CLIENT_SHOW_SOFT_INPUT reason SHOW_SOFT_INPUT_BY_INSETS_API
2024-09-20 17:56:10.322 13226-13226 InsetsController com.example.llama D show(ime(), fromIme=false)
2024-09-20 17:56:10.323 13226-13226 InputMethodManager com.example.llama D showSoftInput() view=androidx.compose.ui.platform.AndroidComposeView{af1f4d6 VFED..... .F....ID 0,0-1080,2208 aid=1073741824 viewInfo = } flags=0 reason=SHOW_SOFT_INPUT_BY_INSETS_API
2024-09-20 17:56:10.341 13226-13226 Quality com.example.llama I Skipped: true 3 cost 26.484884 refreshRate 8287714 bit true processName com.example.llama
2024-09-20 17:56:10.372 13226-13226 Quality com.example.llama I Skipped: false 1 cost 16.270636 refreshRate 8287714 bit true processName com.example.llama
2024-09-20 17:56:10.385 13226-13226 FinalizerDaemon com.example.llama W type=1400 audit(0.0:2290): avc: denied { getopt } for path="/dev/socket/usap_pool_primary" scontext=u:r:untrusted_app:s0:c88,c257,c512,c768 tcontext=u:r:zygote:s0 tclass=unix_stream_socket permissive=0 app=com.example.llama
2024-09-20 17:56:10.391 13226-14917 StrictMode com.example.llama D StrictMode policy violation: android.os.strictmode.LeakedClosableViolation: A resource was acquired at attached stack trace but never released. See java.io.Closeable for information on avoiding resource leaks.
Callsite: InsetsSourceControl
    at android.os.StrictMode$AndroidCloseGuardReporter.report(StrictMode.java:2097)
    at dalvik.system.CloseGuard.warnIfOpen(CloseGuard.java:338)
    at android.view.SurfaceControl.finalize(SurfaceControl.java:1576)
    at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:339)
    at java.lang.Daemons$FinalizerDaemon.processReference(Daemons.java:324)
    at java.lang.Daemons$FinalizerDaemon.runInternal(Daemons.java:300)
    at java.lang.Daemons$Daemon.run(Daemons.java:145)
    at java.lang.Thread.run(Thread.java:1012)
2024-09-20 17:56:10.394 13226-13226 RecordingIC com.example.llama W requestCursorUpdates is not supported
2024-09-20 17:56:10.396 13226-13226 Quality com.example.llama I Skipped: false 1 cost 14.975071 refreshRate 8287506 bit true processName com.example.llama
2024-09-20 17:56:10.412 13226-13226 Quality com.example.llama I Skipped: false 1 cost 14.3007345 refreshRate 8287526 bit true processName com.example.llama
2024-09-20 17:56:10.427 13226-13226 VRI[MainActivity] com.example.llama W handleResized abandoned!
2024-09-20 17:56:10.430 13226-13226 VRI[MainActivity] com.example.llama W handleResized abandoned!
2024-09-20 17:56:10.430 13226-13226 Quality com.example.llama I Skipped: false 1 cost 15.811308 refreshRate 8287533 bit true processName com.example.llama
2024-09-20 17:56:10.443 13226-13226 InsetsController com.example.llama D show(ime(), fromIme=true)
2024-09-20 17:56:10.444 13226-14986 OplusWindowManager com.example.llama D get WMS extension: android.os.BinderProxy@1ae8041
2024-09-20 17:56:10.445 13226-13226 Quality com.example.llama I Skipped: false 1 cost 14.625874 refreshRate 8287530 bit true processName com.example.llama
2024-09-20 17:56:10.462 13226-13226 Quality com.example.llama I Skipped: false 1 cost 14.671543 refreshRate 8287538 bit true processName com.example.llama
2024-09-20 17:56:10.474 13226-13226 VRI[MainActivity] com.example.llama W handleResized abandoned!
2024-09-20 17:56:10.476 13226-13226 Quality com.example.llama I Skipped: false 1 cost 11.987819 refreshRate 8287549 bit true processName com.example.llama
2024-09-20 17:56:10.491 13226-13226 Quality com.example.llama I Skipped: false 1 cost 10.011921 refreshRate 8287574 bit true processName com.example.llama
2024-09-20 17:56:10.521 13226-13226 Quality com.example.llama I Skipped: false 1 cost 15.110632 refreshRate 8287625 bit true processName com.example.llama
2024-09-20 17:56:10.683 13226-13226 ImeTracker com.example.llama I com.example.llama:24cce9e6: onShown
2024-09-20 17:56:10.814 13226-14990 ProfileInstaller com.example.llama D Installing profile for com.example.llama
2024-09-20 17:56:11.247 13226-13226 Quality com.example.llama I Skipped: false 7 cost 62.15717 refreshRate 8288450 bit true processName com.example.llama
2024-09-20 17:56:11.788 13226-13226 Quality com.example.llama I Skipped: false 5 cost 47.37528 refreshRate 8288787 bit true processName com.example.llama
2024-09-20 17:56:12.601 13226-13226 RecordingIC com.example.llama W requestCursorUpdates is not supported
2024-09-20 17:56:12.633 13226-13226 WindowOnBackDispatcher com.example.llama W sendCancelIfRunning: isInProgress=falsecallback=ImeCallback=ImeOnBackInvokedCallback@140997194 Callback=android.window.IOnBackInvokedCallback$Stub$Proxy@dff351e
2024-09-20 17:56:12.640 13226-13226 VRI[MainActivity] com.example.llama W handleResized abandoned!
2024-09-20 17:56:12.826 13226-13226 ImeTracker com.example.llama I com.example.llama:3ecdc502: onRequestHide at ORIGIN_CLIENT_HIDE_SOFT_INPUT reason HIDE_SOFT_INPUT_BY_INSETS_API
2024-09-20 17:56:12.830 13226-13226 ImeTracker com.example.llama I com.example.llama:619f1067: onHidden
2024-09-20 17:56:12.857 13226-13226 VRI[MainActivity] com.example.llama W handleResized abandoned!
2024-09-20 17:56:12.859 13226-13226 VRI[MainActivity] com.example.llama W handleResized abandoned!
2024-09-20 17:56:13.408 13226-13226 AutofillManager com.example.llama V requestHideFillUi(null): anchor = null
2024-09-20 17:56:13.419 13226-13226 Quality com.example.llama I Skipped: false 1 cost 12.059912 refreshRate 8288989 bit true processName com.example.llama
2024-09-20 17:56:13.476 13226-13226 ViewRootImplExtImpl com.example.llama D MotionEvent MotionEvent { action=ACTION_UP, actionButton=0, id[0]=0, x[0]=137.28906, y[0]=1685.7041, toolType[0]=TOOL_TYPE_FINGER, buttonState=0, classification=NONE, metaState=0, flags=0x0, edgeFlags=0x0, pointerCount=1, historySize=0, eventTime=132053786, downTime=132053720, deviceId=6, source=0x1002, displayId=0, eventId=492846711 } handled by client, just return
2024-09-20 17:56:13.476 13226-14967 llama-android.cpp com.example.llama I n_len = 64, n_ctx = 2048, n_kv_req = 64
2024-09-20 17:56:13.476 13226-14967 llama-android.cpp com.example.llama I hello
2024-09-20 17:56:13.476 13226-14967 llama-android.cpp com.example.llama I
2024-09-20 17:56:13.539 13226-13226 Quality com.example.llama I Skipped: false 6 cost 57.353714 refreshRate 8333333 bit true processName com.example.llama
2024-09-20 17:56:13.645 13226-14967 llama-android.cpp com.example.llama I cached: 1, new_token_chars: 1, id: 16
2024-09-20 17:56:13.736 13226-14967 llama-android.cpp com.example.llama I cached: ., new_token_chars: ., id: 13
2024-09-20 17:56:13.759 13226-13226 Quality com.example.llama I Skipped: false 1 cost 12.094041 refreshRate 8289116 bit true processName com.example.llama
2024-09-20 17:56:13.833 13226-14967 llama-android.cpp com.example.llama I cached: Let, new_token_chars: Let, id: 6771
2024-09-20 17:56:13.858 13226-13226 Quality com.example.llama I Skipped: false 1 cost 11.876371 refreshRate 8289015 bit true processName com.example.llama
2024-09-20 17:56:13.953 13226-14967 llama-android.cpp com.example.llama I cached: $, new_token_chars: $, id: 400
2024-09-20 17:56:14.044 13226-14967 llama-android.cpp com.example.llama I cached: a, new_token_chars: a, id: 64
2024-09-20 17:56:14.064 13226-13226 Quality com.example.llama I Skipped: false 1 cost 10.482091 refreshRate 8289267 bit true processName com.example.llama
2024-09-20 17:56:14.132 13226-14967 llama-android.cpp com.example.llama I cached: ,b, new_token_chars: ,b, id: 8402
2024-09-20 17:56:14.216 13226-14967 llama-android.cpp com.example.llama I cached: ,c, new_token_chars: ,c, id: 10109
2024-09-20 17:56:14.310 13226-14967 llama-android.cpp com.example.llama I cached: $, new_token_chars: $, id: 3
2024-09-20 17:56:14.327 13226-13226 Quality com.example.llama I Skipped: false 1 cost 8.775117 refreshRate 8288756 bit true processName com.example.llama
2024-09-20 17:56:14.403 13226-14967 llama-android.cpp com.example.llama I cached: be, new_token_chars: be, id: 387
2024-09-20 17:56:14.494 13226-14967 llama-android.cpp com.example.llama I cached: positive, new_token_chars: positive, id: 6785
2024-09-20 17:56:14.572 13226-14967 llama-android.cpp com.example.llama I cached: real, new_token_chars: real, id: 1931
2024-09-20 17:56:14.595 13226-13226 Quality com.example.llama I Skipped: false 1 cost 11.006954 refreshRate 8288280 bit true processName com.example.llama
2024-09-20 17:56:14.671 13226-14967 llama-android.cpp com.example.llama I cached: numbers, new_token_chars: numbers, id: 5109
2024-09-20 17:56:14.760 13226-14967 llama-android.cpp com.example.llama I cached: such, new_token_chars: such, id: 1741
2024-09-20 17:56:14.774 13226-13226 Quality com.example.llama I Skipped: false 1 cost 8.316478 refreshRate 8288244 bit true processName com.example.llama
2024-09-20 17:56:14.849 13226-14967 llama-android.cpp com.example.llama I cached: that, new_token_chars: that, id: 429
2024-09-20 17:56:14.867 13226-13226 Quality com.example.llama I Skipped: false 1 cost 10.274869 refreshRate 8288205 bit true processName com.example.llama
2024-09-20 17:56:14.940 13226-14967 llama-android.cpp com.example.llama I cached: $, new_token_chars: $, id: 400
2024-09-20 17:56:15.031 13226-14967 llama-android.cpp com.example.llama I cached: a, new_token_chars: a, id: 64
2024-09-20 17:56:15.126 13226-14967 llama-android.cpp com.example.llama I cached: +b, new_token_chars: +b, id: 35093
2024-09-20 17:56:15.208 13226-14967 llama-android.cpp com.example.llama I cached: +c, new_token_chars: +c, id: 49138
2024-09-20 17:56:15.301 13226-14967 llama-android.cpp com.example.llama I cached: =, new_token_chars: =, id: 28
2024-09-20 17:56:15.321 13226-13226 Quality com.example.llama I Skipped: true 1 cost 8.470056 refreshRate 8288194 bit true processName com.example.llama
2024-09-20 17:56:15.385 13226-14967 llama-android.cpp com.example.llama I cached: 3, new_token_chars: 3, id: 18
2024-09-20 17:56:15.405 13226-13226 Quality com.example.llama I Skipped: true 1 cost 9.017843 refreshRate 8288208 bit true processName com.example.llama
2024-09-20 17:56:15.479 13226-14967 llama-android.cpp com.example.llama I cached: $., new_token_chars: $., id: 12947
2024-09-20 17:56:15.563 13226-14967 llama-android.cpp com.example.llama I cached: Pro, new_token_chars: Pro, id: 1298
2024-09-20 17:56:15.655 13226-14967 llama-android.cpp com.example.llama I cached: ve, new_token_chars: ve, id: 586
2024-09-20 17:56:15.750 13226-14967 llama-android.cpp com.example.llama I cached: that, new_token_chars: that, id: 429
2024-09-20 17:56:15.769 13226-13226 Quality com.example.llama I Skipped: true 1 cost 8.566671 refreshRate 8288414 bit true processName com.example.llama
2024-09-20 17:56:15.842 13226-14967 llama-android.cpp com.example.llama I cached: , new_token_chars: , id: 198
2024-09-20 17:56:15.929 13226-14967 llama-android.cpp com.example.llama I cached: [, new_token_chars: \[, id: 78045
2024-09-20 17:56:15.952 13226-13226 Quality com.example.llama I Skipped: false 1 cost 8.996249 refreshRate 8288515 bit true processName com.example.llama
2024-09-20 17:56:16.022 13226-14967 llama-android.cpp com.example.llama I cached: \, new_token_chars: \, id: 1124
2024-09-20 17:56:16.042 13226-13226 Quality com.example.llama I Skipped: false 1 cost 8.481175 refreshRate 8288583 bit true processName com.example.llama
2024-09-20 17:56:16.110 13226-14967 llama-android.cpp com.example.llama I cached: frac, new_token_chars: frac, id: 37018
2024-09-20 17:56:16.185 13226-14967 llama-android.cpp com.example.llama I cached: {, new_token_chars: {, id: 90
2024-09-20 17:56:16.274 13226-14967 llama-android.cpp com.example.llama I cached: 1, new_token_chars: 1, id: 16
2024-09-20 17:56:16.365 13226-14967 llama-android.cpp com.example.llama I cached: }{, new_token_chars: }{, id: 15170
2024-09-20 17:56:16.455 13226-14967 llama-android.cpp com.example.llama I cached: a, new_token_chars: a, id: 64
2024-09-20 17:56:16.537 13226-14967 llama-android.cpp com.example.llama I cached: ^, new_token_chars: ^, id: 61
2024-09-20 17:56:16.557 13226-13226 Quality com.example.llama I Skipped: true 1 cost 8.769447 refreshRate 8288628 bit true processName com.example.llama
2024-09-20 17:56:16.625 13226-14967 llama-android.cpp com.example.llama I cached: 2, new_token_chars: 2, id: 17
2024-09-20 17:56:16.708 13226-14967 llama-android.cpp com.example.llama I cached: +, new_token_chars: +, id: 10
2024-09-20 17:56:16.797 13226-14967 llama-android.cpp com.example.llama I cached: 1, new_token_chars: 1, id: 16
2024-09-20 17:56:16.884 13226-14967 llama-android.cpp com.example.llama I cached: }, new_token_chars: }, id: 92
2024-09-20 17:56:16.976 13226-14967 llama-android.cpp com.example.llama I cached: +\, new_token_chars: +\, id: 41715
2024-09-20 17:56:17.064 13226-14967 llama-android.cpp com.example.llama I cached: frac, new_token_chars: frac, id: 37018
2024-09-20 17:56:17.159 13226-14967 llama-android.cpp com.example.llama I cached: {, new_token_chars: {, id: 90
2024-09-20 17:56:17.244 13226-14967 llama-android.cpp com.example.llama I cached: 1, new_token_chars: 1, id: 16
2024-09-20 17:56:17.328 13226-14967 llama-android.cpp com.example.llama I cached: }{, new_token_chars: }{, id: 15170
2024-09-20 17:56:17.421 13226-14967 llama-android.cpp com.example.llama I cached: b, new_token_chars: b, id: 65
2024-09-20 17:56:17.506 13226-14967 llama-android.cpp com.example.llama I cached: ^, new_token_chars: ^, id: 61
2024-09-20 17:56:17.596 13226-14967 llama-android.cpp com.example.llama I cached: 2, new_token_chars: 2, id: 17
2024-09-20 17:56:17.682 13226-14967 llama-android.cpp com.example.llama I cached: +, new_token_chars: +, id: 10
2024-09-20 17:56:17.774 13226-14967 llama-android.cpp com.example.llama I cached: 1, new_token_chars: 1, id: 16
2024-09-20 17:56:17.864 13226-14967 llama-android.cpp com.example.llama I cached: }, new_token_chars: }, id: 92
2024-09-20 17:56:17.950 13226-14967 llama-android.cpp com.example.llama I cached: +\, new_token_chars: +\, id: 41715
2024-09-20 17:56:18.034 13226-14967 llama-android.cpp com.example.llama I cached: frac, new_token_chars: frac, id: 37018
2024-09-20 17:56:18.057 13226-13226 Quality com.example.llama I Skipped: true 1 cost 9.095158 refreshRate 8288880 bit true processName com.example.llama
2024-09-20 17:56:18.122 13226-14967 llama-android.cpp com.example.llama I cached: {, new_token_chars: {, id: 90
2024-09-20 17:56:18.205 13226-14967 llama-android.cpp com.example.llama I cached: 1, new_token_chars: 1, id: 16
2024-09-20 17:56:18.297 13226-14967 llama-android.cpp com.example.llama I cached: }{, new_token_chars: }{, id: 15170
2024-09-20 17:56:18.408 13226-14967 llama-android.cpp com.example.llama I cached: c, new_token_chars: c, id: 66
2024-09-20 17:56:18.500 13226-14967 llama-android.cpp com.example.llama I cached: ^, new_token_chars: ^, id: 61
2024-09-20 17:56:18.576 13226-14967 llama-android.cpp com.example.llama I cached: 2, new_token_chars: 2, id: 17
2024-09-20 17:56:18.662 13226-14967 llama-android.cpp com.example.llama I cached: +, new_token_chars: +, id: 10
2024-09-20 17:56:18.752 13226-14967 llama-android.cpp com.example.llama I cached: 1, new_token_chars: 1, id: 16
2024-09-20 17:56:18.841 13226-14967 llama-android.cpp com.example.llama I cached: }\, new_token_chars: }\, id: 11035
2024-09-20 17:56:18.931 13226-14967 llama-android.cpp com.example.llama I cached: ge, new_token_chars: ge, id: 709
2024-09-20 17:56:19.020 13226-14967 llama-android.cpp com.example.llama I cached: q, new_token_chars: q, id: 80
2024-09-20 17:56:19.114 13226-14967 llama-android.cpp com.example.llama I cached: \, new_token_chars: \, id: 1124
2024-09-20 17:56:19.552 13226-1

Screenshot:

Screenshot 2024-09-20 at 18 02 08
Flyfish233 commented 2 days ago

Sorry about that. Can you try adding a chat-template prompt? I'm working on a custom build based on llama-android.cpp, and it works fine for me.

This is the message I pass to the completionInit() function:

    system
    You are a knowledgeable, efficient, and direct AI assistant.
    user
    How to install Microsoft C++ Build Tools
    assistant
xunuohope1107 commented 2 days ago

system
You are a knowledgeable, efficient, and direct AI assistant.
user
How to install Microsoft C++ Build Tools
assistant

I tried the prompt and it's better now:

Screenshot 2024-09-20 at 18 44 47

But I'm wondering about one thing: does that mean every prompt must follow the chat-completion template, with roles like system, user, and assistant? Also, the output often stops mid-sentence. I tried changing nLen from 64 to 128, but it didn't seem to help.

xunuohope1107 commented 2 days ago

When I build llama.cpp in Termux, the output is perfect. I'm not sure why the Android example behaves so differently.

Flyfish233 commented 2 days ago

Must every prompt follow the chat-completion template?

It depends on the model. For a chat model like this one, just inject these template prompts (system, user, assistant) around each message before sending it.

Not sure why the Android example is so different.

In Termux you are probably using llama-cli, not llama-android.cpp. The Android example is not meant to be an out-of-the-box experience: it's a demo, not production-ready, so I think the difference is reasonable. Just write your own implementation.

Also, the output often seems incomplete.

Try a larger nLen, e.g. 2048; that works on my OnePlus 12R.

Screenshot_2024-09-20-20-43-02-12_c0a2791fbe2158f00ffcfc1d12b0490a