Open xunuohope1107 opened 3 days ago
Try to load other models. Not happening on b3787, qwen2-1_5b-instruct-q5_0.gguf
Try to load other models. Not happening on b3787, qwen2-1_5b-instruct-q5_0.gguf
I tried with qwen2-1_5b-instruct-q5_0.gguf on b3788 (llama.android), but still got unreasonable output. I tested on Xiaomi 14(16G RAM), Oneplus 12R (16G RAM) and Pixel 4a, here is the log from console:
2024-09-20 17:56:05.691 13226-13226 ExtensionsLoader com.example.llama D Opened libSchedAssistExtImpl.so
2024-09-20 17:56:05.693 13226-14930 AdrenoGLES-0 com.example.llama I QUALCOMM build : 1a285a84ae, I2991b7e11e
Build Date : 06/04/23
OpenGL ES Shader Compiler Version: E031.41.03.36
Local Branch :
Remote Branch :
Remote Branch :
Reconstruct Branch :
2024-09-20 17:56:05.693 13226-14930 AdrenoGLES-0 com.example.llama I Build Config : S P 14.1.4 AArch64
2024-09-20 17:56:05.693 13226-14930 AdrenoGLES-0 com.example.llama I Driver Path : /vendor/lib64/egl/libGLESv2_adreno.so
2024-09-20 17:56:05.693 13226-14930 AdrenoGLES-0 com.example.llama I Driver Version : 0676.32
2024-09-20 17:56:05.695 13226-14930 AdrenoGLES-0 com.example.llama I PFP: 0x01740158, ME: 0x00000000
2024-09-20 17:56:05.699 13226-14930 AdrenoUtils com.example.llama I
2024-09-20 17:56:13.539 13226-13226 Quality com.example.llama I Skipped: false 6 cost 57.353714 refreshRate 8333333 bit true processName com.example.llama
2024-09-20 17:56:13.645 13226-14967 llama-android.cpp com.example.llama I cached: 1, new_token_chars: 1
, id: 16
2024-09-20 17:56:13.736 13226-14967 llama-android.cpp com.example.llama I cached: ., new_token_chars: .
, id: 13
2024-09-20 17:56:13.759 13226-13226 Quality com.example.llama I Skipped: false 1 cost 12.094041 refreshRate 8289116 bit true processName com.example.llama
2024-09-20 17:56:13.833 13226-14967 llama-android.cpp com.example.llama I cached: Let, new_token_chars: Let
, id: 6771
2024-09-20 17:56:13.858 13226-13226 Quality com.example.llama I Skipped: false 1 cost 11.876371 refreshRate 8289015 bit true processName com.example.llama
2024-09-20 17:56:13.953 13226-14967 llama-android.cpp com.example.llama I cached: $, new_token_chars: $
, id: 400
2024-09-20 17:56:14.044 13226-14967 llama-android.cpp com.example.llama I cached: a, new_token_chars: a
, id: 64
2024-09-20 17:56:14.064 13226-13226 Quality com.example.llama I Skipped: false 1 cost 10.482091 refreshRate 8289267 bit true processName com.example.llama
2024-09-20 17:56:14.132 13226-14967 llama-android.cpp com.example.llama I cached: ,b, new_token_chars: ,b
, id: 8402
2024-09-20 17:56:14.216 13226-14967 llama-android.cpp com.example.llama I cached: ,c, new_token_chars: ,c
, id: 10109
2024-09-20 17:56:14.310 13226-14967 llama-android.cpp com.example.llama I cached: $, new_token_chars: $
, id: 3
2024-09-20 17:56:14.327 13226-13226 Quality com.example.llama I Skipped: false 1 cost 8.775117 refreshRate 8288756 bit true processName com.example.llama
2024-09-20 17:56:14.403 13226-14967 llama-android.cpp com.example.llama I cached: be, new_token_chars: be
, id: 387
2024-09-20 17:56:14.494 13226-14967 llama-android.cpp com.example.llama I cached: positive, new_token_chars: positive
, id: 6785
2024-09-20 17:56:14.572 13226-14967 llama-android.cpp com.example.llama I cached: real, new_token_chars: real
, id: 1931
2024-09-20 17:56:14.595 13226-13226 Quality com.example.llama I Skipped: false 1 cost 11.006954 refreshRate 8288280 bit true processName com.example.llama
2024-09-20 17:56:14.671 13226-14967 llama-android.cpp com.example.llama I cached: numbers, new_token_chars: numbers
, id: 5109
2024-09-20 17:56:14.760 13226-14967 llama-android.cpp com.example.llama I cached: such, new_token_chars: such
, id: 1741
2024-09-20 17:56:14.774 13226-13226 Quality com.example.llama I Skipped: false 1 cost 8.316478 refreshRate 8288244 bit true processName com.example.llama
2024-09-20 17:56:14.849 13226-14967 llama-android.cpp com.example.llama I cached: that, new_token_chars: that
, id: 429
2024-09-20 17:56:14.867 13226-13226 Quality com.example.llama I Skipped: false 1 cost 10.274869 refreshRate 8288205 bit true processName com.example.llama
2024-09-20 17:56:14.940 13226-14967 llama-android.cpp com.example.llama I cached: $, new_token_chars: $
, id: 400
2024-09-20 17:56:15.031 13226-14967 llama-android.cpp com.example.llama I cached: a, new_token_chars: a
, id: 64
2024-09-20 17:56:15.126 13226-14967 llama-android.cpp com.example.llama I cached: +b, new_token_chars: +b
, id: 35093
2024-09-20 17:56:15.208 13226-14967 llama-android.cpp com.example.llama I cached: +c, new_token_chars: +c
, id: 49138
2024-09-20 17:56:15.301 13226-14967 llama-android.cpp com.example.llama I cached: =, new_token_chars: =
, id: 28
2024-09-20 17:56:15.321 13226-13226 Quality com.example.llama I Skipped: true 1 cost 8.470056 refreshRate 8288194 bit true processName com.example.llama
2024-09-20 17:56:15.385 13226-14967 llama-android.cpp com.example.llama I cached: 3, new_token_chars: 3
, id: 18
2024-09-20 17:56:15.405 13226-13226 Quality com.example.llama I Skipped: true 1 cost 9.017843 refreshRate 8288208 bit true processName com.example.llama
2024-09-20 17:56:15.479 13226-14967 llama-android.cpp com.example.llama I cached: $., new_token_chars: $.
, id: 12947
2024-09-20 17:56:15.563 13226-14967 llama-android.cpp com.example.llama I cached: Pro, new_token_chars: Pro
, id: 1298
2024-09-20 17:56:15.655 13226-14967 llama-android.cpp com.example.llama I cached: ve, new_token_chars: ve
, id: 586
2024-09-20 17:56:15.750 13226-14967 llama-android.cpp com.example.llama I cached: that, new_token_chars: that
, id: 429
2024-09-20 17:56:15.769 13226-13226 Quality com.example.llama I Skipped: true 1 cost 8.566671 refreshRate 8288414 bit true processName com.example.llama
2024-09-20 17:56:15.842 13226-14967 llama-android.cpp com.example.llama I cached:
, new_token_chars:
, id: 198
2024-09-20 17:56:15.929 13226-14967 llama-android.cpp com.example.llama I cached: [, new_token_chars: \[
, id: 78045
2024-09-20 17:56:15.952 13226-13226 Quality com.example.llama I Skipped: false 1 cost 8.996249 refreshRate 8288515 bit true processName com.example.llama
2024-09-20 17:56:16.022 13226-14967 llama-android.cpp com.example.llama I cached: \, new_token_chars: \
, id: 1124
2024-09-20 17:56:16.042 13226-13226 Quality com.example.llama I Skipped: false 1 cost 8.481175 refreshRate 8288583 bit true processName com.example.llama
2024-09-20 17:56:16.110 13226-14967 llama-android.cpp com.example.llama I cached: frac, new_token_chars: frac
, id: 37018
2024-09-20 17:56:16.185 13226-14967 llama-android.cpp com.example.llama I cached: {, new_token_chars: {
, id: 90
2024-09-20 17:56:16.274 13226-14967 llama-android.cpp com.example.llama I cached: 1, new_token_chars: 1
, id: 16
2024-09-20 17:56:16.365 13226-14967 llama-android.cpp com.example.llama I cached: }{, new_token_chars: }{
, id: 15170
2024-09-20 17:56:16.455 13226-14967 llama-android.cpp com.example.llama I cached: a, new_token_chars: a
, id: 64
2024-09-20 17:56:16.537 13226-14967 llama-android.cpp com.example.llama I cached: ^, new_token_chars: ^
, id: 61
2024-09-20 17:56:16.557 13226-13226 Quality com.example.llama I Skipped: true 1 cost 8.769447 refreshRate 8288628 bit true processName com.example.llama
2024-09-20 17:56:16.625 13226-14967 llama-android.cpp com.example.llama I cached: 2, new_token_chars: 2
, id: 17
2024-09-20 17:56:16.708 13226-14967 llama-android.cpp com.example.llama I cached: +, new_token_chars: +
, id: 10
2024-09-20 17:56:16.797 13226-14967 llama-android.cpp com.example.llama I cached: 1, new_token_chars: 1
, id: 16
2024-09-20 17:56:16.884 13226-14967 llama-android.cpp com.example.llama I cached: }, new_token_chars: }
, id: 92
2024-09-20 17:56:16.976 13226-14967 llama-android.cpp com.example.llama I cached: +\, new_token_chars: +\
, id: 41715
2024-09-20 17:56:17.064 13226-14967 llama-android.cpp com.example.llama I cached: frac, new_token_chars: frac
, id: 37018
2024-09-20 17:56:17.159 13226-14967 llama-android.cpp com.example.llama I cached: {, new_token_chars: {
, id: 90
2024-09-20 17:56:17.244 13226-14967 llama-android.cpp com.example.llama I cached: 1, new_token_chars: 1
, id: 16
2024-09-20 17:56:17.328 13226-14967 llama-android.cpp com.example.llama I cached: }{, new_token_chars: }{
, id: 15170
2024-09-20 17:56:17.421 13226-14967 llama-android.cpp com.example.llama I cached: b, new_token_chars: b
, id: 65
2024-09-20 17:56:17.506 13226-14967 llama-android.cpp com.example.llama I cached: ^, new_token_chars: ^
, id: 61
2024-09-20 17:56:17.596 13226-14967 llama-android.cpp com.example.llama I cached: 2, new_token_chars: 2
, id: 17
2024-09-20 17:56:17.682 13226-14967 llama-android.cpp com.example.llama I cached: +, new_token_chars: +
, id: 10
2024-09-20 17:56:17.774 13226-14967 llama-android.cpp com.example.llama I cached: 1, new_token_chars: 1
, id: 16
2024-09-20 17:56:17.864 13226-14967 llama-android.cpp com.example.llama I cached: }, new_token_chars: }
, id: 92
2024-09-20 17:56:17.950 13226-14967 llama-android.cpp com.example.llama I cached: +\, new_token_chars: +\
, id: 41715
2024-09-20 17:56:18.034 13226-14967 llama-android.cpp com.example.llama I cached: frac, new_token_chars: frac
, id: 37018
2024-09-20 17:56:18.057 13226-13226 Quality com.example.llama I Skipped: true 1 cost 9.095158 refreshRate 8288880 bit true processName com.example.llama
2024-09-20 17:56:18.122 13226-14967 llama-android.cpp com.example.llama I cached: {, new_token_chars: {
, id: 90
2024-09-20 17:56:18.205 13226-14967 llama-android.cpp com.example.llama I cached: 1, new_token_chars: 1
, id: 16
2024-09-20 17:56:18.297 13226-14967 llama-android.cpp com.example.llama I cached: }{, new_token_chars: }{
, id: 15170
2024-09-20 17:56:18.408 13226-14967 llama-android.cpp com.example.llama I cached: c, new_token_chars: c
, id: 66
2024-09-20 17:56:18.500 13226-14967 llama-android.cpp com.example.llama I cached: ^, new_token_chars: ^
, id: 61
2024-09-20 17:56:18.576 13226-14967 llama-android.cpp com.example.llama I cached: 2, new_token_chars: 2
, id: 17
2024-09-20 17:56:18.662 13226-14967 llama-android.cpp com.example.llama I cached: +, new_token_chars: +
, id: 10
2024-09-20 17:56:18.752 13226-14967 llama-android.cpp com.example.llama I cached: 1, new_token_chars: 1
, id: 16
2024-09-20 17:56:18.841 13226-14967 llama-android.cpp com.example.llama I cached: }\, new_token_chars: }\
, id: 11035
2024-09-20 17:56:18.931 13226-14967 llama-android.cpp com.example.llama I cached: ge, new_token_chars: ge
, id: 709
2024-09-20 17:56:19.020 13226-14967 llama-android.cpp com.example.llama I cached: q, new_token_chars: q
, id: 80
2024-09-20 17:56:19.114 13226-14967 llama-android.cpp com.example.llama I cached: \, new_token_chars: \
, id: 1124
2024-09-20 17:56:19.552 13226-1
Screenshot:
Sorry about that. Can you try adding a prompt? I'm working on a custom build with llama-android.cpp and it works fine.
This is my message
parameter for the completionInit()
function:
system
You are a knowledgeable, efficient, and direct AI assistant.
user
How to install Microsoft C++ Build Tools
assistant
system You are a knowledgeable, efficient, and direct AI assistant.
user
How to install Microsoft C++ Build Tools assistant
I tried the prompt, it's better now,
But I was wondering about one thing. Does that mean each prompt must follow the template of chat complete, which include roles like system, user and assistant? Also, the output sentence often seems not complete. I tried to change the nlen from 64 to 128, but seems not work.
When I try to build the llama.cpp on termux, the output is perfect. Not sure why the android example is quite different.
Must follow the template of chat complete
It depends on the model. Just inject these prompts before sending messages.
Not sure why the android example is quite different.
You are probably using llama-cli
, not llama-android.cpp, which is not for out-of-the-box experience. This is a demo, not production ready. So I think it's reasonable. Just write your own implementation.
Also, the output sentence often seems not complete.
Try larger nLen = 2048
, works on my Oneplus 12R.
What happened?
I have built and run the android example project under examples/llama.android, but found the output from the android UI is very hard to understand. I try the a lot of prompt like "hello", "why sky is blue?" on several real devices as well as virtual devices. The output is not a sentence but a random combination of words or programming code.
Name and Version
b3785, android arm64-v8a
What operating system are you seeing the problem on?
Android arm64-v8a
Relevant log output
Screenshot: