Open zcxo opened 2 months ago
I have adapted your patch to the September 5th llama.cpp version; not many changes were needed. The only remaining issue is a crash in `__memcpy_aarch64_simd`, which looks like some kind of compatibility problem. I don't know whether it is a build issue or something else:
```
09-09 18:49:02.670 F/DEBUG (20317): #01 pc 00000000000b57fc /system/lib64/libggml.so (ggml_kai_prepare_const_data+484) (BuildId: 72a2a4918d97c527dddf645692b8575bda53ed6d)
09-09 18:49:02.670 F/DEBUG (20317): #02 pc 000000000003f3d8 /system/lib64/libggml.so (ggml_graph_compute+120) (BuildId: 72a2a4918d97c527dddf645692b8575bda53ed6d)
```
The crashing code is in ggml-kleidiai.cpp, at this memcpy:

```cpp
GGML_ASSERT(reshaped_data_sz <= original_data_size);
memcpy(cur->data, (void *)reshaped_data, ggml_nbytes(cur));
free(reshaped_data);
cur->extra = cur->data;
```

I changed the last two lines so that the reshaped buffer stays alive and `cur->extra` points at it:

```cpp
g_extra_mem[g_extra_mem_idx++] = reshaped_data;
cur->extra = reshaped_data;
```
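For anyone hitting the same crash, here is a minimal sketch of the lifetime issue behind the fix above. The names `tensor`, `prepare_const_data`, and `free_extra_mem` are placeholders for illustration only (they are not the real ggml API); the point is that `free(reshaped_data)` followed by a later read through `cur->extra` is a use-after-free, so ownership of the buffer has to move into a registry that is freed at teardown instead:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Assumed registry, mirroring g_extra_mem / g_extra_mem_idx in the patch.
static void *g_extra_mem[16];
static int   g_extra_mem_idx = 0;

// Simplified stand-in for a ggml tensor (hypothetical, for this sketch only).
struct tensor { void *data; void *extra; size_t nbytes; };

// Copies the repacked weights into the tensor and keeps the repacked
// buffer alive: the micro-kernels may later read cur->extra, so we must
// NOT free() it here. Ownership moves to the registry instead.
static void prepare_const_data(tensor *cur, const void *reshaped_data, size_t sz) {
    assert(sz <= cur->nbytes);
    memcpy(cur->data, reshaped_data, sz);
    g_extra_mem[g_extra_mem_idx++] = (void *)reshaped_data;
    cur->extra = (void *)reshaped_data;
}

// Called once at backend teardown, after the last kernel has run.
static void free_extra_mem(void) {
    for (int i = 0; i < g_extra_mem_idx; ++i) free(g_extra_mem[i]);
    g_extra_mem_idx = 0;
}
```

This defers the `free` to a single teardown point, which is the simplest way to guarantee `cur->extra` never dangles while the graph is still being computed.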
Hi @zcxo ,
Thanks for bringing this to our attention. Glad to know that you have found this useful :+1:. This patch was created to demonstrate a possible integration point for KleidiAI in llama.cpp. We will work separately with llama.cpp to provide a proper solution.
Dear ARM-software and @kshitij-sisodia-arm,

Although I waited quite a while, I am still glad to receive your reply. I have tried to adapt the patch and made modifications based on my own understanding of the issues I encountered along the way. I now plan to apply them to our production project, but I am still concerned about possible side effects, so I hope an official version can be released as soon as possible to help us developers. Thank you.
Dear ARM-software,

I am excited and pleasantly surprised to see that Arm has officially launched the KleidiAI solution for inference! I have tried it, and it has indeed greatly accelerated both the prompt-processing and generation stages. However, I have run into an issue: the patch you currently provide targets the June 2024 version of llama.cpp. Do you have plans to publish a patch for the latest version? The following are the errors I encountered while trying to adapt it:
0001-Use-KleidiAI-Int4-Matmul-micro-kernels-in-llama.cpp.patch
```
Cmdline: com.algorithm.example
pid: 15209, tid: 15209, name: binder:15122_2  >>> com.algorithm.example <<<
#01 pc 00000000001b4e8c /data/app/~~5V7P5OPSvLkqk0-B1FssOw==/com.algorithm.example-umHyBrLVPxbWoOjPMwx1IQ==/base.apk!libllama_native.so (offset 0x1edc000) (ggml_kai_compute_forward+1968) (BuildId: 4cadfb9fa830ea791a43b93a8e2ef9353fb03c52)
report fatal app NE crash BR success to write SystemTombstoneOccured to statsd: 15209, com.algorithm.example
buildException pid = 15209 uid = 10157 packageName = com.algorithm.example processName = com.algorithm.example reportConfig = false
buildReporterName pkgName = com.algorithm.example processName = com.algorithm.example
Log base dir: /data/misc/ems_logs/APP_NE@15209_com.algorithm.example_2024-08-22-23-57-22.972/FATAL_2024-08-22-23-57-22
reportFatalNEInner isApp=true, pid=15209, uid=10157, processName=com.algorithm.example, packageName=com.algorithm.example, tombstone=/data/tombstones/tombstone_35
report success reportTombstoneFile path=/data/tombstones/tombstone_35 processname=com.algorithm.example
```
Hope for your reply, thank you very much!