zcxo commented 2 months ago

Dear ARM-software :

I am excited and pleasantly surprised to see that ARM has officially launched the KleidiAI solution for inference! I have tried this solution and it has indeed greatly accelerated both the prompts and generation stages. However, I have encountered an issue. Currently, the patch you provide is for the 2024 June llama.cpp version. Do you have a patch plan for the latest version of llama.cpp? The following are the errors I have encountered while trying to adapt:

0001-Use-KleidiAI-Int4-Matmul-micro-kernels-in-llama.cpp.patch

Cmdline: com.algorithm.example pid: 15209, tid: 15209, name: binder:15122_2 >>> com.algorithm.example <<<

01 pc 00000000001b4e8c /data/app/~~5V7P5OPSvLkqk0-B1FssOw==/com.algorithm.example-umHyBrLVPxbWoOjPMwx1IQ==/base.apk!libllama_native.so (offset 0x1edc000) (ggml_kai_compute_forward+1968) (BuildId: 4cadfb9fa830ea791a43b93a8e2ef9353fb03c52)

  #02 pc 0000000000172b64  /data/app/~~5V7P5OPSvLkqk0-B1FssOw==/com.algorithm.example-umHyBrLVPxbWoOjPMwx1IQ==/base.apk!libllama_native.so (offset 0x1edc000) (BuildId: 4cadfb9fa830ea791a43b93a8e2ef9353fb03c52)
  #03 pc 00000000001729f4  /data/app/~~5V7P5OPSvLkqk0-B1FssOw==/com.algorithm.example-umHyBrLVPxbWoOjPMwx1IQ==/base.apk!libllama_native.so (offset 0x1edc000) (BuildId: 4cadfb9fa830ea791a43b93a8e2ef9353fb03c52)
  #04 pc 00000000000c40c8  /data/app/~~5V7P5OPSvLkqk0-B1FssOw==/com.algorithm.example-umHyBrLVPxbWoOjPMwx1IQ==/base.apk!libomp.so (offset 0x321e000) (__kmp_invoke_microtask+152) (BuildId: 420baf65ff745db7ffc43e2d9942756a154fceae)

report fatal app NE crash BR success to write SystemTombstoneOccured to statsd: 15209, com.algorithm.example buildException pid = 15209 uid = 10157 packageName = com.algorithm.example processName = com.algorithm.example reportConfig = false buildReporterName pkgName = com.algorithm.example processName = com.algorithm.example Log base dir: /data/misc/ems_logs/APP_NE@15209_com.algorithm.example_2024-08-22-23-57-22.972/FATAL_2024-08-22-23-57-22 reportFatalNEInner isApp=true, pid=15209, uid=10157, processName=com.algorithm.example, packageName=com.algorithm.example, tombstone=/data/tombstones/tombstone_35 report success reportTombstoneFile path=/data/tombstones/tombstone_35 processname=com.algorithm.example

Hope for your reply, thank you very much!

zcxo commented 2 months ago

I have adapted your patch to the llama.cpp version on September 5th, but there aren't too many changes needed. Currently, the only issue left is __memcpy_aarch64_stimd, which seems to be some compatibility issues. I don't know if it's a compilation problem or something else:

00 pc 0000000000054438 /apex/com.android.runtime/lib64/bionic/libc.so (__memcpy_aarch64_simd+248) (BuildId: cdb09e5d494726046776ac6d0238c81f)

09-09 18:49:02.670 F/DEBUG (20317): #01 pc 00000000000b57fc /system/lib64/libggml.so (ggml_kai_prepare_const_data+484) (BuildId: 72a2a4918d97c527dddf645692b8575bda53ed6d) 09-09 18:49:02.670 F/DEBUG (20317): #02 pc 000000000003f3d8 /system/lib64/libggml.so (ggml_graph_compute+120) (BuildId: 72a2a4918d97c527dddf645692b8575bda53ed6d)

The error code is located in ggml-kleidiai.cpp: memcpy(cur->data, (void *)reshaped_data, ggml_nbytes(cur));

if defined(GGML_KLEIDIAI_REUSE_MEMORY)

GGML_ASSERT(reshaped_data_sz <= original_data_size); memcpy(cur->data, (void *)reshaped_data, ggml_nbytes(cur)); free(reshaped_data); cur->extra = cur->data;

else

g_extra_mem[g_extra_mem_idx++] = reshaped_data; cur->extra = reshaped_data;

endif

kshitij-sisodia-arm commented 2 months ago

Hi @zcxo ,

Thanks for bringing this to our attention. Glad to know that you have found this useful :+1:. This patch was created to demonstrate a possible integration point for KleidiAI in llama.cpp. We will work separately with llama.cpp to provide a proper solution.

zcxo commented 2 months ago

Dear ARM-software : and @kshitij-sisodia-arm Although I waited a bit long, I am still glad to receive your reply. I have tried to adapt and made modifications based on my own understanding of the issues encountered during the process. Currently, I plan to apply them to the formal project. But currently, I am still concerned about whether there may be other impacts, so I hope the official can release a version as soon as possible to help us developers. Thank you.

ARM-software / ML-examples

merged "0001-Use-KleidiAI-Int4-Matmul-micro-kernels-in-llama.cpp.patch" to latest llama.cpp FAIL #143

01 pc 00000000001b4e8c /data/app/~~5V7P5OPSvLkqk0-B1FssOw==/com.algorithm.example-umHyBrLVPxbWoOjPMwx1IQ==/base.apk!libllama_native.so (offset 0x1edc000) (ggml_kai_compute_forward+1968) (BuildId: 4cadfb9fa830ea791a43b93a8e2ef9353fb03c52)

00 pc 0000000000054438 /apex/com.android.runtime/lib64/bionic/libc.so (__memcpy_aarch64_simd+248) (BuildId: cdb09e5d494726046776ac6d0238c81f)

if defined(GGML_KLEIDIAI_REUSE_MEMORY)

else

endif