ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Bug: A crash occurs when llama-bench runs on multiple CANN devices. #9250

Open znzjugod opened 1 month ago

znzjugod commented 1 month ago

What happened?

When I use Llama3-8B-Chinese-Chat-f16-v2_1.gguf to run llama.cpp, it crashes. Here is my command:

```
./llama-cli -m /home/c00662745/llama3/llama3/llama3_chinese_gguf/Llama3-8B-Chinese-Chat-f16-v2_1.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm layer
```

Here is the error:

```
CANN error: EE9999: Inner Error!
EE9999: [PID: 2750884] 2024-08-30-16:20:38.196.490 Stream destroy failed, stream is not in current ctx, stream_id=2.[FUNC:StreamDestroy][FILE:api_impl.cc][LINE:1032]
        TraceBack (most recent call last):
        rtStreamDestroy execute failed, reason=[stream not in current context][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        destroy stream failed, runtime result = 107003[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]

  current device: 1, in function ~ggml_backend_cann_context at /home/zn/new-llama/llama.cpp/ggml/src/ggml-cann/common.h:235
  aclrtDestroyStream(streams[i])
/home/zn/new-llama/llama.cpp/ggml/src/ggml-cann.cpp:123: CANN error
[New LWP 2750924]
[New LWP 2750937]
[New LWP 2753277]
[New LWP 2753281]
[New LWP 2753615]
[New LWP 2753616]
[New LWP 2753623]
[New LWP 2753626]
[New LWP 2753900]
[New LWP 2753901]
[New LWP 2757030]
[New LWP 2757031]
[New LWP 2757032]
[New LWP 2757033]
[New LWP 2757034]
[New LWP 2757035]
[New LWP 2757036]
[New LWP 2757037]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
0x0000ffff8e7edc00 in wait4 () from /usr/lib64/libc.so.6
#0  0x0000ffff8e7edc00 in wait4 () from /usr/lib64/libc.so.6
#1  0x0000ffff8ec019f0 in ggml_print_backtrace () at /home/zn/new-llama/llama.cpp/ggml/src/ggml.c:253
253         waitpid(pid, &wstatus, 0);
#2  0x0000ffff8ec01b20 in ggml_abort (file=0xffff8eccfd58 "/home/zn/new-llama/llama.cpp/ggml/src/ggml-cann.cpp", line=123, fmt=0xffff8eccfd48 "CANN error") at /home/zn/new-llama/llama.cpp/ggml/src/ggml.c:280
280         ggml_print_backtrace();
#3  0x0000ffff8ec94ab8 in ggml_cann_error (stmt=0xffff8eccfcb0 "aclrtDestroyStream(streams[i])", func=0xffff8eccfc70 "~ggml_backend_cann_context", file=0xffff8eccfc18 "/home/zn/new-llama/llama.cpp/ggml/src/ggml-cann/common.h", line=235, msg=0x3bcc0668 "EE9999: Inner Error!\nEE9999: [PID: 2750884] 2024-08-30-16:20:38.196.490 Stream destroy failed, stream is not in current ctx, stream_id=2.[FUNC:StreamDestroy][FILE:api_impl.cc][LINE:1032]\n Trace"...) at /home/zn/new-llama/llama.cpp/ggml/src/ggml-cann.cpp:123
warning: Source file is more recent than executable.
123         GGML_ABORT("CANN error");
#4  0x0000ffff8ec97b74 in ggml_backend_cann_context::~ggml_backend_cann_context (this=0x33af8680, __in_chrg=<optimized out>) at /home/zn/new-llama/llama.cpp/ggml/src/ggml-cann/common.h:235
235         ACL_CHECK(aclrtDestroyStream(streams[i]));
#5  0x0000ffff8ec964ac in ggml_backend_cann_free (backend=0x2a8d71a0) at /home/zn/new-llama/llama.cpp/ggml/src/ggml-cann.cpp:1412
1412        delete cann_ctx;
#6  0x0000ffff8ec49394 in ggml_backend_free (backend=0x2a8d71a0) at /home/zn/new-llama/llama.cpp/ggml/src/ggml-backend.c:180
180         backend->iface.free(backend);
#7  0x0000ffff8f18a30c in llama_context::~llama_context (this=0x29fc9fc0, __in_chrg=<optimized out>) at /home/zn/new-llama/llama.cpp/src/llama.cpp:3069
3069        ggml_backend_free(backend);
#8  0x0000ffff8f16b744 in llama_free (ctx=0x29fc9fc0) at /home/zn/new-llama/llama.cpp/src/llama.cpp:17936
17936       delete ctx;
#9  0x0000000000476d48 in main (argc=12, argv=0xfffffc7fe828) at /home/zn/new-llama/llama.cpp/examples/main/main.cpp:1020
1020        llama_free(ctx);
[Inferior 1 (process 2750884) detached]
Aborted (core dumped)
```

It seems that during the final stream teardown, CANN doesn't have the right context: the stream being destroyed belongs to a different device than the one that is current (the log shows `current device: 1` while destroying `stream_id=2`).
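If that's what is happening, one plausible fix would be to re-bind the context's own device at the top of `~ggml_backend_cann_context`, before its streams are destroyed. Below is a minimal sketch of that idea, not a tested patch: `cann_context_sketch` and `ACL_CHECK_SKETCH` are hypothetical stand-ins for the real `ggml_backend_cann_context` and `ACL_CHECK`, and only the `aclrtSetDevice`/`aclrtDestroyStream` calls come from the actual AscendCL API.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>

#include <acl/acl.h>  // AscendCL runtime: aclrtSetDevice, aclrtDestroyStream, ...

// Stand-in for llama.cpp's ACL_CHECK macro: print and abort on any ACL error.
#define ACL_CHECK_SKETCH(stmt)                                       \
    do {                                                             \
        aclError err_ = (stmt);                                      \
        if (err_ != ACL_SUCCESS) {                                   \
            std::fprintf(stderr, "ACL error %d at %s:%d\n",          \
                         (int) err_, __FILE__, __LINE__);            \
            std::abort();                                            \
        }                                                            \
    } while (0)

// Hypothetical context owning streams on one device; it mirrors the shape of
// ggml_backend_cann_context but is not its real definition.
struct cann_context_sketch {
    int32_t device;                    // device the streams were created on
    std::vector<aclrtStream> streams;  // streams owned by this context

    ~cann_context_sketch() {
        // Re-bind this context's own device before destroying its streams.
        // Without this, the destructor runs with whichever device happens to
        // be current (device 1 in the log above), and destroying a stream
        // created on another device fails with "stream is not in current ctx".
        ACL_CHECK_SKETCH(aclrtSetDevice(device));
        for (aclrtStream s : streams) {
            if (s != nullptr) {
                ACL_CHECK_SKETCH(aclrtDestroyStream(s));
            }
        }
    }
};
```

The same ordering rule would apply anywhere a per-device resource is released: set the owning device first, then free.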

Name and Version

```
(base) [root@localhost bin]# ./llama-cli --version
version: 3645 (7ea8d80d)
built with cc (GCC) 10.3.1 for aarch64-linux-gnu
```

What operating system are you seeing the problem on?

No response

Relevant log output

No response

hipudding commented 2 weeks ago

Thanks for the bug report.