ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Bug: llama-server crash with `--embeddings` #9978

Open mokeyish opened 10 hours ago

mokeyish commented 10 hours ago

What happened?

After starting the server with the following command, it occasionally crashes while running.

llama-server -m ./bge-large-zh-v1.5 --port 3358 -a emb@bge-large-zh-v1.5 -ngl 100 -c 8192 --samplers temperature;top_p --embeddings -ub 8192 --pooling cls
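For context, requests against a server started this way go to the embeddings endpoint, roughly along the lines of the following (an illustrative request only, not the exact payload that triggered the crash):

curl http://localhost:3358/v1/embeddings -H "Content-Type: application/json" -d '{"model": "emb@bge-large-zh-v1.5", "input": "long document text here"}'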

Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 4 CUDA devices:
  Device 0: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
  Device 1: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
  Device 2: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
  Device 3: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
version: 3945 (45f09764)
built with cc (Debian 10.2.1-6) 10.2.1 20210110 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f210ada7787 in __GI___wait4 (pid=3074567, stat_loc=0x7fffc0b6d3a4, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
27      ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0  0x00007f210ada7787 in __GI___wait4 (pid=3074567, stat_loc=0x7fffc0b6d3a4, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
27      in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x00007f210b21b638 in ggml_abort () from /home/user/llama.cpp/build/ggml/src/libggml.so
#2  0x00007f210b21f700 in ggml_compute_forward_get_rows () from /home/user/llama.cpp/build/ggml/src/libggml.so
#3  0x00007f210b24d0a2 in ggml_graph_compute_thread.isra () from /home/user/llama.cpp/build/ggml/src/libggml.so
#4  0x00007f210b250cf6 in ggml_graph_compute () from /home/user/llama.cpp/build/ggml/src/libggml.so
#5  0x00007f210b25caf3 in ggml_backend_cpu_graph_compute(ggml_backend*, ggml_cgraph*) () from /home/user/llama.cpp/build/ggml/src/libggml.so
#6  0x00007f210b261d75 in ggml_backend_sched_graph_compute_async () from /home/user/llama.cpp/build/ggml/src/libggml.so
#7  0x00007f2120eb15c2 in llama_decode () from /home/user/llama.cpp/build/src/libllama.so
#8  0x000055776938aa04 in server_context::update_slots() ()
#9  0x000055776936d7e1 in server_queue::start_loop() ()
#10 0x0000557769324981 in main ()
[Inferior 1 (process 2694414) detached]
slaren commented 6 hours ago

This is likely due to an overflow in the position embeddings when the input exceeds the maximum sequence length supported by the model. Limiting the context size to the model's maximum sequence length (which in this case seems to be 1024) should avoid the crash.
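If that is the cause, a minimal sketch of an adjusted launch command would be the original invocation with the context and physical batch size capped at the model's maximum sequence length (assumed here to be 1024, per the comment above):

llama-server -m ./bge-large-zh-v1.5 --port 3358 -a emb@bge-large-zh-v1.5 -ngl 100 -c 1024 --samplers temperature;top_p --embeddings -ub 1024 --pooling cls

Keeping -ub equal to -c mirrors the original setup, where the physical batch size is large enough for a full sequence to be embedded in a single batch.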