To reproduce, the client can simply be curl, as in the provided example:
curl --request POST \
--url http://localhost:8080/completion \
--header "Content-Type: application/json" \
--data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'
In the failure case, the client receives neither a response nor an error.
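Because the client blocks indefinitely in the failure case, it may help to bound the request with a timeout so that a hang is reported rather than waited on. The following is a minimal sketch, not part of the original report: the function names `classify_curl_status` and `probe_completion` are hypothetical, and the 60-second default is an arbitrary assumption. It relies only on curl's `--max-time` option and its documented exit codes (7, 28, 52).

```shell
# Hypothetical helper: map a curl exit status to a short label.
# 28 is curl's "operation timed out" status, which is what a hung
# server should produce once --max-time expires.
classify_curl_status() {
  case "$1" in
    0)  echo "ok" ;;
    7)  echo "connection failed" ;;
    28) echo "timeout" ;;
    52) echo "empty reply" ;;
    *)  echo "error ($1)" ;;
  esac
}

# Hypothetical helper: send the request from the report with a time limit.
# $1 = endpoint URL (default taken from the report)
# $2 = time limit in seconds (assumed default: 60)
probe_completion() {
  url="${1:-http://localhost:8080/completion}"
  limit="${2:-60}"
  curl --silent --show-error --max-time "$limit" \
    --output /dev/null \
    --request POST \
    --url "$url" \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'
  classify_curl_status "$?"
}
```

Running `probe_completion` against a healthy server should print `ok`; against a server that accepts the connection but never answers, it should print `timeout` once the limit expires.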
Name and Version
$ ./llama-cli --version
version: 4126 (d3481e63)
built with cc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22) for x86_64-redhat-linux
What operating system are you seeing the problem on?
Linux
Relevant log output
Server log in the success case (start-up boilerplate logs truncated)
====================================================================
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv update_slots: all slots are idle
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 13
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 13, n_tokens = 13, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 13, n_tokens = 13
slot release: id 0 | task 0 | stop processing: n_past = 140, truncated = 0
slot print_timing: id 0 | task 0 |
prompt eval time = 25.09 ms / 13 tokens ( 1.93 ms per token, 518.07 tokens per second)
eval time = 1270.83 ms / 128 tokens ( 9.93 ms per token, 100.72 tokens per second)
total time = 1295.92 ms / 141 tokens
request: POST /completion 127.0.0.1 200
srv update_slots: all slots are idle
Server log in the failure case
==============================
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv update_slots: all slots are idle
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 13
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 13, n_tokens = 13, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 13, n_tokens = 13
(SERVER HANGS HERE AFTER PREFILL IS DONE)
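The two logs differ only in what follows the `prompt done` line: the success case continues to a `stop processing` line for the same task, while the failure case never gets there. A rough sketch for spotting this in a saved server log is shown below. The function name `find_stalled_tasks` is hypothetical, and it assumes the log lines keep the `task N` format shown above.

```shell
# Hypothetical helper: print every task that logged "prompt done"
# but never logged "stop processing" in the given log file ($1).
find_stalled_tasks() {
  awk '
    /prompt done/ {
      # Remember the "task N" token of a task that finished prefill.
      if (match($0, /task [0-9]+/)) pending[substr($0, RSTART, RLENGTH)] = 1
    }
    /stop processing/ {
      # The task was released, so it did not stall.
      if (match($0, /task [0-9]+/)) delete pending[substr($0, RSTART, RLENGTH)]
    }
    END { for (t in pending) print t }
  ' "$1"
}
```

For example, `find_stalled_tasks server.log` (where `server.log` is a placeholder for wherever the server output was saved) prints nothing for the success log above and `task 0` for the failure log.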
What happened?
As the title suggests, this will cause the server to hang,
while this will not