ggerganov / llama.cpp

LLM inference in C/C++

changelog : `llama-server` REST API #9291

Open · ggerganov opened this issue 2 months ago

ggerganov commented 2 months ago

Overview

This is a list of changes to the public HTTP interface of the llama-server example. Collaborators are encouraged to edit this post in order to reflect important changes to the API that end up merged into the master branch.

If you are building a third-party project that relies on llama-server, it is recommended to follow this issue and check it carefully before upgrading to new versions.

See also:

Recent API changes (most recent at the top)

| version | PR | desc |
| --- | --- | --- |
| b4027 | #10162 | `/slots` endpoint: remove `slot[i].state`, add `slot[i].is_processing` |
| b3912 | #9865 | Add option to time limit the generation phase |
| b3911 | #9860 | Remove self-extend support |
| b3910 | #9857 | Remove legacy system prompt support |
| b3897 | #9776 | Change default security settings: `/slots` is now disabled by default; endpoints now check for API key if it's set |
| b3887 | #9510 | Add `/rerank` endpoint |
| b3754 | #9459 | Add `[DONE]\n\n` in OAI stream response to match spec |
| b3721 | #9398 | Add `seed_cur` to completion response |
| b3683 | #9308 | Environment variable updated |
| b3599 | #9056 | Change `/health` and `/slots` |

For older changes, use:

```sh
git log --oneline -p b3599 -- examples/server/README.md
```

Upcoming API changes

ngxson commented 2 months ago

Not a REST API breaking change, but it is server-related: some environment variables were changed in https://github.com/ggerganov/llama.cpp/pull/9308

slaren commented 2 months ago

After #9398, in the completion response `seed` contains the seed requested by the user, while `seed_cur` contains the seed used to generate the completion. The values can be different if `seed` is `LLAMA_DEFAULT_SEED` (or `-1`), in which case a random seed is generated and returned in `seed_cur`.
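
As an illustration (not part of the PR itself), here is a minimal client-side sketch, assuming a server listening on `localhost:8080`, the non-streaming `/completion` endpoint, and the Python `requests` package:

```python
import requests

# Request a completion without fixing the seed (-1 == LLAMA_DEFAULT_SEED),
# then read back the seed that was actually used for generation.
resp = requests.post(
    "http://localhost:8080/completion",
    json={"prompt": "Hello", "n_predict": 16, "seed": -1},
)
resp.raise_for_status()
data = resp.json()

print("requested seed:", data["seed"])      # -1, echoed back as sent
print("effective seed:", data["seed_cur"])  # random seed chosen by the server (b3721+)
```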

ngxson commented 1 month ago

Breaking change #9776: better security control for public deployments.

Please note that `GET /props` is always enabled to avoid breaking the web UI.
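
For client code, this mostly means sending the key with every request. A minimal sketch, assuming the server was started with `--api-key my-secret-key`, that the key is accepted as a Bearer token, and that the Python `requests` package is available:

```python
import requests

BASE = "http://localhost:8080"
HEADERS = {"Authorization": "Bearer my-secret-key"}  # assumes: --api-key my-secret-key

# Without the key, a protected endpoint should reject the request.
r = requests.post(f"{BASE}/completion", json={"prompt": "ping", "n_predict": 4})
print(r.status_code)  # expected: 401

# With the key, the same request goes through.
r = requests.post(f"{BASE}/completion", json={"prompt": "ping", "n_predict": 4},
                  headers=HEADERS)
print(r.status_code)  # expected: 200
```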

ngxson commented 3 weeks ago

Breaking change for the `/slots` endpoint: https://github.com/ggerganov/llama.cpp/pull/10162

- `slot[i].state` is removed and replaced by `slot[i].is_processing`
- `slot[i].is_processing === false` means the slot is idle
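
As a quick sketch of what client code looks like after this change (assuming the `/slots` endpoint has been re-enabled, since it is off by default as of b3897, a server on `localhost:8080`, and Python `requests`; the `id` field is taken from the server README):

```python
import requests

# Poll slot state after b4027: `state` is gone, `is_processing` is the
# only field that tells whether a slot is busy.
slots = requests.get("http://localhost:8080/slots").json()

for slot in slots:
    status = "processing" if slot["is_processing"] else "idle"
    print(f"slot {slot['id']}: {status}")
```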

isaac-mcfadyen commented 3 weeks ago

> Breaking change for `/slots` endpoint #10162
>
> `slot[i].state` is removed and replaced by `slot[i].is_processing`
>
> `slot[i].is_processing === false` means the slot is idle

Was the `/slots` endpoint also disabled by default, or was that just a documentation change? https://github.com/ggerganov/llama.cpp/pull/10162/files#diff-42ce5869652f266b01a5b5bc95f4d945db304ce54545e2d0c017886a7f1cee1aR698

ngxson commented 3 weeks ago

For security reasons, `/slots` has been disabled by default since https://github.com/ggerganov/llama.cpp/pull/9776, and this is mentioned in the breaking-changes table. I just forgot to update the docs.

ngxson commented 3 weeks ago

Not an API change, but maybe good to know that the default web UI for llama-server changed in https://github.com/ggerganov/llama.cpp/pull/10175

If you want to use the old completion UI, please follow the instructions in the PR.

ggerganov commented 5 days ago

`cache_prompt: true` is now used by default (#10501)
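
Clients that relied on the old default can still opt out per request. A minimal sketch, assuming a server on `localhost:8080` and Python `requests`:

```python
import requests

# cache_prompt now defaults to true; pass it explicitly as false to force
# the prompt to be re-processed from scratch on every request.
resp = requests.post(
    "http://localhost:8080/completion",
    json={"prompt": "Hello", "n_predict": 8, "cache_prompt": False},
)
print(resp.json()["content"])
```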