RifeWang opened this issue 6 days ago
Another question is whether the server supports dynamically switching between different models after startup.
The "model" field is solely for OpenAI compatibility and does not reflect the real value.
> Another question is whether the server supports dynamically switching between different models after startup.
No, we don't support this, as we aim to keep the code simple. Some other wrappers, like ollama, do support it by maintaining multiple instances of llama.cpp under the hood.
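For illustration, here is a minimal sketch of what such a wrapper might do, assuming two separate llama.cpp server instances have already been started on ports 8080 and 8081, each loaded with a different model. The model names, ports, and the `MODEL_SERVERS` mapping below are hypothetical and not something llama.cpp itself provides:

```python
import requests

# Hypothetical registry of already-running llama.cpp server instances,
# one per model -- llama.cpp itself does not manage this; a wrapper would.
MODEL_SERVERS = {
    "llama-2-7b-chat": "http://localhost:8080",
    "mistral-7b-instruct": "http://localhost:8081",
}

def chat(model: str, messages: list[dict]) -> dict:
    """Forward an OpenAI-style chat request to the instance serving `model`."""
    base_url = MODEL_SERVERS[model]  # pick the instance that has this model loaded
    resp = requests.post(
        f"{base_url}/v1/chat/completions",
        json={"model": model, "messages": messages},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    out = chat("llama-2-7b-chat", [{"role": "user", "content": "Hello!"}])
    print(out["choices"][0]["message"]["content"])
```

Each request is simply routed to whichever instance has the requested model loaded, which is essentially how ollama-style wrappers provide "switching" without llama.cpp supporting it directly.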
What happened?
When starting the server through a Docker image, a model must be specified; otherwise it defaults to `models/7B/ggml-model-f16.gguf`, and if that file is not present locally, the server exits with an error. However, when calling the `POST /v1/chat/completions` API, the request parameters also include a `model` field, yet this `model` parameter is not validated in any way: the response simply returns whatever the user passed in. Moreover, if the user does not pass a `model` parameter at all, the response defaults to `gpt-3.5-turbo-0613`, which is clearly incorrect.

It is recommended to keep the `model` information consistent: the model reported in responses should be the same model that was actually loaded.
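For reference, a small script along these lines reproduces the behaviour described above (a sketch only, assuming the server is listening on http://localhost:8080; the exact default model string may vary between builds):

```python
import requests

BASE_URL = "http://localhost:8080"  # assumed address of the running server
MESSAGES = [{"role": "user", "content": "Hi"}]

# 1) Pass an arbitrary, non-existent model name: the response echoes it back
#    unchanged, i.e. the parameter is not validated against the loaded model.
r = requests.post(f"{BASE_URL}/v1/chat/completions",
                  json={"model": "not-a-real-model", "messages": MESSAGES})
print(r.json().get("model"))

# 2) Omit the model field entirely: the response reports a default name
#    ("gpt-3.5-turbo-0613" at the time this issue was filed).
r = requests.post(f"{BASE_URL}/v1/chat/completions",
                  json={"messages": MESSAGES})
print(r.json().get("model"))
```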
Name and Version
REPOSITORY                    TAG      IMAGE ID       CREATED        SIZE
ghcr.io/ggerganov/llama.cpp   server   cd43d22f4e97   14 hours ago   203MB
What operating system are you seeing the problem on?
No response
Relevant log output