abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Difference between Model and Server Configs? #1379

Open Gordonei opened 6 months ago

Gordonei commented 6 months ago

Thanks for a great project!

I might be misunderstanding, or missing some limitation that doesn't allow certain fields to be specified in a standalone config file, but there seem to be many server params that aren't supported on a per-model basis?

Prerequisites

Please answer the following questions for yourself before submitting an issue.

Expected Behavior

To be able to set all of the fields found in the server settings types in ConfigFileSettings

Current Behavior

Only a subset of config params are available in the Model Configuration

Environment and Context

Running in Docker with CONFIG_FILE=/config/config.json python3 -m llama_cpp.server, with the config file mounted at /config/config.json

Failure Information (for bugs)

This is an explicit type error

Steps to Reproduce

Create a model config JSON file and attempt to set any of the unsupported parameters:

{
  "models": [{
    "model": "/models/gemma-1.1-2b-it-Q5_K_M.gguf",
    "model_alias": "gemma-2b-it-q5",
    "chat_format": "gemma",
    "penalize_newline": false,
    "repeat_penalty": 1
  }]
}

Failure Logs

As expected, the config file validation fails on startup:

models.2.penalize_newline
  Extra inputs are not permitted [type=extra_forbidden, input_value=False, input_type=bool]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
models.2.repeat_penalty
  Extra inputs are not permitted [type=extra_forbidden, input_value=1, input_type=int]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
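The error can be reproduced outside the server with a minimal pydantic v2 sketch. The `ModelSettings` class below is a stand-in with only a few fields, not the real llama_cpp.server.settings class:

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class ModelSettings(BaseModel):
    # extra="forbid" is what turns unknown config keys into hard errors
    model_config = ConfigDict(extra="forbid", protected_namespaces=())
    model: str
    model_alias: Optional[str] = None
    chat_format: Optional[str] = None


try:
    ModelSettings(
        model="/models/gemma-1.1-2b-it-Q5_K_M.gguf",
        model_alias="gemma-2b-it-q5",
        chat_format="gemma",
        penalize_newline=False,  # not a declared field -> rejected
    )
except ValidationError as e:
    print(e.errors()[0]["type"])  # extra_forbidden
```

Any key not declared on the settings model produces the same `extra_forbidden` error seen in the logs above.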
abetlen commented 6 months ago

@Gordonei that's not a difference between the server and model configs: repeat_penalty is specified on a per-request basis, and penalize_newline is not a supported option. That being said, I do see the value of being able to override default parameters in cases where you don't have control of the client.

penalize_newline would have to be handled separately, though it could be a model-/sampler-wide config.
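For context, a per-request parameter like repeat_penalty is supplied by the client in each completion request body. The sketch below just builds such a payload; the endpoint URL, port, and stop string are assumptions for illustration, and the actual POST is left commented out so the snippet runs without a server:

```python
import json

# Hypothetical request body for a running llama-cpp-python server.
payload = {
    "model": "gemma-2b-it-q5",    # model_alias from the config file
    "prompt": "Hello",
    "repeat_penalty": 1.0,        # supplied per request, not per model
    "stop": ["<end_of_turn>"],    # per-request stop strings
}

body = json.dumps(payload)
# To actually send it (endpoint/port are assumptions):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# urllib.request.urlopen(req)
print(body)
```

This is exactly the kind of detail the issue proposes moving server-side, so clients that can't be modified don't need to know it.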

Gordonei commented 6 months ago

Probably the most relevant case I can think of is the stop parameters, which differ on a per-model basis. It's nice to be able to hide those sorts of implementation details from clients of the API.
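If per-model overrides were supported, a config might look like this; note the "stop" field is hypothetical and is not currently accepted by ConfigFileSettings:

```json
{
  "models": [{
    "model": "/models/gemma-1.1-2b-it-Q5_K_M.gguf",
    "model_alias": "gemma-2b-it-q5",
    "chat_format": "gemma",
    "stop": ["<end_of_turn>"]
  }]
}
```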

Would it just be a matter of adding the relevant fields to ConfigFileSettings, or would additional plumbing be needed? I ask because the answer would point me in the right direction for putting together a PR.
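A rough sketch of the field-declaration half of such a change, assuming pydantic v2. The classes here are simplified stand-ins for the ones in llama_cpp.server.settings (the real ones have many more fields), and the "stop" field is a hypothetical addition:

```python
from typing import List, Optional

from pydantic import BaseModel, ConfigDict, Field


class ModelSettings(BaseModel):
    # Simplified stand-in; "stop" is the hypothetical new override.
    model_config = ConfigDict(extra="forbid", protected_namespaces=())
    model: str
    model_alias: Optional[str] = None
    chat_format: Optional[str] = None
    stop: Optional[List[str]] = Field(
        default=None,
        description="Default stop strings applied when a request omits them.",
    )


class ConfigFileSettings(BaseModel):
    model_config = ConfigDict(protected_namespaces=())
    models: List[ModelSettings]


cfg = ConfigFileSettings.model_validate({
    "models": [{
        "model": "/models/gemma-1.1-2b-it-Q5_K_M.gguf",
        "model_alias": "gemma-2b-it-q5",
        "stop": ["<end_of_turn>"],
    }]
})
print(cfg.models[0].stop)  # ['<end_of_turn>']
```

Declaring the field only gets the config past validation; the extra plumbing would be in the request handlers, which would need to fall back to these per-model defaults whenever a request omits the parameter.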