abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Difference between Model and Server Configs? #1379

Open Gordonei opened 6 months ago

Gordonei commented 6 months ago

Thanks for a great project!

I might be misunderstanding, or missing some limitation that doesn't allow certain fields to be specified in a standalone config file, but there seem to be many server params that aren't supported on a per-model basis?

Prerequisites

Please answer the following questions for yourself before submitting an issue.

Expected Behavior

To be able to set all of the fields found in the server settings types in ConfigFileSettings

Current Behavior

Only a subset of config params are available in the Model Configuration

Environment and Context

Running in Docker with CONFIG_FILE=/config/config.json python3 -m llama_cpp.server, with the config file mounted at /config/config.json

Failure Information (for bugs)

This is an explicit type error

Steps to Reproduce

Create a model config JSON file and attempt to set any of the unsupported parameters:

{
  "models": [{
    "model": "/models/gemma-1.1-2b-it-Q5_K_M.gguf",
    "model_alias": "gemma-2b-it-q5",
    "chat_format": "gemma",
    "penalize_newline": false,
    "repeat_penalty": 1
  }]
}

Failure Logs

As expected, the config file validation fails on startup:

models.2.penalize_newline
  Extra inputs are not permitted [type=extra_forbidden, input_value=False, input_type=bool]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
models.2.repeat_penalty
  Extra inputs are not permitted [type=extra_forbidden, input_value=1, input_type=int]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
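The error can be reproduced outside the server with a minimal pydantic v2 sketch. The `ModelSettings` class below is a stand-in with only a few fields, not the real llama_cpp.server.settings class:

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class ModelSettings(BaseModel):
    # extra="forbid" is what turns unknown config keys into hard errors
    model_config = ConfigDict(extra="forbid", protected_namespaces=())
    model: str
    model_alias: Optional[str] = None
    chat_format: Optional[str] = None


try:
    ModelSettings(
        model="/models/gemma-1.1-2b-it-Q5_K_M.gguf",
        model_alias="gemma-2b-it-q5",
        chat_format="gemma",
        penalize_newline=False,  # not a declared field -> rejected
    )
except ValidationError as e:
    print(e.errors()[0]["type"])  # extra_forbidden
```

Any key not declared on the settings model produces the same `extra_forbidden` error seen in the logs above.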
abetlen commented 6 months ago

@Gordonei that's not a difference between the server and model configs: repeat_penalty is specified on a per-request basis, and penalize_newline is not a supported option. That being said, I do see the value of being able to override default parameters in cases where you don't have control of the client.

penalize_newline would have to be handled separately, though it could be a model-/sampler-wide config.
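For context, a per-request parameter like repeat_penalty is supplied by the client in each completion request body. The sketch below just builds such a payload; the endpoint URL, port, and stop string are assumptions for illustration, and the actual POST is left commented out so the snippet runs without a server:

```python
import json

# Hypothetical request body for a running llama-cpp-python server.
payload = {
    "model": "gemma-2b-it-q5",    # model_alias from the config file
    "prompt": "Hello",
    "repeat_penalty": 1.0,        # supplied per request, not per model
    "stop": ["<end_of_turn>"],    # per-request stop strings
}

body = json.dumps(payload)
# To actually send it (endpoint/port are assumptions):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# urllib.request.urlopen(req)
print(body)
```

This is exactly the kind of detail the issue proposes moving server-side, so clients that can't be modified don't need to know it.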

Gordonei commented 6 months ago

Probably the most relevant case I can think of is the stop parameters, which differ on a per-model basis. It's nice to be able to hide those sorts of implementation details from clients of the API.
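If per-model overrides were supported, a config might look like this; note the "stop" field is hypothetical and is not currently accepted by ConfigFileSettings:

```json
{
  "models": [{
    "model": "/models/gemma-1.1-2b-it-Q5_K_M.gguf",
    "model_alias": "gemma-2b-it-q5",
    "chat_format": "gemma",
    "stop": ["<end_of_turn>"]
  }]
}
```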

Would it just be a matter of adding the relevant fields to ConfigFileSettings, or would additional plumbing be needed? I ask because the answer would point me in the right direction for putting together a PR.
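A rough sketch of the field-declaration half of such a change, assuming pydantic v2. The classes here are simplified stand-ins for the ones in llama_cpp.server.settings (the real ones have many more fields), and the "stop" field is a hypothetical addition:

```python
from typing import List, Optional

from pydantic import BaseModel, ConfigDict, Field


class ModelSettings(BaseModel):
    # Simplified stand-in; "stop" is the hypothetical new override.
    model_config = ConfigDict(extra="forbid", protected_namespaces=())
    model: str
    model_alias: Optional[str] = None
    chat_format: Optional[str] = None
    stop: Optional[List[str]] = Field(
        default=None,
        description="Default stop strings applied when a request omits them.",
    )


class ConfigFileSettings(BaseModel):
    model_config = ConfigDict(protected_namespaces=())
    models: List[ModelSettings]


cfg = ConfigFileSettings.model_validate({
    "models": [{
        "model": "/models/gemma-1.1-2b-it-Q5_K_M.gguf",
        "model_alias": "gemma-2b-it-q5",
        "stop": ["<end_of_turn>"],
    }]
})
print(cfg.models[0].stop)  # ['<end_of_turn>']
```

Declaring the field only gets the config past validation; the extra plumbing would be in the request handlers, which would need to fall back to these per-model defaults whenever a request omits the parameter.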