Multimodal models in - GitHub Issues

{
  "host": "0.0.0.0",
  "port": 8000,
  "models": [
    {
      "model": "models/mistral-7b-instruct-v0.1.Q4_0.gguf",
      "model_alias": "mistral",
      "chat_format": "chatml",
      "n_gpu_layers": -1,
      "offload_kqv": true,
      "n_threads": 12,
      "n_batch": 512,
      "n_ctx": 2048
    },
    {
      "model": "models/mixtral-8x7b-instruct-v0.1.Q2_K.gguf",
      "model_alias": "mixtral",
      "chat_format": "chatml",
      "n_gpu_layers": -1,
      "offload_kqv": true,
      "n_threads": 12,
      "n_batch": 512,
      "n_ctx": 2048
    },
    {
      "model": "models/mistral-7b-instruct-v0.1.Q4_0.gguf",
      "model_alias": "mistral-function-calling",
      "chat_format": "functionary",
      "n_gpu_layers": -1,
      "offload_kqv": true,
      "n_threads": 12,
      "n_batch": 512,
      "n_ctx": 2048
    }
  ]
}

What would a configuration in this file look like for a multimodal model?

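To configure a multimodal model, add another entry to the models array with settings specific to that model. The example below is illustrative only: the model paths, the "multimodal" chat format, and the multimodal_params block are placeholders, and the keys your server actually accepts depend on the server and its version, so check them against its documentation before use.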

{
  "host": "0.0.0.0",
  "port": 8000,
  "models": [
    {
      "model": "models/mistral-7b-instruct-v0.1.Q4_0.gguf",
      "model_alias": "mistral",
      "chat_format": "chatml",
      "n_gpu_layers": -1,
      "offload_kqv": true,
      "n_threads": 12,
      "n_batch": 512,
      "n_ctx": 2048
    },
    {
      "model": "models/mixtral-8x7b-instruct-v0.1.Q2_K.gguf",
      "model_alias": "mixtral",
      "chat_format": "chatml",
      "n_gpu_layers": -1,
      "offload_kqv": true,
      "n_threads": 12,
      "n_batch": 512,
      "n_ctx": 2048
    },
    {
      "model": "models/mistral-7b-instruct-v0.1.Q4_0.gguf",
      "model_alias": "mistral-function-calling",
      "chat_format": "functionary",
      "n_gpu_layers": -1,
      "offload_kqv": true,
      "n_threads": 12,
      "n_batch": 512,
      "n_ctx": 2048
    },
    {
      "model": "models/multimodal-model-v0.1.Q4_0.gguf",
      "model_alias": "multimodal-model",
      "chat_format": "multimodal",
      "n_gpu_layers": -1,
      "offload_kqv": true,
      "n_threads": 12,
      "n_batch": 512,
      "n_ctx": 2048,
      "multimodal_params": {
        "image_size": [224, 224],
        "image_channels": 3,
        "text_max_length": 512,
        "vision_model": "models/vision-model-v0.1.gguf",
        "text_model": "models/text-model-v0.1.gguf"
      }
    }
  ]
}

In this example, the multimodal model configuration includes:

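- model: The path to the multimodal model file.
- model_alias: The name clients use to select this model.
- chat_format: The chat format (prompt template) used by the multimodal model.
- n_gpu_layers: The number of layers offloaded to the GPU; -1 offloads all of them.
- offload_kqv: Whether the K, Q, and V tensors are offloaded to the GPU.
- n_threads: The number of CPU threads used.
- n_batch: The maximum batch size for prompt processing.
- n_ctx: The context window size, in tokens.
- multimodal_params: A nested object (a placeholder in this example) containing parameters specific to the multimodal model, such as:
  - image_size: The width and height of the input images, in pixels.
  - image_channels: The number of channels in the input images (3 for RGB).
  - text_max_length: The maximum length of text inputs.
  - vision_model: The path to the vision model file.
  - text_model: The path to the text model file.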
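
Before starting the server, it can help to check the entries described above for simple mistakes. The following is a minimal sketch in Python, not part of any server: the check_models helper is hypothetical, and treating model and model_alias as required keys is an assumption made for this example.

```python
import json

# Keys every entry in the example config carries. Treating them as required
# is an assumption for this sketch, not a rule taken from the server.
REQUIRED_KEYS = ("model", "model_alias")


def check_models(config: dict) -> list[str]:
    """Return a list of human-readable problems found in config["models"]."""
    problems = []
    seen_aliases = set()
    for index, entry in enumerate(config.get("models", [])):
        # Report each required key that the entry lacks.
        for key in REQUIRED_KEYS:
            if key not in entry:
                problems.append(f"models[{index}] is missing '{key}'")
        # Clients select a model by alias, so a repeated alias is ambiguous.
        alias = entry.get("model_alias")
        if alias is not None:
            if alias in seen_aliases:
                problems.append(f"models[{index}] reuses alias '{alias}'")
            seen_aliases.add(alias)
    return problems


# A deliberately broken config: one duplicate alias, one missing model path.
example = json.loads("""
{
  "host": "0.0.0.0",
  "port": 8000,
  "models": [
    {"model": "models/mistral-7b-instruct-v0.1.Q4_0.gguf", "model_alias": "mistral"},
    {"model": "models/mistral-7b-instruct-v0.1.Q4_0.gguf", "model_alias": "mistral"},
    {"model_alias": "multimodal-model"}
  ]
}
""")

for problem in check_models(example):
    print(problem)
```

Pointing the same model file at two different aliases, as the original config does for mistral and mistral-function-calling, passes this check; only a repeated alias is flagged.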

You can adjust these parameters based on the specific requirements and capabilities of your multimodal model.
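
If the server behind this file is llama-cpp-python's OpenAI-compatible server (an assumption; the thread does not name it), a LLaVA-style model is, as far as I know, configured with the "llava-1-5" chat format plus a clip_model_path key pointing at the vision projector file, rather than with a nested multimodal_params object. The sketch below builds such an entry in Python and prints the resulting JSON. The file names, the "llava" alias, and the add_llava_entry helper are hypothetical, so verify the key names against the documentation for your installed version.

```python
import json

# Hypothetical file names: substitute the GGUF files you actually downloaded.
LLAVA_MODEL = "models/llava-v1.5-7b.Q4_K.gguf"
LLAVA_PROJECTOR = "models/mmproj-model-f16.gguf"


def add_llava_entry(config: dict) -> dict:
    """Return a copy of config with one LLaVA-style entry appended."""
    entry = {
        "model": LLAVA_MODEL,
        "model_alias": "llava",  # hypothetical alias
        # Assumption: llama-cpp-python's server, where "llava-1-5" selects the
        # LLaVA 1.5 chat handler and clip_model_path names the projector file.
        "chat_format": "llava-1-5",
        "clip_model_path": LLAVA_PROJECTOR,
        # The remaining values are copied from the other entries in the thread.
        "n_gpu_layers": -1,
        "offload_kqv": True,
        "n_threads": 12,
        "n_batch": 512,
        "n_ctx": 2048,
    }
    # Build new containers so the caller's config is left unchanged.
    updated = dict(config)
    updated["models"] = list(config.get("models", [])) + [entry]
    return updated


base = {"host": "0.0.0.0", "port": 8000, "models": []}
print(json.dumps(add_llava_entry(base), indent=2))
```

Image embeddings consume part of the context window, so n_ctx may need to be raised above 2048 for multimodal prompts.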

Jaimboh / Llama.cpp-Local-OpenAI-server

Multimodal models in #1