Jaimboh / Llama.cpp-Local-OpenAI-server

A repository showing how to run a local OpenAI-compatible server and make API calls to it just as you would with OpenAI models.
MIT License

Multimodal models #1

aderbalbotelho opened this issue 2 weeks ago

aderbalbotelho commented 2 weeks ago

{ "host": "0.0.0.0", "port": 8000, "models": [ { "model": "models/mistral-7b-instruct-v0.1.Q4_0.gguf", "model_alias": "mistral", "chat_format": "chatml", "n_gpu_layers": -1, "offload_kqv": true, "n_threads": 12, "n_batch": 512, "n_ctx": 2048 }, { "model": "models/mixtral-8x7b-instruct-v0.1.Q2_K.gguf", "model_alias": "mixtral", "chat_format": "chatml", "n_gpu_layers": -1, "offload_kqv": true, "n_threads": 12, "n_batch": 512, "n_ctx": 2048 }, { "model": "models/mistral-7b-instruct-v0.1.Q4_0.gguf", "model_alias": "mistral-function-calling", "chat_format": "functionary", "n_gpu_layers": -1, "offload_kqv": true, "n_threads": 12, "n_batch": 512, "n_ctx": 2048 } ] }

What would a configuration in this file look like for a multimodal model?

Jaimboh commented 2 weeks ago

To serve a multimodal model, you add an entry for it alongside the text models. For LLaVA-style models, llama-cpp-python expects a vision chat format (for example `"llava-1-5"`) plus the path to the model's CLIP projector file via `clip_model_path`. Something like this should work (the LLaVA file names below are examples; substitute the GGUF files you actually downloaded):

{
  "host": "0.0.0.0",
  "port": 8000,
  "models": [
    {
      "model": "models/mistral-7b-instruct-v0.1.Q4_0.gguf",
      "model_alias": "mistral",
      "chat_format": "chatml",
      "n_gpu_layers": -1,
      "offload_kqv": true,
      "n_threads": 12,
      "n_batch": 512,
      "n_ctx": 2048
    },
    {
      "model": "models/mixtral-8x7b-instruct-v0.1.Q2_K.gguf",
      "model_alias": "mixtral",
      "chat_format": "chatml",
      "n_gpu_layers": -1,
      "offload_kqv": true,
      "n_threads": 12,
      "n_batch": 512,
      "n_ctx": 2048
    },
    {
      "model": "models/mistral-7b-instruct-v0.1.Q4_0.gguf",
      "model_alias": "mistral-function-calling",
      "chat_format": "functionary",
      "n_gpu_layers": -1,
      "offload_kqv": true,
      "n_threads": 12,
      "n_batch": 512,
      "n_ctx": 2048
    },
    {
      "model": "models/llava-v1.5-7b.Q4_K_M.gguf",
      "model_alias": "llava",
      "chat_format": "llava-1-5",
      "clip_model_path": "models/llava-v1.5-7b-mmproj-f16.gguf",
      "n_gpu_layers": -1,
      "offload_kqv": true,
      "n_threads": 12,
      "n_batch": 512,
      "n_ctx": 2048
    }
  ]
}

In this example, the multimodal entry keeps the usual performance settings (`n_gpu_layers`, `n_threads`, `n_batch`, `n_ctx`) and adds the vision-specific fields the model needs to process images alongside text.

You can adjust these parameters based on the specific requirements and capabilities of your multimodal model.