Open · saket424 opened 2 months ago
LocalAI supports multimodal chat completions with `gpt-4-vision-preview`. Can I try baibot with `gpt-4-vision-preview` instead of `gpt-4`?
My baibot agent config:

```yaml
- id: localai
  provider: localai
  config:
    base_url: http://172.17.0.1:8080/v1
    api_key: null
    text_generation:
      model_id: gpt-4-vision-preview
      prompt: You are a brief, but helpful bot.
      temperature: 1.0
      max_response_tokens: 16384
      max_context_tokens: 128000
```
The LocalAI model definition:

```yaml
name: gpt-4-vision-preview
roles:
  user: "USER:"
  assistant: "ASSISTANT:"
  system: "SYSTEM:"
mmproj: llava-v1.6-7b-mmproj-f16.gguf
parameters:
  model: llava-v1.6-mistral-7b.Q5_K_M.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.95
  seed: -1

template:
  chat: |
    A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
    {{.Input}}
    ASSISTANT:

download_files:
- filename: llava-v1.6-mistral-7b.Q5_K_M.gguf
  uri: huggingface://cjpais/llava-1.6-mistral-7b-gguf/llava-v1.6-mistral-7b.Q5_K_M.gguf
- filename: llava-v1.6-7b-mmproj-f16.gguf
  uri: huggingface://cjpais/llava-1.6-mistral-7b-gguf/mmproj-model-f16.gguf

usage: |
  curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "gpt-4-vision-preview",
    "messages": [{"role": "user", "content": [{"type": "text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}}]}],
    "temperature": 0.9}'
```
Querying LocalAI directly works:

```sh
curl http://172.17.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "gpt-4-vision-preview", "messages": [{"role": "user", "content": [{"type": "text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}}]}], "temperature": 0.9}'
```

```json
{"created":1726522282,"object":"chat.completion","id":"3a66a0dd-9899-49df-93c4-a2d36309642e","model":"gpt-4-vision-preview","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"The image shows a wooden pathway leading through a field of tall grass. The pathway appears to be a simple, unpaved trail, possibly in a rural or natural setting. The sky is clear and blue, suggesting a sunny day. There are no visible landmarks or distinctive features in the background, which gives the impression of a peaceful, open landscape. \u003c/s\u003e"}}],"usage":{"prompt_tokens":1,"completion_tokens":76,"total_tokens":77}}
```
However, `gpt-4-vision-preview` does not appear to be supported by baibot -- only `gpt-4` for the moment:
`~/baibot/src/agent/provider/localai$ cat mod.rs`

```rust
// LocalAI is based on OpenAI (async-openai), because it seems to be fully compatible.
// Moreover, openai_api_rust does not support speech-to-text, so if we wish to use this feature
// we need to stick to async-openai.

use super::openai_compat::Config;

pub fn default_config() -> Config {
    let mut config = Config {
        base_url: "http://my-localai-self-hosted-service:8080/v1".to_owned(),
        ..Default::default()
    };

    if let Some(ref mut config) = config.text_generation.as_mut() {
        config.model_id = "gpt-4".to_owned();
        config.max_context_tokens = 128_000;
        config.max_response_tokens = 4096;
    }

    if let Some(ref mut config) = config.text_to_speech.as_mut() {
        config.model_id = "tts-1".to_owned();
    }

    if let Some(ref mut config) = config.speech_to_text.as_mut() {
        config.model_id = "whisper-1".to_owned();
    }

    if let Some(ref mut config) = config.image_generation.as_mut() {
        config.model_id = "stablediffusion".to_owned();
    }

    config
}
```
This is a valid feature request.
baibot currently ignores all images sent by you. It doesn't support feeding them to a model yet.
To address your previous comment:

> gpt-4-vision-preview does not appear to be supported by baibot -- only gpt-4 for the moment

You're pasting an excerpt from the code which defines the default configuration for models created via the `localai` provider. This configuration inherits from the "OpenAI compatible" provider and customizes the models to some sane defaults for LocalAI. The fact that `gpt-4` is hardcoded in the default configuration does not mean you can't change it. When creating a new agent dynamically (e.g. `!bai agent create-room-local localai my-new-localai-agent`), you will be shown the default configuration (which specifies the `gpt-4` model), but you can change it however you'd like. You can also define the agent statically, in your YAML configuration.
Perhaps specifying a `gpt-4-vision-preview` model would make LocalAI route your queries to a different model on its side.
Regardless, baibot cannot send images to the model, so what you're trying to do cannot be done yet.
For completeness, it should be noted that for the actual OpenAI API (recommended to be used via the `openai` provider), `gpt-4-vision-preview` is no longer a valid model. If you try to use it, you get an error:

```
invalid_request_error: The model `gpt-4-vision-preview` has been deprecated, learn more here: https://platform.openai.com/docs/deprecations (code: model_not_found)
```
Here's the relevant part:

> On June 6th, 2024, we notified developers using gpt-4-32k and gpt-4-vision-preview of their upcoming deprecations in one year and six months respectively. As of June 17, 2024, only existing users of these models will be able to continue using them.
Using `gpt-4o` is the new equivalent to using `gpt-4-vision-preview`.
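For the hosted OpenAI API, that amounts to a one-line change in the agent's `text_generation` section (a sketch; everything else stays as in the configuration you pasted above):

```yaml
text_generation:
  model_id: gpt-4o  # replacement for the deprecated gpt-4-vision-preview
```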
Thanks @spantaleev. In preparation for this new feature request for baibot, I will open an issue with LocalAI to let them know that `gpt-4-vision-preview` is deprecated, and that the model should instead be named `gpt-4o` for OpenAI API compatibility. That name should get mapped to the llava-1.6-mistral model that the stock Docker CUDA 12 LocalAI v2.20.1 image comes preinstalled with.
The references to `gpt-4-vision-preview` in:

- https://github.com/mudler/LocalAI/blob/master/aio/gpu-8g/vision.yaml
- https://github.com/mudler/LocalAI/blob/master/aio/cpu/vision.yaml
- https://github.com/mudler/LocalAI/blob/master/aio/intel/vision.yaml

need to be changed to `gpt-4o`, as you point out.
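The change itself should only touch the public-facing `name`; the llava backend stays as is (a sketch of what I will propose, abridged from the model definition above):

```yaml
# aio/*/vision.yaml (abridged)
name: gpt-4o  # was: gpt-4-vision-preview
mmproj: llava-v1.6-7b-mmproj-f16.gguf
parameters:
  model: llava-v1.6-mistral-7b.Q5_K_M.gguf
```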
I opened this LocalAI issue: https://github.com/mudler/LocalAI/issues/3596
@spantaleev Any progress on this? I would love for baibot to weigh in when an image and an associated prompt are uploaded. This should be relatively straightforward to support, since it is an extended multimodal use of the existing text chat completion API.
I see text-to-image listed as a supported feature. How about image-to-text? There are quite a few capable multimodal self-hosted models these days, such as moondream2 and minicpm2.6, that are supported in Ollama and similar. Is that functionality implicitly supported?
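For what it's worth, since Ollama exposes an OpenAI-compatible `/v1` endpoint, I imagine wiring up such a model would look something like this (a hypothetical sketch; the provider name, agent id and model id here are my assumptions, not something baibot is confirmed to support for images):

```yaml
- id: ollama-vision                       # hypothetical agent id
  provider: openai-compat                 # assumption: baibot's generic OpenAI-compatible provider
  config:
    base_url: http://172.17.0.1:11434/v1  # Ollama's OpenAI-compatible endpoint
    api_key: null
    text_generation:
      model_id: minicpm-v                 # or moondream; an Ollama-served multimodal model
      prompt: You are a brief, but helpful bot.
      temperature: 1.0
```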