If I understand correctly, that is more a feature not yet implemented in server.cpp than a bug in itself.
Here is the OpenAI API documentation for reference: https://platform.openai.com/docs/api-reference/chat/create
OK, after digging a bit, I see that the code in examples/server/server.cpp and examples/server/public/index.html is definitely not OpenAI REST API compatible.
Format info is from README.md.
I monkey-patched api_like_OAI.py. This is highly untested and does not handle several pictures being sent during the chat session.
The main idea is to catch messages whose 'content' is typed as a list, extract the 'image_url' base64 data, convert it to JPEG (forcing that, as my frontend sends WebP), and create the root key 'image_data' with the data plus an id. The user message in the prompt is updated with the reference to the img id.
To be done: add multi-image support (a rough sketch follows).
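Not part of the patch below, but one rough, untested way that multi-image support could look: give each image_url part its own id and matching [img-N] tag. Function and variable names here are mine, not from api_like_OAI.py; the only thing taken from the patch is the Pillow-based JPEG re-encoding and the image_data format llama.cpp's /completion endpoint expects.

from base64 import b64decode, b64encode
from io import BytesIO
from PIL import Image

def to_jpeg_b64(image_b64):
    # Re-encode any base64 image (e.g. WebP) as base64 JPEG, same idea as in the patch below.
    im = Image.open(BytesIO(b64decode(image_b64)))
    if im.mode != 'RGB':
        im = im.convert('RGB')
    out = BytesIO()
    im.save(out, 'JPEG')
    return b64encode(out.getvalue()).decode()

def multimodal_extract_images(body):
    # Collect every image found in OpenAI-style 'content' lists.
    # Returns (image_data, tags): image_data is the list for the llama.cpp /completion
    # endpoint, tags maps message index -> "[img-N]..." prefix to splice into the prompt.
    image_data, tags = [], {}
    next_id = 10  # keep the numbering convention of the patch below
    for i, line in enumerate(body['messages']):
        if line['role'] != 'user' or not isinstance(line.get('content'), list):
            continue
        for cont in line['content']:
            if cont.get('type') != 'image_url':
                continue
            url = cont['image_url']['url']
            b64 = url[url.find(',') + 1:]  # strip the "data:image/...;base64," header
            image_data.append({"data": to_jpeg_b64(b64), "id": next_id})
            tags[i] = tags.get(i, "") + f"[img-{next_id}]"
            next_id += 1
    return image_data, tags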
I am experiencing the same thing. I tried using this code and could not get it to work:
import base64
import requests

CONTEXT = "You are LLaVA, a large language and vision assistant trained by UW Madison WAIV Lab. You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language. Follow the instructions carefully and explain your answers in detail.### Human: Hi!### Assistant: Hi there! How can I help you today?\n"

with open('image.jpg', 'rb') as f:
    img_str = base64.b64encode(f.read()).decode('utf-8')

data = {
    "messages": [
        {
            "role": "user",
            "image_url": f"data:image/jpeg;base64,{img_str}"
        },
        {
            "role": "user",
            "content": "what is in this image?"
        }
    ]
}

response = requests.post('http://<addr>:<port>/v1/chat/completions', json=data)
Yes, you need to do the JSON adaptation yourself. I can post my crappy code later for people to improve.
Would you be kind enough to drop your code in a gist or give an example? Thank you.
diff --git a/examples/server/api_like_OAI.py b/examples/server/api_like_OAI.py
index 607fe49..6638081 100755
--- a/examples/server/api_like_OAI.py
+++ b/examples/server/api_like_OAI.py
@@ -39,20 +39,51 @@ def convert_chat(messages):
     user_n = args.user_name
     ai_n = args.ai_name
     stop = args.stop
-
+    multimodal = str()
     prompt = "" + args.chat_prompt + stop
     for line in messages:
         if (line["role"] == "system"):
             prompt += f"{system_n}{line['content']}{stop}"
         if (line["role"] == "user"):
-            prompt += f"{user_n}{line['content']}{stop}"
+            # multimodal heuristic
+            if isinstance(line['content'], list):
+                for cont in line['content']:
+                    multimodal="[img-10]"
+                    if cont['type'] == 'text':
+                        prompt += f"{user_n}{multimodal}{cont['text']}{stop}"
+            else: prompt += f"{user_n}{multimodal}{line['content']}{stop}"
         if (line["role"] == "assistant"):
             prompt += f"{ai_n}{line['content']}{stop}"
     prompt += ai_n.rstrip()
     return prompt
+# from any image format in base64 to JPEG in base64
+# using Pillow lib
+def multimodal_convert_pic(image_b64):
+    from base64 import b64decode,b64encode
+    from io import BytesIO
+    from PIL import Image
+
+    webp_bytes = b64decode(image_b64)
+    im = Image.open(BytesIO(webp_bytes))
+    if im.mode != 'RGB': im = im.convert('RGB')
+    jpg_data = BytesIO()
+    im.save(jpg_data, 'JPEG')
+    jpg_data.seek(0)
+    return b64encode(jpg_data.read()).decode()
+
+def multimodal_extract_image(body):
+    for line in body['messages']:
+        if not line['role'] == 'user': continue
+        for cont in line['content']:
+            if cont['type'] == 'image_url':
+                url = cont['image_url']['url']
+                start = url.find(',') + 1
+                return multimodal_convert_pic(url[start:])
+    return False
+
 def make_postData(body, chat=False, stream=False):
     postData = {}
     if (chat):
@@ -81,6 +112,9 @@ def make_postData(body, chat=False, stream=False):
     postData["stream"] = stream
     postData["cache_prompt"] = True
     postData["slot_id"] = slot_id
+    # multimodal detection
+    pic_data = multimodal_extract_image(body)
+    if pic_data: postData["image_data"] = [{"data": pic_data, "id": 10}]
     return postData

 def make_resData(data, chat=False, promptToken=[]):
Launching the proxy with:
./api_like_OAI.py --llama-api http://llamacpp_listening_ip:llamacpp_port --host proxy_listening_ip --port proxy_port
Forwarded message to [llamacpp_listening_ip:llamacpp_port] will look like this:
POST /completion HTTP/1.1
Host: 172.17.1.1:8480
User-Agent: python-requests/2.31.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Content-Length: 5381
{"prompt": "A chat between a curious user and an artificial intelligence assistant. The assistant follows the given rules no matter what.</s>USER: [img-10]describe this picture</s>ASSISTANT:", "temperature": 1, "top_p": 1, "n_predict": 4000, "presence_penalty": 0, "frequency_penalty": 0, "stop": ["</s>"], "n_keep": -1, "stream": true, "cache_prompt": true, "slot_id": -1, "image_data": [{"data": "/9j/4AA[***STRIPPED BASE64 JPEG****]RQB//2Q==", "id": 10}]}
and you point your OpenAI-protocol-speaking frontend at baseUrl = http://proxy_listening_ip:proxy_port/v1
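For example, a quick (untested) check with the official openai Python client, assuming the placeholder host/port below and that the proxy ignores the model name and API key:

import base64
from openai import OpenAI

# Placeholder base_url; the proxy does not validate the API key (assumption).
client = OpenAI(base_url="http://proxy_listening_ip:proxy_port/v1", api_key="none")

with open("image.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="llava",  # assumed to be ignored by api_like_OAI.py
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "describe this picture"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)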
This is now getting more interesting with LLaVA 1.6 being released, and the results on their demo are much more usable than 1.5. Waiting for llama.cpp to be updated (#5267), as loading the GGUFs currently results in the same quality as with 1.5.
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Hello, while testing Llava-13B with the server implementation I got a 500 error related to the content key being a list of dicts and not a simple string.

{"choices":[{"finish_reason":"stop","index":0,"message":{"content":"Hi there! How can I help you today?","role":"assistant"}}],[...]

[json.exception.type_error.302] type must be string, but is array

This is to demonstrate the issue when using an OpenAI-REST-aware frontend that pushes text with a picture inside the content key, like this:
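For reference, the kind of body such a frontend pushes looks roughly like this (values are illustrative, not taken from the original report):

{
  "model": "llava",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "describe this picture"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<BASE64 DATA>"}}
      ]
    }
  ]
}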