tleyden opened this issue 9 months ago
I'm running the backend with:
```sh
sh -c ./llava-v1.5-7b-q4.llamafile -ngl 9999
```
and I'm able to use the web UI to upload images and send chat queries. However, when I try to use the JSON API from code, I get a 500 error:

```
500 Internal Server Error
[json.exception.type_error.302] type must be string, but is array
```
Here is the JSON request I'm sending:
{ "model":"LLaMA_CPP", "messages":[ { "role":"user", "content":[ { "type":"text", "text":"What is this an image of?" }, { "type":"image_url", "image_url":{ "url":"data:image/jpeg;base64,iVBORwAA <snip ... long 2.2 MB base 64 image> ElFTkSuQmCC" } } ] } ], "max_tokens":4096 }
This has a similar structure to the OpenAI GPT Vision example.
Is there an example of how to format the request when passing both text and image?
I couldn't find any examples in the docs of doing this particular type of inference via the JSON API. Any hints would be greatly appreciated!
This looks like where it's expecting a string rather than an array in the `content` field:

https://github.com/Mozilla-Ocho/llamafile/blob/dfd333589abd55574ea2d2165aa18e3658045e80/llama.cpp/server/utils.h#L180
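That `type_error.302` is what nlohmann::json throws when a value is read as a `std::string` but is actually an array, which fits the OpenAI-style `content` list. As a rough, hypothetical sketch (not code from the repo), flattening the text parts might look something like this:

```cpp
#include <string>
#include "json.hpp"   // nlohmann::json, as used by llama.cpp/server

// Hypothetical helper: collapse an OpenAI-style "content" value into a single
// prompt string so the existing string-only code path keeps working.
static std::string flatten_content(const nlohmann::json & content) {
    if (content.is_string()) {
        // Plain string: current behavior, pass through unchanged.
        return content.get<std::string>();
    }
    std::string flat;
    if (content.is_array()) {
        for (const auto & part : content) {
            if (part.value("type", "") == "text") {
                flat += part.value("text", "");
            }
            // "image_url" parts would still need to be base64-decoded and fed
            // to the multimodal pipeline; this only avoids the 500 for text.
        }
    }
    return flat;
}
```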
There's no support for this yet but we can add it.
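In the meantime, the upstream llama.cpp server that llamafile embeds documents an `image_data` field on the `/completion` endpoint, with `[img-N]` placeholders in the prompt (the web UI appears to go through that path). Assuming that endpoint is wired up in this build, a request along these lines might work as a stopgap:

```json
{
  "prompt": "USER:[img-10]What is this an image of?\nASSISTANT:",
  "n_predict": 512,
  "image_data": [
    { "data": "<base64 of the raw image bytes, without the data: prefix>", "id": 10 }
  ]
}
```

Note this goes to `/completion` rather than `/v1/chat/completions`, so it only sidesteps the issue until the chat endpoint accepts the array form of `content`.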