Open Boscop opened 1 month ago
Does it work if you include the "detail" field along with the image URL?
Try changing

```json
"image_url": {
  "url": "..."
}
```

to

```json
"image_url": {
  "url": "...",
  "detail": "auto"
}
```
Edit: I think the deserialization error in the response you posted comes from the missing "detail" field. When I adjusted your request to include that field, the response was successful, but I think it just said that it can't view images. My guess is you need a different kind of model, but I don't know much about this. So probably not what you were looking for, sorry ._.
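To make the suggestion above concrete, here is a small sketch of how the full request body could be assembled with the "detail" field included. This follows the OpenAI-style chat completion message shape; the prompt text and image URL are placeholders, and whether Tabby accepts this payload is exactly what is being tested in this thread.

```python
import json

def build_vision_payload(prompt: str, image_url: str, detail: str = "auto") -> dict:
    """Build an OpenAI-style chat completion body with an attached image URL.

    The "detail" field is included alongside "url", as suggested above.
    """
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": image_url, "detail": detail},
                    },
                ],
            }
        ],
    }

payload = build_vision_payload("What is in this image?", "https://example.com/spec.png")
print(json.dumps(payload, indent=2))
```

The resulting JSON can then be POSTed to the chat completions endpoint as usual.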
Before we delve into the implementation details, could you share the use case for utilizing image input? What value do you gain by using Tabby, instead of directly interacting with a chat completion endpoint that includes an image API (e.g., the GPT-4 series)?
@wsxiaoys Sure :) My use case: I want to use Tabby via the API to generate code based not just on instructions but also on attached images, such as screenshots of technical spec documents (e.g. hardware datasheets or images captured from PDFs), and also on documents via embedding/RAG (extracting text from PDFs, technical documentation related to the project, etc.).
E.g. a lot of PDFs about hardware devices have tables with binary layouts etc., and I often take a screenshot and tell Claude to write code based on it. That works, but it's always out of the context of the codebase; I want to do it in context, locally, with Tabby.
Basically I want to extend Tabby's functionality with this, by using it via the API. Maybe I'll even end up writing my own coding assistant, and I want to use Tabby as the backend via the API.
Thanks for making Tabby, it's great :)
I want to build a local assistant on top of Tabby's HTTP API, and this assistant should support image inputs in chat. Like with the OpenAI API: https://platform.openai.com/docs/guides/vision/quickstart
When I tried this example on Tabby's local HTTP API:
I got this error:
So it seems that Tabby doesn't support image inputs in chat completion requests.
Would it be possible to add support for image inputs in the chat completions API? Either as a URL or as a base64-encoded image (like in ollama: https://github.com/ollama/ollama/issues/3690).
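For the base64 variant, a sketch of what I have in mind, following the OpenAI vision API convention of embedding the image as a data URI inside the same "image_url" content part (the placeholder bytes and MIME type are illustrative; whether Tabby would accept this is the feature request):

```python
import base64

def image_part_from_bytes(image_bytes: bytes, mime: str = "image/png") -> dict:
    """Encode raw image bytes as a data-URI "image_url" content part."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{b64}"},
    }

# Placeholder bytes, not a real PNG; in practice this would be read from a file.
part = image_part_from_bytes(b"\x89PNG...")
print(part["image_url"]["url"][:30])
```

This part would then go into the "content" array of a user message, next to the text prompt.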
Please reply with a 👍 if you want this feature.