carlrobertoh / CodeGPT

JetBrains extension providing access to state-of-the-art LLMs, such as GPT-4, Claude 3, Code Llama, and others, all for free
https://codegpt.ee
Apache License 2.0

Vision support across OpenAI-style providers (e.g. Azure, Custom OpenAI endpoint) #649

Closed: moritzfl closed this 2 days ago

moritzfl commented 1 month ago

Describe the need of your request

Models with vision support are becoming more common, but as of now you can only use vision with the OpenAI and Claude providers in CodeGPT.

It would be nice to enable vision support for other models as well.

Proposed solution

When configuring models for other providers, add a checkbox for enabling vision support on the configured model.

Not all models support vision, so it makes sense to configure the support directly on the model.

Alternative: enable vision support everywhere that llm-client supports vision-capable API requests; the REST endpoint should then respond with a suitable error/warning message when a model lacks vision.

Additional context

I am willing to work on this issue if it is clear how you want it to be configurable in the settings/UI.

carlrobertoh commented 1 month ago

If we're just talking about vision support through OpenAI API-compatible providers, then no additional UI changes are required. The image action must be allowed for Azure and Custom OpenAI providers (ServiceType.java:L68), and then the actual API request must be configured accordingly (this might already come out-of-the-box).
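For reference, a vision request to an OpenAI-compatible endpoint just adds an `image_url` content part alongside the text in a chat message, so Azure and Custom OpenAI providers should accept the same payload shape. A minimal sketch (the class and method names here are illustrative, not from the CodeGPT codebase):

```java
public class VisionPayloadSketch {
    // Builds a minimal OpenAI-style chat completion payload with an image part.
    // The structure ("messages" -> "content" array of "text" and "image_url"
    // parts) follows OpenAI's public vision API; OpenAI-compatible providers
    // such as Azure or a Custom OpenAI endpoint are expected to accept it.
    public static String buildPayload(String model, String prompt, String imageBase64) {
        return "{"
            + "\"model\":\"" + model + "\","
            + "\"messages\":[{"
            +   "\"role\":\"user\","
            +   "\"content\":["
            +     "{\"type\":\"text\",\"text\":\"" + prompt + "\"},"
            +     "{\"type\":\"image_url\",\"image_url\":{"
            +       "\"url\":\"data:image/png;base64," + imageBase64 + "\"}}"
            +   "]"
            + "}]"
            + "}";
    }
}
```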

moritzfl commented 1 month ago

Good to know - I guess it would have helped to inspect the code.

I tried GPT-4 Turbo before, both with Azure and with a custom OpenAI provider (openrouter), and it seems to be the one model missing from CodeGPT's list of supported vision models ("GPT-4o, GPT-4o mini, and GPT-4 Turbo have vision capabilities" - see https://platform.openai.com/docs/guides/vision).

If I am not mistaken, Azure is only covered by the default branch of the switch statement and is therefore not supported for vision at all in CodeGPT.
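A hypothetical sketch of the change being discussed (the real `ServiceType` enum and its constants differ; this only illustrates moving Azure and Custom OpenAI out of the default branch of the image-action switch):

```java
enum ServiceType { OPENAI, ANTHROPIC, AZURE, CUSTOM_OPENAI, LLAMA_CPP }

class ImageActionSupport {
    // Sketch only: illustrates extending the switch so Azure and Custom OpenAI
    // fall into the supported branch instead of the unsupported default one.
    static boolean supportsImageAction(ServiceType type) {
        switch (type) {
            case OPENAI:
            case ANTHROPIC:
            case AZURE:          // previously fell through to default
            case CUSTOM_OPENAI:  // previously fell through to default
                return true;
            default:
                return false;
        }
    }
}
```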

EDIT: Never mind - you were just giving hints towards the implementation, while I initially thought you meant that part of this issue is already covered ... thanks 👍

n0isy commented 3 weeks ago

@carlrobertoh Can you add an option to force the current behaviour in ServiceType.java:L68 to ON, OFF, or AUTO?

I use a custom enterprise proxy (endpoint + auth headers) to OpenAI with image support.
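The tri-state toggle suggested above could be sketched as follows (the names `VisionSupport` and `resolveVision` are hypothetical, not from the codebase):

```java
// ON/OFF force the user's choice; AUTO falls back to the provider's default,
// i.e. whatever per-provider detection CodeGPT currently does.
enum VisionSupport { ON, OFF, AUTO }

class VisionResolver {
    static boolean resolveVision(VisionSupport setting, boolean providerDefault) {
        switch (setting) {
            case ON:  return true;            // e.g. enterprise proxy known to support images
            case OFF: return false;           // user explicitly disables vision
            default:  return providerDefault; // AUTO: keep existing behaviour
        }
    }
}
```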

moritzfl commented 3 weeks ago

You beat me to it - thanks! My slack day is next week :)

Given that you can name models on Azure however you like, do you think it makes sense to just enable image support there globally in a similar fashion?

An alternative would be to enforce a naming scheme similar to OpenAI's, but given that Azure will mostly be used in companies, it might be a burden to convince everyone to stick to it.

If you do not have an Azure account, I can test that for you (then again, I did test llm-client for exactly that when implementing vision support there: https://github.com/carlrobertoh/llm-client/pull/18). We also have a Zulip chatbot based on llm-client that works fine with the Azure account.

carlrobertoh commented 3 weeks ago

Yes, I think we should enable image support for all models provided on Azure, since it already comes with an OpenAI-compatible API and the existing functionality should be available out of the box.

> An alternative would be to enforce a naming scheme similar to OpenAI's, but given that Azure will mostly be used in companies, it might be a burden to convince everyone to stick to it.

I don't believe the naming scheme is worth it. I haven't used Azure myself, but the models aren't global, right? Even if the underlying model might be the same, the name of the model (deployment name) might not be.

However, I'm unsure what will happen if the model doesn't support images. Will it crash or simply ignore the image input? Then again, I didn't test this behaviour with the Custom OpenAI provider either, although this can be different from provider to provider.
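When a model rejects image input, OpenAI-compatible APIs typically return an error object of the shape `{"error":{"message":...,"type":...}}` rather than crashing, though this does vary by provider. A naive sketch of detecting such a response (hypothetical helper, not from the CodeGPT codebase; a real client would parse the JSON properly):

```java
class ProviderErrorCheck {
    // Naive containment check for the sketch: a JSON body whose top-level
    // object carries an "error" key is treated as a provider error response.
    static boolean isErrorResponse(String body) {
        String trimmed = body.trim();
        return trimmed.startsWith("{") && trimmed.contains("\"error\"");
    }
}
```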

moritzfl commented 2 weeks ago

Yes, you can basically name models however you like. It seems natural to stick to the model names that OpenAI uses and we do that at our company - but that might not be the case everywhere.

And one use case might even be that you configure a resource once and then migrate to newer and newer models (i.e. delete the old deployment and add the new model under the same name in your Azure account) without any configuration change across the clients that use your API access.