carlrobertoh / CodeGPT

The leading open-source AI copilot for JetBrains. Connect to any model in any environment, and customize your coding experience in any way you like.
https://codegpt.ee
Apache License 2.0

Code completion for "Custom OpenAI Service" #441

Closed · boswelja closed this 6 months ago

boswelja commented 7 months ago

Describe the need of your request

I am hosting a model on my machine via Ollama, but it seems like a lot of features don't work/don't exist for "Custom OpenAI Service"? Specifically, I'd like to see code completion as an option for this service type.

Proposed solution

No response

Additional context

I'm open to contributing, but it'd be nice to know about any blockers or potential issues if applicable.

reneleonhardt commented 7 months ago

@boswelja You're right, the README hasn't been correct since 2.6.0: https://github.com/carlrobertoh/CodeGPT#code-completions still states "Currently supported only on GPT-3.5 and locally-hosted models", but https://github.com/carlrobertoh/CodeGPT/commit/f0172722c75ae50d2ea895f68cbef0c90bbbcc7f#diff-dd1eb3c6e1b3ba42852ce667a754aeb087bd3d18af74e79ba8fec072bea47793R59 restricted code completions to the OpenAI (first) and LLaMA C/C++ (last) services only (if their respective completions setting is enabled).

So I would suggest mimicking/duplicating those LLaMA changes for the Custom service.

boswelja commented 7 months ago

@reneleonhardt Thanks for the detailed response! I've been poking around the codebase and realized we don't actually have a client (or API endpoint) for non-chat completions for Custom OpenAI models. There are a couple of options I can see to get around this.

reneleonhardt commented 7 months ago

@boswelja Good idea, the second point is more "realistic". All other services are top-level services, but below Custom OpenAI there really is another layer of services (and their respective models), you are right! It would be awesome if you could lay the groundwork in llm-client and contribute Ollama as a first sub-service, I'm using it exclusively too 😉 I would be happy to implement the new OllamaService Completion I suggested above in CodeGPT once llm-client has been released 😄

carlrobertoh commented 7 months ago

The reason for the custom configuration was to avoid configuring and maintaining all these not-so-well-known providers. In my opinion, most of the tools that support running local models are just 'fancy' user interfaces running on top of llama.cpp or a similar architecture.

The idea was to provide only a handful of top-level official API providers (OpenAI, Google, Anthropic, etc.), a single configuration for running local models, and the rest would fall into the "custom service" category.

Perhaps we can make the infilling work with the existing logic. The only thing that needs to be defined is the prompt template, since it can be different per model.

(screenshot: ollama-completions)
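
For illustration, a minimal sketch of what a per-model infill prompt template could look like (the type and member names are hypothetical, not the actual CodeGPT/llm-client API; the token formats are the documented ones for CodeLlama and StarCoder):

```kotlin
// Hypothetical sketch: a per-model fill-in-the-middle (FIM) prompt template.
// Only the prompt format differs between models; the request flow stays the same.
enum class InfillTemplate {
    CODE_LLAMA {
        override fun buildPrompt(prefix: String, suffix: String) =
            "<PRE> $prefix <SUF>$suffix <MID>"
    },
    STAR_CODER {
        override fun buildPrompt(prefix: String, suffix: String) =
            "<fim_prefix>$prefix<fim_suffix>$suffix<fim_middle>"
    };

    abstract fun buildPrompt(prefix: String, suffix: String): String
}
```
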
boswelja commented 7 months ago

The reason for the custom configuration was to avoid configuring and maintaining all these not-so-well-known providers.

That's a valid concern; a lot of these sorts of self-hosted services make changes at a pretty fast pace. As far as I can tell, that's not the case for Ollama though (the last tangible change to the API was 2 months ago, when they added an optional keep_alive parameter).

and the rest would fall into the "custom service" category

We could keep a similar structure, but I think the way it's implemented now isn't feasible in the long term. I haven't looked into all of the services we have templates for, but if we continue this approach we will end up handling more than we need to, because we limit ourselves to a single endpoint. It's definitely a good idea to keep the idea of a "Custom OpenAI Service" though; it seems common for self-hosted services to either only provide that as an API, or provide an API that adds no value over it.

Perhaps we can make the infilling work with the existing logic. The only thing that needs to be defined is the prompt template, since it can be different per model.

Sure, I could work on that! As far as I can tell though, the way completions are implemented now relies on callbacks defined in https://github.com/carlrobertoh/llm-client right? We will have to make some changes to either abstract that away, or build something that can make use of that.

I still feel that, at least in the mid to long term, moving the more "stable" custom services to their own top-level service type is a good move. Short term, I don't mind either way; I just want code completion on steroids 😆

sisve commented 7 months ago

I started looking into this to figure out how much was left, and ended up with https://github.com/sisve/CodeGPT/tree/custom-service-code-completion which is based on @boswelja 's latest code.

I have reused the OpenAIClient, which assumes a host instead of a URL, so I had to do some hack-ish URL mangling as a result. This would probably be better with a CustomServiceClient, which I believe @boswelja refers to with "moving ... custom services to their own top-level service type". However, a quick solution would be to expose two URLs in the configuration form for the custom service: one for chat completions and one for text completions.

This can of course be taken further: the current configuration page for headers and body is focused on chat completions, so we could create another configuration page to allow customizing these for text completions too.

I'm partial to adding another URL to the configuration page and letting the other values default, as we do with OpenAI (see CodeCompletionRequestFactory.kt).

Anyone have thoughts on this approach, adding more configuration entries for the custom service?
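
For illustration, a minimal sketch of what such a split configuration could look like (the class and field names are hypothetical, not the plugin's existing settings classes):

```kotlin
// Hypothetical sketch of a custom-service configuration with two endpoints.
data class CustomServiceSettings(
    // existing chat endpoint, e.g. http://localhost:8000/v1/chat/completions
    var chatCompletionsUrl: String = "",
    // additional text-completion endpoint, e.g. http://localhost:8000/v1/completions;
    // headers and body could default to the chat completion values when left empty
    var textCompletionsUrl: String = ""
)
```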

boswelja commented 7 months ago

Thanks for helping look into this!

This would probably be better with a CustomServiceClient, which I believe @boswelja refers to with "moving ... custom services to their own top-level service type"

Not quite: I was proposing we explicitly move services like Ollama that have a well-defined API out of a "Preset template" under "Custom OpenAI Service" and into their own service type, so Ollama would appear alongside "Custom OpenAI Service" in that dropdown. I do think we need a proper custom service client though (there's already one for Ollama) :stuck_out_tongue:

However, a quick solution would be to expose two URLs in the configuration form for the custom service: one for chat completions and one for text completions.

Correct me if I'm wrong, but a lot of these custom services only expose an OpenAI-like chat completions endpoint, if any OpenAI-like endpoint at all. It'd definitely be a nice-to-have just in case, but I don't think it's necessary for this.

Anyone have thoughts on this approach, adding more configuration entries for the custom service?

I just cloned your fork locally and tested it; it doesn't work for anything that doesn't provide that OpenAI-like completions endpoint. If this is the approach we want to take, we need to be able to point it at the right URL, as well as configure the prompt template, as @carlrobertoh mentioned above.

On that topic, is this the approach we want to go with? I think adding these extra configurations and options is definitely valuable for custom OpenAI-like services, but I think we've ended up talking about two separate improvements here :sweat_smile:

carlrobertoh commented 7 months ago

I do agree with @boswelja that limiting ourselves to a single endpoint isn't scalable in the long run, especially if we want to start consuming other specific endpoints as well. However, since code completions don't require anything other than a different prompt, I would like to see them for Custom OpenAI-compatible services first.

When it comes to these "top-level services", could we hold off on that a bit longer, until there's an actual need for it? I want to make a few design improvements to how these services are built and managed.

@sisve I haven't run the branch locally yet, but from a quick look at the tests, I noticed that you're testing llama.cpp completions instead of the new custom service ones.

sisve commented 7 months ago

@carlrobertoh The tests you're seeing are from @boswelja's commits; he's the one responsible for writing tests and has written most of the code. I added some final touches for my small use case, mostly because I'm eager for those steroid-backed completions.

I'm very narrow in my world view; I am testing against an internal proxy that accepts incoming OpenAI URLs (/v1/completions and /v1/chat/completions), parses requests based on the OpenAI docs, and forwards them to deployments on Azure.

I think we need two separate URLs; some models are only available on the older /v1/completions endpoint (gpt-3.5-turbo-instruct specifically). Ref: https://platform.openai.com/docs/models/model-endpoint-compatibility

boswelja commented 7 months ago

Guilty, I have no idea what I'm doing in those tests yet 😅

It sounds like we've got a pretty solid game plan?

What do y'all think?

sisve commented 7 months ago

That sounds reasonable to me, and we can start getting the masses hooked.

I think we're very close to the bare minimum; only a few steps are left.

sisve commented 7 months ago

I believe that https://github.com/sisve/CodeGPT/tree/custom-service-code-completion has everything needed now... except for figuring out those tests. I haven't thought about them at all.

One commit splits the configuration into two parts: one for /v1/chat/completions and one for /v1/completions. I believe I've kept backward compatibility with annotations that read the previous serialized field names.
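
A rough sketch of that backward-compatibility idea, assuming the IntelliJ platform's xmlb serialization annotations (the class, field, and old option names here are made up):

```kotlin
import com.intellij.util.xmlb.annotations.OptionTag

// Hypothetical sketch: keep the previously serialized option name while the
// property is renamed, so existing user settings still deserialize correctly.
class CustomServiceState {
    @OptionTag("url") // old serialized name from before the chat/text split
    var chatCompletionsUrl: String = ""

    var textCompletionsUrl: String = ""
}
```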

I've also extracted all request generation from CompletionRequestService into a CustomServiceRequestBuilder. I was initially trying to do something like C#'s partial classes, but gave up and just created a separate static class.

The pushed code (commit 1b1a7c9 at the time of writing) can auto-complete my printFizzBuzz() method one line at a time. I believe this is the culmination of everything AI. I send a newline as a stop sequence to save tokens, since we're currently limiting ourselves to one line at a time in CodeCompletionParser.
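
A minimal sketch of that request shape (the builder function is hypothetical; the stop parameter is the point):

```kotlin
// Hypothetical sketch: restrict a /v1/completions request to a single line by
// sending a newline as the stop sequence, which also saves output tokens.
fun buildCodeCompletionBody(model: String, prompt: String): Map<String, Any> = mapOf(
    "model" to model,
    "prompt" to prompt,
    "max_tokens" to 64,
    "stop" to listOf("\n") // one line at a time, matching CodeCompletionParser
)
```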

carlrobertoh commented 7 months ago

@sisve Could you raise a PR please? I can take a look at those tests and possibly make some other minor changes as well.

sisve commented 7 months ago

I've created the PR #476 now, by popular demand.

carlrobertoh commented 7 months ago

I did some investigation around this, and I have decided not to support this feature.

The reason is that each provider handles text completions in their own way. Even if you can fully configure the request format, the response structure might still differ and not align with other providers.

It seems that the best way to solve this is to start extracting these predefined templates into standalone services and start rolling out the code completions separately as @boswelja mentioned.

boswelja commented 7 months ago

I did some investigation around this, and I have decided not to support this feature.

Does this also mean any PRs aimed at this will be rejected/not reviewed?

It seems that the best way to solve this is to start extracting these predefined templates into standalone services and start rolling out the code completions separately as @boswelja mentioned.

@carlrobertoh Before I start writing a bunch of code, I just want to confirm - do we want API clients created/updated in https://github.com/carlrobertoh/llm-client for this?

sisve commented 7 months ago

The reason is that each provider handles text completions in their own way. Even if you can fully configure the request format, the response structure might still differ and not align with other providers.

But if they do that, are they actually classified as a Custom OpenAI service? Clearly they must provide OpenAI-compatible endpoints and request formats?

If I have a proxy that forwards calls to an OpenAI service (and adds the proper API keys), what's the way forward to use CodeGPT for code completions? This is an internal service not accessible to the public, so it wouldn't be appropriate to provide as a separate service or template.

carlrobertoh commented 7 months ago

@boswelja

Does this also mean any PRs aimed at this will be rejected/not reviewed?

Most likely, I will not review any service configuration changes made in the CodeGPT repository regarding that matter.

Before I start writing a bunch of code, I just want to confirm - do we want API clients created/updated in https://github.com/carlrobertoh/llm-client for this?

Yes, all the API-related data should go into llm-client, which will be used by CodeGPT.

@sisve

But if they do that, are they actually classified as a Custom OpenAI service? Clearly they must provide OpenAI-compatible endpoints and request formats?

Not really; the only thing they have in common is the OpenAI-compatible chat completions API, and that's the whole reason for this Custom Service. It means users don't have to wait for CodeGPT updates to configure a new provider for chat completions.

If I have a proxy that forwards calls to an OpenAI service (and adds the proper API keys), what's the way forward to use CodeGPT for code completions? This is an internal service not accessible to the public, so it wouldn't be appropriate to provide as a separate service or template.

That's a bit trickier. In recent versions, we had an option to configure the base host, but that was removed once I released this custom service configuration. It looks like I need to bring the host configuration back.

sisve commented 7 months ago

Would it change your mind if we redesigned the configuration page, moved away from the focus on endpoints, and changed the headers to mention functionality instead? So instead of asking for details about /v1/chat/completions and /v1/completions, we would ask for information about Chat completions and Code completions separately.

I think we would want separate models for chat and code completions. In particular, the gpt-3.5-turbo-instruct model is only supported by /v1/completions (ref: https://platform.openai.com/docs/models/model-endpoint-compatibility), and the /v1/completions API is the only one that supports suffixes (ref: https://platform.openai.com/docs/api-reference/completions/create#completions-create-suffix).

What alternate track forward are you thinking of? Providing a code-completion-related prompt and using /v1/chat/completions?

carlrobertoh commented 7 months ago

@sisve Yep, that was the idea. Instead of dealing with this configuration mess, all the services will have a lightweight configuration containing only the required fields, similar to how the other top-level services are configured now.

sisve commented 7 months ago

I agree that the other top-level services will be easier; for example, the Azure one would just need to ask for two deployments (read: models), one for chat and one for code completions. But I think the expectation of Custom OpenAI is to be really configurable, with lots of customization that the other top-level services don't need.

What if we modified the code-completion part of the config (currently /v1/completions) so that it can handle being pointed at /v1/chat/completions, so any provider not supporting that endpoint can use the one they [presumably] do have? I think it's just a matter of detecting the response format and how the result is returned.

Is it the different configuration possibilities that concern you, or the user interface? We could rename the headers to focus on purpose, default the code completion details to the same values as the chat completion details, and hide everything related to the code completion endpoint behind a collapsed/expandable panel. But the configuration possibilities would still be there for those who want to use gpt-3.5-turbo-instruct.

carlrobertoh commented 7 months ago

I think it's just a matter of detecting the response format and how the result is returned.

I think that's the trickiest part. Some of these providers have different response formats, or don't support text completions at all. I could try to hard-code and map the correct response type to each of these templates and then choose accordingly when receiving the actual API response. But then again, wouldn't it be easier and more maintainable to have separate services with their own APIs instead?

Is it the different configuration possibilities that concern you, or the user interface?

Both, to be honest. From the UX perspective, users are already having trouble finding their way around configuring the Ollama service. Making it a top-level service and eliminating the need to configure request/response formats (for both chat and code completions) would definitely make things smoother. Design-wise, I just don't find this 'hard-coded enum template' solution scalable enough, and I believe these providers deserve a standalone llm-client API interface.

I'm in a bit of a dilemma here... On one hand, I want to extract these out and make the configuration more user-friendly, and on the other, I would still like to see this high level of customization.

I guess I'll still give this 'hard-coded response structure type mapping' thing a chance and see if it works out. However, I believe these response types should come from the llm-client, which already indicates that they deserve to have their own interface.

UI-wise I was thinking something like this:

(screenshot: proposed configuration UI)

Still lacks a dropdown for choosing the correct FIM prompt template.

carlrobertoh commented 7 months ago

I guess we could still make it completely OpenAI-compatible and only depend on a single response structure. However, with this approach, you couldn't use Ollama for code completions, and probably some other providers either.

sisve commented 7 months ago

I was unclear when I mentioned guessing the response type; I was only thinking of detecting whether a response is coming from /v1/chat/completions or /v1/completions.
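
For illustration, a minimal sketch of that detection, assuming a Gson-style JSON tree (the function name is hypothetical):

```kotlin
import com.google.gson.JsonParser

// Hypothetical sketch: distinguish a /v1/chat/completions response from a
// /v1/completions response by the shape of the first choice.
fun extractCompletionText(responseBody: String): String {
    val choice = JsonParser.parseString(responseBody)
        .asJsonObject
        .getAsJsonArray("choices")
        .first()
        .asJsonObject
    return when {
        choice.has("message") -> choice.getAsJsonObject("message").get("content").asString // chat completion
        choice.has("text") -> choice.get("text").asString // text completion
        else -> error("Unrecognized completion response format")
    }
}
```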

Regarding Ollama, they seem to lack the OpenAI support required to be a Custom OpenAI template. It seems reasonable to move them into a separate service, which would probably also make it easier for users to configure.

I've updated the branch with the new tabbed interface. I'm still thinking about how the FIM prompting would work, since it's not supported by any OpenAI-hosted model that I know of. It sounds reasonable to "just" provide another body template variable that maps to the value of a nearby dropdown.
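
A sketch of that idea; the $FIM_PROMPT placeholder is made up here, not an existing CodeGPT template variable:

```kotlin
// Hypothetical sketch: a body template where the plugin would substitute the
// rendered FIM prompt for a made-up placeholder chosen via a nearby dropdown.
val codeCompletionBodyTemplate = """
    {
      "model": "gpt-3.5-turbo-instruct",
      "prompt": "${'$'}FIM_PROMPT",
      "stop": ["\n"]
    }
""".trimIndent()
```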

Regarding Ollama, point 2: someone *cough* needs to fix https://github.com/ollama/ollama/issues/3027. It seems to be a middleware located at https://github.com/ollama/ollama/blob/main/openai/openai.go, hooked up at https://github.com/ollama/ollama/blob/8d1995c625e7f2ed2ff98eb099e1bd8d7e6e133e/server/routes.go#L1079.

carlrobertoh commented 7 months ago

I've pushed some changes. Feel free to tweak, add, or fix anything you see fit. It seems like quite a lot of changes, but I want the new stuff to be written in Kotlin.

boswelja commented 7 months ago

So I've got some good news and some bad news

The good news is that I have a separate Ollama service locally that (seems to) work, including code completions! It's currently blocked by https://github.com/carlrobertoh/llm-client/pull/27, and I want to get a bit of polish done on it first, but so far so good.

The bad news is that Ollama doesn't have an abstraction around code completions, so we need to know about model-specific tokens for that task. Short term, I'm thinking we can borrow the local LLaMA configuration for that, and long term I'll raise an enhancement request for Ollama to provide an API or something for this.
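
A rough sketch of what that means in practice, assuming Ollama's /api/generate endpoint in raw mode (the function, model tag, and stop tokens are only illustrative):

```kotlin
// Hypothetical sketch: Ollama's /api/generate has no FIM abstraction, so the
// caller builds the model-specific infill prompt itself and sends it in raw mode
// so Ollama doesn't wrap it in its own prompt template.
fun ollamaInfillRequestBody(prefix: String, suffix: String): Map<String, Any> = mapOf(
    "model" to "codellama:7b-code",
    "prompt" to "<PRE> $prefix <SUF>$suffix <MID>", // CodeLlama infill tokens
    "raw" to true,
    "stream" to false,
    "options" to mapOf("stop" to listOf("<EOT>", "\n"))
)
```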

Mildly related, I might be able to sneak in file attachments for the Ollama service 👀

boswelja commented 6 months ago

Oh cool, I see completion for custom services was merged; I'll just yoink the FIM template selector from there for now 👀

carlrobertoh commented 6 months ago

@boswelja We can use the predefined infill prompt templates, just as we do for llama.cpp service.