Implement Ollama as a high-level service

boswelja commented 4 months ago

Implementing Ollama as a high-level service type. This has a few advantages:

Unlocks the full use of Ollama APIs that aren't necessarily OpenAI-compatible
Significantly simplify plugin setup for Ollama users
Significantly easier to expand supported features (we don't have to deal with service type -> template -> compat API checks)

Right now, the above benefits have translated into:

Support for inline code completions (limited right now, pending https://github.com/ollama/ollama/issues/3869 for full support)
Support for multimodality
Model selector surfaced right next to host input
Dedicated Ollama option for chat window service dropdown

Currently blocked by:

[x] ~~https://github.com/ollama/ollama/pull/3907~~ Delays with implementation on ollama side, let's stick with what we've got for now.
[x] https://github.com/carlrobertoh/CodeGPT/pull/520
[x] https://github.com/carlrobertoh/CodeGPT/pull/521
[ ] Preventing multimodal inputs where appropriate

Screenshots:

boswelja commented 4 months ago

Please be aware I have no idea how the IDE plugin API works, there's bound to be issues 😆

PhilKes commented 4 months ago

Maybe I missed some discussion here, but I thought @carlrobertoh didnt want to support Ollama as a high-level service? We had that discussion when I already implemented exactly this using the Ollama API in https://github.com/carlrobertoh/CodeGPT/pull/361. If that opinion changed, I'm all for it of course 😁

boswelja commented 4 months ago

We did a little negotiating 😛 https://github.com/carlrobertoh/CodeGPT/issues/441#issuecomment-2066030409

carlrobertoh commented 4 months ago

Since this is a popular request and Ollama doesn't support an API for OpenAI-compatible text completions, I've decided to make an exception. However, I'd still like to keep the others as they are and provide better documentation on how to configure them. 🙂

PhilKes commented 4 months ago

Since this is a popular request and Ollama doesn't support an API for OpenAI-compatible text completions, I've decided to make an exception. However, I'd still like to keep the others as they are and provide better documentation on how to configure them. 🙂

Good timing, I was actually working on adding /v1/completions support to Ollama 😂

But instead I now opened a PR for supporting llama.cpp's /infill API for FIM/code-completions: https://github.com/ollama/ollama/pull/3907 Which would resolve @boswelja https://github.com/ollama/ollama/issues/3869

boswelja commented 4 months ago

Wow that made progress way faster than I expected, thanks @PhilKes! I'll hold off on this for a bit and see if we can get that API in for the first release, which would effectively solve code completions

carlrobertoh commented 4 months ago

Nice! In the meantime, we could switch the llama.cpp completions to the /infill API as well

linpan commented 4 months ago

Ollama as a high-level service support /v1/completions. keep on,

boswelja commented 4 months ago

Preventing multimodal inputs

re. this, it looks like Ollama APIs handle this relatively gracefully

Is this an acceptable solution, at least for now? Users can still attach files, but they are just ignored. In the future, I think we can check the model "families" that Ollama gives us to see if it contains "clip", but I'm not sure if that's a silver bullet just yet

artem-zinnatullin commented 4 months ago

Hey @boswelja, just wanted to confirm that I got your PR at last state https://github.com/carlrobertoh/CodeGPT/pull/510/commits/dc322165af7c9a14b9a0003cb3ab1145b9de7020 working with master branch of https://github.com/carlrobertoh/llm-client pushed to mavenLocal(), great work!

Few thoughts:

Really cool that you're pulling available models into a dropdown in settings, very easy to use!
Settings UI should note why certain models can't be used for Code Completion, it also doesn't seem to properly render enable/disable state when switching models
We need some llama3 specific FIM mappings it seems, can't get it to produce meaningful code completions 😭
Your code is really clean, pleasure to read, plugin itself could use less copy-paste between settings related logic and actual business logic 😅

Hope this motivates you to pushing this PR further! Happy to test new changes and work out Ollama support

boswelja commented 4 months ago

Thanks @artem-zinnatullin! I'm aware of potential issues with toggling code completion, I was beaten to the punch by the custom OpenAI service completion, which implements this slightly different, so I'm halfway through refactoring to match that.

While we wait for https://github.com/ollama/ollama/pull/3907, I'll try split this into smaller PRs so that it's less to review all at once :)

linpan commented 4 months ago

@boswelja

boswelja commented 4 months ago

Yes that's me

PhilKes commented 4 months ago

@boswelja I think we shouldnt rely on /infill API for now. I thought https://github.com/ggerganov/llama.cpp/pull/6689 would enable the FIM prompt templates to be loaded automatically for all models in llama.cpp, but thats not the case as I understand it. Right now llama.cpp only knows how to determine the correct FIM tokens (prefix, suffix, middle) for CodeGemma and CodeLlama. At least that was my experience when I tried to test /infill with CodeQwen (https://github.com/ggerganov/llama.cpp/issues/7102#issuecomment-2095956124).

It would be really nice not having to bother with Infill Prompt Templates in the CodeGPT Plugin itself, but I think the /infill of llama.cpp does not yet offer what we need. But maybe someone else knows more about that than me. I'm still waiting on feedback for https://github.com/ollama/ollama/pull/3907.

But if not, I would actually propose to rollback #513 and also do not rely on it for the Ollama service implementation aswell.

boswelja commented 4 months ago

Fair enough, we can move forward sticking with generate for now. Thanks for the detailed investigation!

carlrobertoh commented 4 months ago

Huh, that's probably the reason why I rolled back the /infill API in the first place, altho I never actually investigated why some of the models weren't working as expected.

@PhilKes Let's revert the last change :)

@boswelja is the PR ready for review? I might push some changes on the fly, or perhaps merge it as is, since I'm planning on integrating another new service, which might cause some merge conflicts.

boswelja commented 4 months ago

I was about to say "no, I've got a couple of smaller PRs that should go in first" but looks like they're merged now!

I'll resolve conflicts and do another once-over, I think the only other thing I wanted input on is https://github.com/carlrobertoh/CodeGPT/pull/510#issuecomment-2081306487

boswelja commented 4 months ago

Current issues:

When refreshing the model list, the model dropdown doesn't update with discovered models
- This is most noticeable when first setting up, after there have been no models but refreshing loads models
- You can still chat with the model, but the dropdown shows the wrong service and model active
Can upload images to models that don't support image inputs (they just ignore it)
- Is this even an issue?

Not really sure how to fix that first one

carlrobertoh commented 4 months ago

When refreshing the model list, the model dropdown doesn't update with discovered models

I made a few minor changes, including fixing the dropdown refresh issues. ~~I also removed the availableModels state since there's no need to maintain any record of available models, as they are always requested via API.~~

Edit: Will revert the removal

Can upload images to models that don't support image inputs (they just ignore it)

Is this even an issue?

I don't think it's an issue at the moment. Let's keep it.

carlrobertoh commented 4 months ago

Everything seems to be working more or less; code completions still need to be improved, but other than that, it seems good. I'll try to provide better documentation on how to set up everything soon as well. Also, if something pops up, then I'll fix it on the fly.

Furthermore, you can expect the feature to be released sometime early next week, hopefully even earlier.

A big thank you to everyone for your help and support! ❤️

carlrobertoh / CodeGPT

Implement Ollama as a high-level service #510