AbanteAI / mentat

Mentat - The AI Coding Assistant
https://mentat.ai
Apache License 2.0

Add support for OpenAI API-compatible local LLM servers (e.g. Ollama, LMStudio, GPT4All) #327

Closed. dvega-flexion closed this issue 11 months ago.

jakethekoenig commented 11 months ago

They are already supported. If you set the OPENAI_API_BASE environment variable it should work. See here.
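
For anyone landing here, a minimal sketch of that setup, assuming an Ollama server on its default port and the openai Python client; the model name `mistral` is just an example of something you'd have pulled locally:

```python
# Sketch only: verify an OpenAI-compatible local endpoint works before
# pointing mentat at it. Assumes Ollama is serving at its default address.
import os
from openai import OpenAI

os.environ["OPENAI_API_BASE"] = "http://localhost:11434/v1"  # what mentat reads
os.environ["OPENAI_API_KEY"] = "anything"  # local servers ignore it, but it must be set

client = OpenAI(
    base_url=os.environ["OPENAI_API_BASE"],
    api_key=os.environ["OPENAI_API_KEY"],
)
response = client.chat.completions.create(
    model="mistral",  # example model name; use whatever your server hosts
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
```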

rupurt commented 10 months ago

I've tried getting mentat running with Ollama, and there seem to be a couple of pieces missing besides setting OPENAI_API_BASE.

I also had to set OPENAI_API_KEY=anything.

However, it fails to respond to any requests because it can't use any embeddings. The list of embedding models appears to be hardcoded, and I can't figure out how to plug in something like `from langchain.embeddings import OllamaEmbeddings`.
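
To illustrate the kind of backend I mean, a rough sketch using langchain's Ollama embeddings wrapper (the base_url and model are just defaults for a local Ollama install, not anything mentat supports today):

```python
# Sketch of the embedding backend I'd like to plug in: langchain's
# Ollama wrapper pointed at a default local install.
from langchain.embeddings import OllamaEmbeddings

embedder = OllamaEmbeddings(
    base_url="http://localhost:11434",  # default Ollama address
    model="llama2",                     # any embedding-capable local model
)
vectors = embedder.embed_documents(["def add(a, b):\n    return a + b"])
print(len(vectors), len(vectors[0]))  # document count, embedding dimension
```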

rupurt commented 10 months ago

If I try to use an OpenAI model like gpt-3.5-turbo as the embedding model while OPENAI_API_BASE is set, it fails with Ollama and litellm because litellm doesn't have that embedding model, and I can't use one of litellm's supported embedding models because they aren't in mentat's hardcoded list.

rupurt commented 10 months ago

Thinking about this some more, I think it can be solved by adding an extra CLI option. When you run mentat with an unknown --model ... it asks you to set --maximum-context .... I will try to figure out how to add an option like --maximum-embedding-context ... and use that instead of raising an error.
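
Roughly the shape I have in mind (the flag name and the fallback logic are just my proposal; nothing like this exists in mentat yet):

```python
# Hypothetical sketch of the proposed flag: fall back to a
# user-supplied context limit instead of erroring on unknown models.
import argparse

KNOWN_EMBEDDING_CONTEXTS = {"text-embedding-ada-002": 8191}

parser = argparse.ArgumentParser()
parser.add_argument("--embedding-model", default="text-embedding-ada-002")
parser.add_argument("--maximum-embedding-context", type=int, default=None)
args = parser.parse_args()

def embedding_context(model: str) -> int:
    if model in KNOWN_EMBEDDING_CONTEXTS:
        return KNOWN_EMBEDDING_CONTEXTS[model]
    if args.maximum_embedding_context is not None:
        return args.maximum_embedding_context  # user-supplied limit for unknown models
    raise SystemExit(
        f"Unknown embedding model {model!r}; pass --maximum-embedding-context"
    )
```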

biobootloader commented 10 months ago

@rupurt which embedding model are you trying to use?

The auto context feature (embeddings / RAG) is new and we haven't tried embedding models other than text-embedding-ada-002 yet, but we are prioritizing adding support for other models.

You're only getting this error when running with auto-context (i.e. `mentat -a`), right?

rupurt commented 10 months ago

@biobootloader oh, interesting. I was trying to route embeddings to the OpenAI API with --embedding-model openai/text-embedding-ada-002. I'm more than happy to help test and add (basic) local model features; my current quest is to bring down OpenAI costs and open up custom models in our current tooling.

I'm also familiar with https://github.com/mudler/LocalAI, and looking at the mentat codebase I think it's a better fit for now because you can override exact model names, e.g. I can map gpt-3.5-turbo -> mistral in LocalAI.
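
With that mapping in place the client side needs no changes at all, something like this (the port is LocalAI's default; the alias itself lives in LocalAI's model config, not in this snippet):

```python
# Sketch: with LocalAI configured to serve mistral under the name
# "gpt-3.5-turbo", an unmodified OpenAI-style call just works.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="anything")
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # actually mistral, via LocalAI's name mapping
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```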

jakethekoenig commented 10 months ago

I made a PR https://github.com/AbanteAI/mentat/pull/397 to not use an embeddings model when auto_context is false; there was no reason to use one. I think that should help with the immediate problem, though it still leaves open how to handle different endpoints for different models.
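
The change is roughly this shape (all names here are illustrative, not our actual internals):

```python
# Illustrative sketch of the behavior in #397: skip the embedding
# model entirely unless auto context is on.
from dataclasses import dataclass

@dataclass
class Config:
    auto_context: bool
    embedding_model: str = "text-embedding-ada-002"

def rank_by_embedding(features: list[str], model: str) -> list[str]:
    # Placeholder for the real embedding-based ranking call.
    return features

def select_context(config: Config, features: list[str]) -> list[str]:
    if not config.auto_context:
        return features  # no ranking needed, so the embedding endpoint is never hit
    return rank_by_embedding(features, model=config.embedding_model)

print(select_context(Config(auto_context=False), ["src/main.py"]))
```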

~Thanks for mentioning that you needed to set an API key even when using a free/local endpoint. I'll make a PR to fix that too.~ This turns out to be more difficult than I thought because the OpenAI client throws an exception if you don't give it a key.

rupurt commented 10 months ago

I've made some good progress using LocalAI. I've been using the bert text embedding model, which only supports a context size of 500; that seems to be the limiting factor now. I've set --auto-tokens 250 (and lower) but it causes a segfault in the LocalAI backend. It seems like mentat is sending too much context.

I'm going to try to get the Jina embeddings model, which supports an 8k context, working in LocalAI to see if that helps.

biobootloader commented 10 months ago

--auto-tokens 250 doesn't change the size of the features; it changes how many total tokens will be sent to the model as context. So the embedding model is still seeing large features, and that's probably what's causing the crash.
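
To make the distinction concrete: each feature would have to be clipped to the embedding model's window before being sent. A sketch of that kind of guard (the 500-token limit is your bert model's window from above; tiktoken is a stand-in for whatever tokenizer the local model actually uses):

```python
# Sketch of a client-side guard: clip each feature to the embedding
# model's context window before embedding it.
import tiktoken

encoder = tiktoken.get_encoding("cl100k_base")

def clip_to_window(text: str, max_tokens: int = 500) -> str:
    tokens = encoder.encode(text)
    return encoder.decode(tokens[:max_tokens])
```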

rupurt commented 10 months ago

Gotcha. I'll try to poke around and see if I can find anything else interesting regarding the features.