Closed · dvega-flexion closed this issue 11 months ago
I've tried getting `mentat` running with `ollama`, and it seems like there are a couple of pieces missing besides just setting `OPENAI_API_BASE`. I also had to set `OPENAI_API_KEY=anything`.
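For reference, this is roughly the environment I'm running with; the base URL is just where my litellm proxy happens to listen, so treat it as a placeholder:

```bash
# Point mentat at a local OpenAI-compatible endpoint (litellm in front of ollama).
# The key value is ignored by the local server, but the client won't start without one.
export OPENAI_API_BASE="http://localhost:8000"   # wherever your litellm proxy listens
export OPENAI_API_KEY="anything"

mentat
```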
However, it fails to respond to any requests because it can't use any embeddings. The list of embedding models appears to be hardcoded, and I can't figure out how to plug in something like `from langchain.embeddings import OllamaEmbeddings`. If I try to use an OpenAI embedding model like `gpt-3.5-turbo` while `OPENAI_API_BASE` is set, it fails when using `ollama` and `litellm`, because `litellm` doesn't have that embedding model and I can't use one of their supported embedding models since it isn't in `mentat`'s hardcoded list.
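For what it's worth, `ollama` can serve embeddings itself, which is what I'd like `mentat` to be able to use. A quick sketch against the native API, assuming a local ollama server on its default port and a model I've already pulled:

```bash
# Ask a local ollama server (default port 11434) for an embedding directly.
# "mistral" is just whatever model happens to be pulled locally.
curl http://localhost:11434/api/embeddings \
  -d '{"model": "mistral", "prompt": "some source code to embed"}'
```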
Thinking about this some more, I think it can be solved by adding an extra CLI option. When you run `mentat` with an unknown `--model ...`, it asks you to set `--maximum-context ...`. I will try to figure out how to add an option like `--maximum-embedding-context ...` and use that instead of raising an error.
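Roughly what I'm picturing; the `--maximum-embedding-context` flag below is the proposal and doesn't exist yet, and the numbers are only illustrative:

```bash
# Today: an unknown --model prompts you for --maximum-context.
mentat --model mistral --maximum-context 8192

# Proposed: accept the same kind of hint for unknown embedding models
# instead of raising an error.
mentat --model mistral --maximum-context 8192 --maximum-embedding-context 512
```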
@rupurt which embedding model are you trying to use? The auto-context feature (embeddings / RAG) is new and we haven't tried embedding models other than `text-embedding-ada-002` yet, but we are prioritizing adding support for other models.

You're only getting this error when running with auto-context, right? (i.e. `mentat -a`?)
@biobootloader oh, interesting. I was trying to route to the OpenAI API for the embeddings with `--embedding-model openai/text-embedding-ada-002`. I'm more than happy to help test and add (basic) local model features. That is my current quest: to bring down OpenAI costs and open up custom models to our current tooling.
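Concretely, the combination I was attempting looks roughly like this (the base URL is my local proxy, so it's a placeholder), and it's the part that fails:

```bash
# Chat completions go to the local endpoint via OPENAI_API_BASE, but I wanted
# embeddings routed to OpenAI; with a single base URL that doesn't work today.
export OPENAI_API_BASE="http://localhost:8000"
export OPENAI_API_KEY="anything"

mentat -a --embedding-model openai/text-embedding-ada-002
```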
I'm also familiar with https://github.com/mudler/LocalAI, and looking at the `mentat` codebase I think that will be a better fit for now because you can just override exact model names, e.g. I can map `gpt-3.5-turbo` -> `mistral` in LocalAI.
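The sort of override I mean, sketched as a LocalAI model definition; the file path, backend name, and GGUF file are from my own setup and only meant as an example:

```bash
# Expose a local Mistral model under the name "gpt-3.5-turbo" so that
# mentat's expected model names resolve against LocalAI.
cat > models/gpt-3.5-turbo.yaml <<'EOF'
name: gpt-3.5-turbo
backend: llama
parameters:
  model: mistral-7b-instruct-v0.1.Q4_K_M.gguf
EOF
```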
I made a PR https://github.com/AbanteAI/mentat/pull/397 to not use an embeddings model if `auto_context` is false; there was no reason to use it. I think that should help with the immediate problem, though it still leaves open how to handle using different endpoints for different models.

~Thanks for mentioning that you needed to set an API key even when using a free/local endpoint. I'll make a PR to fix that too.~ This turns out to be more difficult than I thought because the OpenAI client throws an exception if you don't give it a key.
I've made some good progress using LocalAI. I've been using the `bert` text embeddings model, which only supports a context size of 500, and that now seems to be the limiting factor. I've set `--auto-tokens 250` (and lower), but it causes a segfault in the LocalAI backend; it seems like `mentat` is sending too much context.

I'm going to try to get the Jina embeddings model working in LocalAI, which supports an 8k context, to see if that works.
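In the meantime, to check whether the crash comes from `mentat` or from LocalAI itself, I've been hitting LocalAI's OpenAI-compatible embeddings endpoint directly. A sketch, assuming LocalAI on its default port 8080 and the embeddings model registered as `bert`:

```bash
# Embed a small input directly against LocalAI; growing the input toward the
# model's 500-token limit is a way to see whether LocalAI alone segfaults.
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "bert", "input": "a short test string"}'
```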
`--auto-tokens 250` doesn't change the size of the features; it changes how many total tokens will be sent to the model as context. So the embedding model is seeing larger features, and that's probably what causes the crash.
Gotcha. I'll try and poke around to see if I can find anything else interesting regarding the features.
They are already supported. If you specify `OPENAI_API_BASE` it should work. See here.