The way to access the local models is through the OpenAI library because they all support it.
The program specifically uses Ollama for a reason:
- Ollama has its own API and interface designed for local models
- Ollama handles model management, context windows, and local deployment efficiently
- It's purpose-built for running local models
The OpenAI library is NOT meant for local models:
- It's designed specifically to interact with OpenAI's API services
- It's for hosted models that cost money per request
- It has completely different endpoints, authentication, and usage patterns
While some projects do try to emulate OpenAI's API format locally (like LocalAI), that's:
- Not what Ollama is designed for
- Would add unnecessary complexity
- Would limit Ollama's native capabilities
- Would add an extra layer that could break
@TheBlewish I'm using LM Studio on Mac, which exposes an OpenAI-compatible endpoint and can run MLX models (which are faster, especially in time to first token).
@NimbleAINinja's fork works nicely with LM Studio by just overriding the URL and specifying the name of one of my local models.
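For anyone curious, this is roughly what that override looks like with the `openai` Python client. A minimal sketch, assuming LM Studio's default local port (1234) and a placeholder model name:

```python
from openai import OpenAI

# LM Studio serves an OpenAI-compatible API on localhost:1234 by default.
# The api_key is required by the client but ignored by the local server.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="your-local-model-name",  # placeholder: whichever model LM Studio has loaded
    messages=[{"role": "user", "content": "Summarize the benefits of local LLMs."}],
)
print(response.choices[0].message.content)
```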
Is your issue only with the additional `openai` and `anthropic` dependencies in `requirements.txt`?
I'm just gonna put this here: https://ollama.com/blog/openai-compatibility
Even Ollama has OpenAI-compatible API endpoints. Unless you specifically need functionality from Ollama's own API that isn't in the OpenAI API, you could simplify the solution while significantly increasing backend options.
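To make that concrete: the same client code from the LM Studio sketch above talks to Ollama if you change only the base URL and model name. A sketch assuming Ollama's default port and an already-pulled model:

```python
from openai import OpenAI

# Ollama exposes OpenAI-compatible endpoints under /v1 on its default port 11434.
# The key is ignored by Ollama but must be non-empty for the client.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.1",  # any model already pulled with `ollama pull`
    messages=[{"role": "user", "content": "Hello from Ollama's OpenAI-compatible API."}],
)
print(response.choices[0].message.content)
```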
There are plenty of popular local LLM servers that provide OpenAI API compatibility, all of which would work with this project if it only offered OpenAI API support; the list of servers that support Ollama's own API is much shorter.
OpenAI's API is in practice an industry standard, and just about every local LLM hosting solution supports it.
You should really let people use their OpenAI-compatible endpoints. Maybe throw in a warning about token usage if you want.
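If it helps, the warning could be as simple as checking whether the configured endpoint is local. A hypothetical sketch (the variable and function names here are made up, not taken from the project):

```python
from urllib.parse import urlparse

# Hypothetical config value, not an actual name from this project.
OPENAI_COMPATIBLE_BASE_URL = "https://openrouter.ai/api/v1"

LOCAL_HOSTS = {"localhost", "127.0.0.1", "0.0.0.0", "::1"}

def warn_if_remote(base_url: str) -> None:
    """Emit a cost warning when the endpoint does not look like a local server."""
    host = urlparse(base_url).hostname or ""
    if host not in LOCAL_HOSTS:
        print(
            f"Warning: {base_url} looks like a hosted endpoint. "
            "Automated web research can burn through a lot of tokens, so watch your API usage."
        )

warn_if_remote(OPENAI_COMPATIBLE_BASE_URL)
```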
Many people don't use Ollama to run their local LLMs (KoboldAI, LM Studio, etc.), and all of those tools expose an OpenAI-compatible endpoint.
There are also many very cheap hosted models available through providers or proxies like OpenRouter.
Absolutely wild to me that you won't at least allow people the option.
When you're in a nanny state competition and your opponent is @TheBlewish. Christ.
Someone will make a fork that attracts most of the user and dev attention.
I agree with everything everyone else here has said, @TheBlewish. I would rather just use LM Studio for this, like I do with everything else. It makes perfect sense to support any OpenAI-compatible LLM endpoint, whether on a cloud service or your own machine. If a user is savvy enough to install this and get it running, they should know that running on some cloud APIs will be quite expensive. It's not on you to make sure that someone doesn't blow their money on API credits, especially when it so significantly limits the potential of your project.
@NimbleAINinja if this goes nowhere, are you open to contributions on your fork?
@NimbleAINinja @TheBlewish I tried this fork. It worked with zero tweaks connecting to my openrouter.ai account. However, after scraping about 5 pages, it said that the document size was 127% of the context window. Interesting, because I was using a model with a 128k-token context. Not sure what happened.
You'll also notice the articles found are not relevant at all.
@synth-mania yes, I'm open to contributions. I'm also going to add support for alternate search providers like Brave, Exa and Tavily.
Updates to llm_config and llm_wrapper to allow for Anthropic and OpenAI / OpenAI-like hosted LLMs
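The linked changes aren't reproduced here, but a provider switch along these lines is one plausible shape for them. A hedged sketch with made-up config keys, using the official `openai` and `anthropic` Python clients:

```python
import anthropic
from openai import OpenAI

# Hypothetical config; the real llm_config in the linked update may look different.
LLM_CONFIG = {
    "provider": "openai_compatible",          # "openai_compatible" or "anthropic"
    "base_url": "http://localhost:11434/v1",  # Ollama, LM Studio, OpenRouter, OpenAI, ...
    "api_key": "ollama",
    "model": "llama3.1",
}

def generate(prompt: str, config: dict = LLM_CONFIG) -> str:
    """Route a single prompt to whichever backend the config selects."""
    if config["provider"] == "anthropic":
        client = anthropic.Anthropic(api_key=config["api_key"])
        message = client.messages.create(
            model=config["model"],
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return message.content[0].text
    # Everything OpenAI-compatible (hosted OpenAI, Ollama's /v1, LM Studio, OpenRouter)
    # goes through the same code path; only base_url, api_key, and model differ.
    client = OpenAI(base_url=config["base_url"], api_key=config["api_key"])
    response = client.chat.completions.create(
        model=config["model"],
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```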