Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
https://anythingllm.com
MIT License

[FEAT]: LM Studio Multi-model inference server support (Workspace/Agent) #1569

Open wallartup opened 5 months ago

wallartup commented 5 months ago

How are you running AnythingLLM?

Docker (local)

What happened?

Going into workspace settings, I can choose every LLM provider except LM Studio. I have added 2 models to the LM Studio playground so they load directly into memory and stay there, but I can't swap between them in AnythingLLM.

The reason I am using LM Studio instead of Ollama is that LM Studio is much faster: Ollama offloads the model from memory and has to reload it before it can serve requests again.

Are there known steps to reproduce?

No response

shatfield4 commented 5 months ago

This is actually done intentionally. LMStudio does not support running multiple models and switching between them at the same time, which is why we block LMStudio from being configured at the workspace level.

wallartup commented 5 months ago

> This is actually done intentionally. LMStudio does not support running multiple models and switching between them at the same time, which is why we block LMStudio from being configured at the workspace level.

I have loaded 2 models at the same time with the playground and used other apps to run inference against them, and it seems to be working. Could it be worth testing now that they have updated their solution?

timothycarambat commented 5 months ago

@wallartup this is using the multi-model chat server, right? You can use this as your LLM, but it still will not appear in the UI as a workspace-specific LLM. The multi-model endpoint support from LMStudio is fairly new, and many people still do not use it as their "main" inference server, so it winds up creating more support issues for us when people do not realize LMStudio has two inference servers.

We can add LMStudio support for workspace models, and as long as you are using the multi-model server, we can populate the dropdown.
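
For illustration, a minimal sketch of how that dropdown could be populated, assuming the multi-model server is reachable on LM Studio's default OpenAI-compatible base URL (`http://localhost:1234/v1`); the function name and base URL here are placeholders, not AnythingLLM code:

```ts
// Fetch the IDs of all models currently loaded in LM Studio's
// OpenAI-compatible multi-model server so they can fill a
// workspace model dropdown. The base URL is an assumption
// (LM Studio defaults to http://localhost:1234/v1).
async function listLMStudioModels(
  basePath: string = "http://localhost:1234/v1"
): Promise<string[]> {
  const response = await fetch(`${basePath}/models`);
  if (!response.ok) {
    throw new Error(`LM Studio returned ${response.status}`);
  }
  // OpenAI-compatible shape: { data: [{ id: "model-name", ... }, ...] }
  const { data } = (await response.json()) as { data: { id: string }[] };
  return data.map((model) => model.id);
}

// Example usage:
// const models = await listLMStudioModels();
// console.log(models); // e.g. ["llama-3-8b-instruct", "phi-3-mini"]
```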

wallartup commented 5 months ago

> @wallartup this is using the multi-model chat server, right? You can use this as your LLM, but it still will not appear in the UI as a workspace-specific LLM. The multi-model endpoint support from LMStudio is fairly new, and many people still do not use it as their "main" inference server, so it winds up creating more support issues for us when people do not realize LMStudio has two inference servers.
>
> We can add LMStudio support for workspace models, and as long as you are using the multi-model server, we can populate the dropdown.

@timothycarambat That's correct. If you use the playground (multi-model chat server) it works. The advantage over Ollama is that the model really loads into memory and never unloads, so inference is lightning fast compared to Ollama, which at times is quite slow (even with mlock). This is why I want it added in the workspace menu. I might be able to figure it out myself, but by giving people the option and labeling it (LM Studio PLAYGROUND ONLY) you also mitigate that risk.
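
To make the "no reload" point concrete, a hedged sketch of per-request model selection against the multi-model server's OpenAI-compatible chat endpoint; the base URL and model names are illustrative only:

```ts
// Send a chat completion to a specific model already loaded in
// LM Studio's multi-model server. Switching models is only a matter
// of changing the `model` field; nothing is unloaded or reloaded.
// The base URL and model names below are illustrative placeholders.
async function chatWithModel(model: string, prompt: string): Promise<string> {
  const response = await fetch("http://localhost:1234/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model, // e.g. "llama-3-8b-instruct" or "phi-3-mini"
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const json = await response.json();
  return json.choices[0].message.content;
}

// Both models stay resident in memory, so back-to-back calls are fast:
// await chatWithModel("llama-3-8b-instruct", "Summarize this document.");
// await chatWithModel("phi-3-mini", "Now answer as an agent.");
```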

mrdjohnson commented 5 months ago

> If you use the playground (multi-model chat server) it works.

Super fun fact on that: loading models with the lmstudio sdk is possible; if AnythingLLM used the lmstudio sdk, it could allow users to load and use models without needing the playground at all!

> I think giving people the option and labeling it (LM Studio PLAYGROUND ONLY)

@wallartup

1. Based on the information above, maybe we should not call it that?
2. I have been working with the LM Studio team lately on getting model loading through the API as well. My current suggestion is that the server options include an "auto-load" option: this would allow users to see ALL their models, and whichever one they pick gets loaded and used. Some variation of this will be coming soon, so calling it PLAYGROUND ONLY mode might confuse users (I can also just create another GitHub issue when that time comes). A rough sketch of SDK-based loading follows below.
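
For anyone curious what loading via the SDK might look like, here is a rough sketch using @lmstudio/sdk along the lines of its published examples; the `LMStudioClient`/`load`/`respond` calls and the model path are assumptions and may differ from the current SDK surface:

```ts
import { LMStudioClient } from "@lmstudio/sdk";

// Connect to the local LM Studio instance and load a model
// programmatically, without opening the playground UI.
// The model path below is a placeholder; any downloaded model works.
const client = new LMStudioClient();
const model = await client.llm.load(
  "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF"
);

// Stream a response from the freshly loaded model.
const prediction = model.respond([
  { role: "user", content: "Say hello from the SDK." },
]);

for await (const text of prediction) {
  process.stdout.write(text);
}
```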