Open svenseeberg opened 1 month ago
Ollama has limitations, for example with available models and loading them. We may want to switch to vLLM.
Almost done: https://git.verdigado.com/verdigado-Privileged/Salt/pulls/2226
However, we still need to find models that fit into the graphics card's memory.
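As a rough first filter when shortlisting models, the weight size can be estimated from the parameter count and precision. This is only a back-of-the-envelope sketch: the function names, the 20% headroom for KV cache and activations, and the example numbers are assumptions for illustration, not measurements from our hardware or anything vLLM reports.

```python
# Back-of-the-envelope check: do a model's weights fit into GPU memory?
# All names and the default headroom are illustrative assumptions.

def estimate_weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


def fits_on_gpu(
    num_params: float,
    bytes_per_param: float,
    gpu_memory_gib: float,
    headroom_fraction: float = 0.2,  # assumed reserve for KV cache, activations, etc.
) -> bool:
    """Return True if the weights fit into the memory left after the reserve."""
    usable_gib = gpu_memory_gib * (1 - headroom_fraction)
    return estimate_weight_memory_gib(num_params, bytes_per_param) <= usable_gib


if __name__ == "__main__":
    # Hypothetical example: a 7B-parameter model in 16-bit precision on a 24 GiB card.
    print(f"{estimate_weight_memory_gib(7e9, 2):.1f} GiB of weights")
    print("fits:", fits_on_gpu(7e9, 2, gpu_memory_gib=24))
```

Real usage depends on context length, batch size, and quantization, so this only rules out models that are clearly too large.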