-
### What model would you like?
JetMoE is a Mixture-of-Experts model that reaches Llama 2 performance with only 2.2B active parameters. I think this has a lot of potential for low-end devices…
-
- [ ] xAI's Grok
- [ ] AWS
-
### Describe the bug
After setting up Agents and a workflow with local endpoints, I get this error message: ```openai.OpenAIError: The api_key client option must be set either by passing api_key to …
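A common workaround, sketched here under the assumption that the client targets a local OpenAI-compatible endpoint (such as Ollama's, which ignores the key entirely): supply a placeholder `api_key` so the client's startup check passes. The key value and the endpoint URL below are illustrative, not taken from the report.

```python
import os

# The openai client raises OpenAIError at construction time unless an
# api_key is supplied via the api_key argument or the OPENAI_API_KEY env
# var, even when the local server never checks it. Any placeholder value
# satisfies the check.
os.environ["OPENAI_API_KEY"] = "not-needed"  # placeholder, never validated locally

# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:11434/v1")  # assumes local Ollama
```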
-
### What is the issue?
After Ollama's upgrade to 0.27 from 0.20, it runs Gemma 2 9B at very low speed. I don't think the OS is out of VRAM, since Gemma 2 only costs 6.8 GB (q_4_0) of VRAM while my lapto…
-
### Validations
- [X] I believe this is a way to improve. I'll try to join the [Continue Discord](https://discord.gg/NWtdYexhMs) for questions
- [ ] I'm not able to find an [open issue](https://githu…
-
As the title says, using starcoder2 times out or appears to be stuck.
Logs attached.
[starcoder2_timout.txt](https://github.com/user-attachments/files/15588642/starcoder2_timout.txt)
-
### What is the issue?
```
~$ nvidia-smi
Fri May 24 09:41:47 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04 …
```
-
If one wants to host one or more models on a beefy computer and give access to a select few, but not the entire world, I would like to suggest some sort of user feature.
**Host**
The host device…
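In the meantime, one hypothetical way to approximate such a user feature is a reverse proxy with HTTP basic auth in front of the model server. The sketch below assumes nginx and a local Ollama on 127.0.0.1:11434; the hostname and file paths are illustrative.

```nginx
# Illustrative reverse-proxy config: only users listed in the htpasswd
# file can reach the model server; the API itself is unchanged.
server {
    listen 443 ssl;
    server_name models.example.com;                   # hypothetical hostname

    ssl_certificate     /etc/nginx/certs/models.crt;  # illustrative paths
    ssl_certificate_key /etc/nginx/certs/models.key;

    location / {
        auth_basic           "Model server";
        auth_basic_user_file /etc/nginx/.htpasswd;    # created with htpasswd
        proxy_pass           http://127.0.0.1:11434;  # assumes local Ollama
    }
}
```

Each permitted user would then get an entry via `htpasswd /etc/nginx/.htpasswd <username>`.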
-
I can see that the Quarkus extension only covers the `/api/generate` Ollama endpoint, but do we plan to cover everything, like here: https://github.com/langchain4j/langchain4j/blob/main/langchain4j-ollama/sr…
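For context, a minimal sketch of a request body for one of the other endpoints, `/api/chat`, built with stdlib-only code. The model name `llama3` is just an example.

```python
import json

# Body for Ollama's /api/chat endpoint, one of the endpoints beyond
# /api/generate. "stream": False requests a single JSON response
# instead of a stream of chunks.
payload = json.dumps({
    "model": "llama3",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,
})
# POST this to http://localhost:11434/api/chat on a running Ollama server.
```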
-
Hi,
I have installed this tool on Ubuntu, where my Ollama is also running with the llama3 model on the default port at localhost. This program runs in the background, but how do you use it then? I am remotely ac…