We recently added support for OpenAI servers; you can find the configuration in the Settings view.
Can you configure it with your llama-server and let me know if it works?
Tested with v0.20.1; connectivity reports as working.
However, model parsing fails and a model can't be selected in the "Sessions" tab.
In the llama.cpp console the request is successful:
`request: GET /v1/models 127.0.0.1 200`
A manual `curl` returns:
```json
{
  "object": "list",
  "data": [
    {
      "id": "/home/user/Qwen2.5-Coder-32B-Instruct-Q4_K_S.gguf",
      "object": "model",
      "created": 1731505790,
      "owned_by": "llamacpp",
      "meta": {
        "vocab_type": 2,
        "n_vocab": 152064,
        "n_ctx_train": 32768,
        "n_embd": 5120,
        "n_params": 32763876352,
        "size": 18778431488
      }
    }
  ]
}
```
EDIT
I have noticed that the OpenAI endpoint can't be saved without an API key; the "refresh" button in the UI is inactive unless the key field is non-empty.
Providing one does not make any difference, though.
Thanks for the detailed report; I'll need to take a closer look to see where it might be going wrong.
> I have noticed that the OpenAI endpoint can't be saved without an API key
Yeah, since this feature was designed specifically for OpenAI it wouldn't work without an API key, so that's why we made it "mandatory", but we should probably document this better.
When we connect to Ollama via the OpenAI-compatible API we just enter a random API key, which gets ignored anyway.
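To illustrate the idea, here is a minimal sketch (not Hollama's actual code): the models endpoint is called with a placeholder key, which OpenAI-compatible servers like llama-server and Ollama simply ignore. The base URL and key value below are assumptions about a typical local setup.

```ts
// Sketch only: list models from an OpenAI-compatible server using a
// placeholder API key. llama-server/Ollama ignore the Authorization header,
// but sending one keeps the request shape identical to the real OpenAI API.
const baseUrl = 'http://localhost:8080/v1'; // assumed local llama-server port

async function listModels(): Promise<string[]> {
	const res = await fetch(`${baseUrl}/models`, {
		headers: { Authorization: 'Bearer placeholder-key' }
	});
	if (!res.ok) throw new Error(`GET /v1/models failed: ${res.status}`);
	const body = await res.json();
	// Matches the payload above: { "object": "list", "data": [{ "id": ... }] }
	return body.data.map((m: { id: string }) => m.id);
}
```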
> Thanks for the detailed report; I'll need to take a closer look to see where it might be going wrong.
Not a problem.
Also, I've checked the console, but there is no output at any level apart from the benign preload warnings.
The "Network" tab shows 200s to `../models/` with the same JSON payload I provided above.
Found the cause of the problem. Our current implementation filters out any models that don't include `gpt` in their name, so `Qwen2.5-Coder-32B-Instruct-Q4_K_S.gguf` gets filtered out. Removing the filter makes it work.
The filter exists because when we get the models from OpenAI it also sends back a list of non-LLM models that are incompatible with Hollama.
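Roughly, the behavior looks like this (a simplified sketch; the names are illustrative, not the exact Hollama source):

```ts
// Shape of an entry in the OpenAI /v1/models response.
interface OpenAIModel {
	id: string;
	object: string;
	created: number;
	owned_by: string;
}

// Current behavior: only ids containing "gpt" survive, so
// "/home/user/Qwen2.5-Coder-32B-Instruct-Q4_K_S.gguf" is dropped.
function filterOpenAIModels(models: OpenAIModel[]): OpenAIModel[] {
	return models.filter((model) => model.id.toLowerCase().includes('gpt'));
}

// One possible fix: apply the "gpt" filter only to the official OpenAI API
// and pass OpenAI-compatible servers (llama.cpp, Ollama, etc.) through as-is.
function filterModels(models: OpenAIModel[], isOfficialOpenAI: boolean): OpenAIModel[] {
	return isOfficialOpenAI ? filterOpenAIModels(models) : models;
}
```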
@savchenko here's a work-in-progress demo if you want to check it out: https://llama-cpp-llama-server-opena.hollama.pages.dev/settings
You'll need to add an "OpenAI compatible" connection type to set up your llama.cpp server.
@fmaclen, the interface is slightly broken. Latest Firefox ESR, v128.3.1.
@savchenko "slightly broken" is quite the understatement 😅 Just pushed a fix; if you refresh the page it should look correct in Firefox.
Fresh container build from `00f5862`.
Clicking on the SL links yields no UI changes, while the dev console shows:
```
Uncaught (in promise) TypeError: e.servers is undefined
    Immutable 10
    r
    ce
    F
    _t
    at
    jt
    le
    rt
    rn
    ln
    <anonymous> http://localhost:4173/sessions:45
    promise callback* http://localhost:4173/sessions:44
    3.BdijOe1Y.js:1:3551
```
The interface works in Chromium; however, attempting to query llama.cpp results in an `Invalid strategy` error.
I do not observe any new messages in llama-server's stdout after clicking "Run" in Hollama.
Thanks for the update.
I was able to replicate the issue you are seeing in Firefox, and I'm pretty sure it's caused by some hacky code I wrote just to quickly try things out.
That being said, it works fine for me in Chromium. If you were using the most recent release of Hollama in the same browser (with the same URL/hostname), it's possible there are conflicting settings stored in `localStorage`. This is something I still need to test/QA before releasing this new version.
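For what it's worth, the `e.servers is undefined` error is consistent with an older settings object in `localStorage` that has no `servers` field; a defensive default along these lines would cover that case (a sketch only; the key name and settings shape are assumptions, not Hollama's actual code):

```ts
// Assumed settings shape; only `servers` matters for this example.
interface Settings {
	servers: { baseUrl: string; apiKey?: string }[];
}

function loadSettings(): Settings {
	const raw = localStorage.getItem('hollama-settings'); // assumed key name
	const parsed = raw ? JSON.parse(raw) : {};
	// Settings written by an older version have no `servers` array, so reading
	// them blindly would produce exactly "e.servers is undefined".
	return { ...parsed, servers: Array.isArray(parsed.servers) ? parsed.servers : [] };
}
```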
Couple of questions, if you don't mind:
- Do you get the same `Invalid strategy` error in an Incognito window?

These are the exact steps I used to build and run it:

```
git pull && git checkout 00f5862
docker build -t maybellama .
docker run -p 4173:4173 maybellama
```
@savchenko thanks for the clarification.
Try building `fee51b7`, which should have fixed the `Invalid strategy` error and the layout issues in Firefox.
There are still a handful of smaller bugs, but you should be able to interact with llama-server 🤞
Success!
Shall this be closed?
Glad to hear it's working!
> Shall this be closed?
No, the issue will be closed automatically once the feature is released. There is still a fair amount of cleanup and testing I need to do before we can push this out.
:tada: This issue has been resolved in version 0.22.0 :tada:
The release is available on the GitHub releases page.
Your semantic-release bot :package::rocket:
Given the close relationship between Ollama and llama.cpp, would it be possible to support llama-server?
It exposes an OpenAI-compatible HTTP endpoint on localhost.
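For reference, a minimal request against that endpoint looks something like this (the port and the `model` value are assumptions about a typical local setup, not anything Hollama-specific):

```ts
// Sketch of hitting llama-server's OpenAI-compatible chat endpoint.
async function ask(prompt: string): Promise<string> {
	const res = await fetch('http://localhost:8080/v1/chat/completions', {
		method: 'POST',
		headers: { 'Content-Type': 'application/json' },
		body: JSON.stringify({
			model: 'default', // llama-server answers with whichever model it was started with
			messages: [{ role: 'user', content: prompt }]
		})
	});
	const body = await res.json();
	return body.choices[0].message.content;
}
```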