fmaclen / hollama

A minimal web-UI for talking to Ollama servers
https://hollama.fernando.is
MIT License

Support `llama.cpp` directly, bypassing `ollama` #233

Closed: savchenko closed this issue 5 hours ago

savchenko commented 1 week ago

Given the close relationship between ollama and llama.cpp, would it be possible to support llama-server?

It exposes an OpenAI-compatible HTTP endpoint on localhost.
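For reference, a quick way to verify the endpoint is up (a sketch assuming llama-server's default port, 8080):

```typescript
// Sketch: list the models served by llama-server's OpenAI-compatible API.
// The port (8080) is an assumption; adjust to your setup.
const response = await fetch('http://localhost:8080/v1/models');
const { data } = await response.json();
console.log(data.map((model: { id: string }) => model.id));
```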

fmaclen commented 1 week ago

We recently added support for OpenAI servers, you can find the configuration in the Settings view.

Can you configure it with your llama-server and let me know if it works?

savchenko commented 1 week ago

Tested with v0.20.1, connectivity reports as working:

[screenshot: Settings view showing the connection verified]

However, parsing the models fails and no model can be selected in the "Sessions" tab.

[screenshot: empty model selector in the "Sessions" tab]

In the llama.cpp console the request is successful:

request: GET /v1/models 127.0.0.1 200

Manual curl returns:

{
  "object": "list",
  "data": [
    {
      "id": "/home/user/Qwen2.5-Coder-32B-Instruct-Q4_K_S.gguf",
      "object": "model",
      "created": 1731505790,
      "owned_by": "llamacpp",
      "meta": {
        "vocab_type": 2,
        "n_vocab": 152064,
        "n_ctx_train": 32768,
        "n_embd": 5120,
        "n_params": 32763876352,
        "size": 18778431488
      }
    }
  ]
}
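For what it's worth, the `id` field above is what an OpenAI-compatible client sends as the `model` parameter. A minimal chat request against the same server would look roughly like this sketch (port assumed, as before):

```typescript
// Sketch: minimal chat completion against llama-server's OpenAI-compatible API.
// The model id comes from the /v1/models response above; port 8080 is assumed.
const res = await fetch('http://localhost:8080/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: '/home/user/Qwen2.5-Coder-32B-Instruct-Q4_K_S.gguf',
    messages: [{ role: 'user', content: 'Hello!' }]
  })
});
console.log((await res.json()).choices[0].message.content);
```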

EDIT

I have noticed that the OpenAI endpoint can't be saved without an API key; the "refresh" button in the UI is inactive unless the key field is non-empty.

[screenshot: the "refresh" button disabled while the API key field is empty]

Providing one does not make any difference though.

fmaclen commented 1 week ago

Thanks for the detailed report, I'll need to take a closer look to see where it might be going wrong.

> I have noticed that the OpenAI endpoint can't be saved without an API key

Yeah, since this feature was designed specifically for OpenAI, it wouldn't work without an API key; that's why we made it "mandatory". We should probably document this better.

When we connect to Ollama via the OpenAI-compatible API, we just enter a random API key, which gets ignored anyway.
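For example, connecting through the `openai` npm client looks roughly like this (a sketch; the base URL is an assumption for a local llama-server):

```typescript
import OpenAI from 'openai';

// Sketch: connect to a local OpenAI-compatible server.
// llama-server and Ollama ignore the Authorization header, so any
// placeholder string satisfies the client's required apiKey field.
const client = new OpenAI({
  baseURL: 'http://localhost:8080/v1', // assumed local llama-server address
  apiKey: 'sk-placeholder' // required by the client, ignored by the server
});

const models = await client.models.list();
console.log(models.data.map((m) => m.id));
```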

savchenko commented 1 week ago

> Thanks for the detailed report, I'll need to take a closer look to see where it might be going wrong.

Not a problem.

Also, I've checked the console, but there is no output at any level apart from the benign preload warnings.

"Network" tab shows 200s to the ../models/ with the same JSON payload as I have provided above.

fmaclen commented 1 week ago

Found the cause of the problem: our current implementation filters out any models that don't include `gpt` in their name, so `Qwen2.5-Coder-32B-Instruct-Q4_K_S.gguf` gets filtered out.

Removing the filter makes it work:

[screenshot: the Qwen model appearing in the model list after removing the filter]

This filter exists because when we fetch the model list from OpenAI, the response also includes non-LLM models that are incompatible with Hollama.
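In code terms, the behavior was roughly this (a simplified sketch, not the actual source):

```typescript
interface ModelEntry {
  id: string;
}

// Simplified sketch of the filtering described above: only ids containing
// "gpt" survive, so a llama.cpp id like
// "/home/user/Qwen2.5-Coder-32B-Instruct-Q4_K_S.gguf" gets dropped.
function filterChatModels(models: ModelEntry[]): ModelEntry[] {
  return models.filter((model) => model.id.includes('gpt'));
}
```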

fmaclen commented 1 week ago

@savchenko here's a work-in-progress demo if you want to check it out: https://llama-cpp-llama-server-opena.hollama.pages.dev/settings

You'll need to add an "OpenAI compatible" connection type to set up your llama.cpp server.

savchenko commented 1 week ago

@fmaclen, the interface is slightly broken. Latest Firefox ESR, v128.3.1:

[screenshot: broken interface layout in Firefox]

fmaclen commented 1 week ago

@savchenko "slightly broken" is quite the understatement 😅 Just pushed a fix; if you refresh the page it should look correct in Firefox.

savchenko commented 1 week ago

Fresh container build from 00f5862

Firefox

[screenshot: the Firefox UI after building 00f5862]

Clicking on the SL links yields no UI changes, while the dev console shows:

Uncaught (in promise) TypeError: e.servers is undefined
    Immutable 10
        r
        ce
        F
        _t
        at
        jt
        le
        rt
        rn
        ln
    <anonymous> http://localhost:4173/sessions:45
    promise callback* http://localhost:4173/sessions:44
3.BdijOe1Y.js:1:3551

Chromium

The interface works in Chromium; however, attempting to query llama.cpp shows the following error:

[screenshot: "Invalid strategy" error message]

I do not observe any new messages in llama.cpp's stdout after clicking "Run" in Hollama.

fmaclen commented 1 week ago

Thanks for the update.

I was able to replicate the issue you are seeing with Firefox and I'm pretty sure it's caused by some hacky code I wrote just to quickly try things out.

That being said, it works fine for me in Chromium. If you were using the most recent release of Hollama in the same browser (with the same URL/hostname), it may have conflicting settings stored in localStorage. This is something I still need to test/QA before releasing this new version.
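If stale settings turn out to be the cause, the likely fix is a defensive default when reading them back, along these lines (a sketch; the storage key and shape are assumptions, not Hollama's actual code):

```typescript
// Sketch: guard against older localStorage payloads that predate the
// "servers" field. The key name and shape here are hypothetical.
interface Settings {
  servers: { baseUrl: string; apiKey?: string }[];
}

function loadSettings(): Settings {
  const raw = localStorage.getItem('hollama-settings');
  const stored = raw ? JSON.parse(raw) : {};
  return { servers: [], ...stored }; // default prevents "e.servers is undefined"
}
```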

Couple of questions, if you don't mind:

  1. What command are you running to build the container?
  2. Does the UI load correctly in Firefox from the live demo? https://llama-cpp-llama-server-opena.hollama.pages.dev/settings
  3. In Chromium, do you still get the same "Invalid strategy" error in an Incognito window?

savchenko commented 1 week ago

  1. git pull && git checkout 00f5862
     docker build -t maybellama .
     docker run -p 4173:4173 maybellama
  2. Yes
  3. Yes. [screenshot: the same "Invalid strategy" error in an Incognito window]
fmaclen commented 1 week ago

@savchenko thanks for the clarification.

Try to build fee51b7, which should have fixed the "Invalid strategy" error and the layout issues in Firefox. There are still a handful of smaller bugs, but you should be able to interact with llama-server 🤞

savchenko commented 1 week ago

Success!

[screenshot: llama.cpp responding successfully in a session]

Shall this be closed?

fmaclen commented 1 week ago

Glad to hear it's working!

> Shall this be closed?

No, the issue will be closed automatically once the feature is released. There is still a fair amount of cleanup and testing I need to do before we can push this out.

fmaclen commented 5 hours ago

🎉 This issue has been resolved in version 0.22.0 🎉

The release is available as a GitHub release.

Your semantic-release bot 📦🚀