containers / ramalama
The goal of RamaLama is to make working with AI boring.
Enable containers on macOS to use the GPU #397
Closed
slp closed this 4 weeks ago
slp commented 4 weeks ago
Three changes:
- Bump llama.cpp to latest upstream, which enables the kompute backend to offload Q4_K_M models.
- Add a `--gpu` flag to request that the model be offloaded to the GPU (sketched below).
- When running in a container, bind the server to `0.0.0.0` so the port can be accessed from outside the container (sketched below).
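As a rough illustration of the second change, here is a minimal sketch of how a `--gpu` flag could be translated into llama.cpp server options. The argument wiring below is hypothetical, not RamaLama's actual code, though `llama-server`'s `--n-gpu-layers` option is real:

```python
import argparse
import subprocess

# Hypothetical sketch: wire a --gpu flag through to llama.cpp's server.
# The parser and model path are illustrative, not RamaLama's actual code.
parser = argparse.ArgumentParser(prog="ramalama")
parser.add_argument("--gpu", action="store_true",
                    help="offload the model to the GPU")
args = parser.parse_args()

cmd = ["llama-server", "-m", "model.gguf"]
if args.gpu:
    # llama.cpp caps this at the model's real layer count, so a large
    # value effectively means "offload every layer".
    cmd += ["--n-gpu-layers", "999"]
subprocess.run(cmd, check=True)
```

With the kompute backend compiled in, the offloaded layers run on the GPU via Vulkan, which is what makes this usable on macOS containers.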
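And for the third change, a sketch of choosing the bind address based on whether the process is containerized. The detection heuristic (Podman's `/run/.containerenv`, Docker's `/.dockerenv`) is a common convention rather than RamaLama's confirmed logic, and the port is illustrative:

```python
import os

def in_container() -> bool:
    # Podman creates /run/.containerenv and Docker creates /.dockerenv;
    # checking for either is a common containerization heuristic.
    return os.path.exists("/run/.containerenv") or os.path.exists("/.dockerenv")

# Loopback inside the container's network namespace is unreachable from
# the host, so bind to all interfaces when containerized.
host = "0.0.0.0" if in_container() else "127.0.0.1"
server_args = ["--host", host, "--port", "8080"]  # port is illustrative
```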
slp commented 4 weeks ago
This one supersedes #235
rhatdan commented 4 weeks ago
LGTM