containers / ramalama
The goal of RamaLama is to make working with AI boring.
Enable containers on macOS to use the GPU #397
Closed
slp closed this 4 weeks ago
slp commented 4 weeks ago
Three changes:
- Bump llama.cpp to latest upstream, which enables the kompute backend to offload Q4_K_M models.
- Add a `--gpu` flag to request that the model be offloaded to the GPU (sketched below).
- When running in a container, bind the server to `0.0.0.0` so the port can be accessed from outside the container (sketched below).
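As a rough illustration of the second change, here is a minimal sketch of how a `--gpu` flag could be translated into llama.cpp server options. The argument wiring below is hypothetical, not RamaLama's actual code, though `llama-server`'s `--n-gpu-layers` option is real:

```python
import argparse
import subprocess

# Hypothetical sketch: wire a --gpu flag through to llama.cpp's server.
# The parser and model path are illustrative, not RamaLama's actual code.
parser = argparse.ArgumentParser(prog="ramalama")
parser.add_argument("--gpu", action="store_true",
                    help="offload the model to the GPU")
args = parser.parse_args()

cmd = ["llama-server", "-m", "model.gguf"]
if args.gpu:
    # llama.cpp caps this at the model's real layer count, so a large
    # value effectively means "offload every layer".
    cmd += ["--n-gpu-layers", "999"]
subprocess.run(cmd, check=True)
```

With the kompute backend compiled in, the offloaded layers run on the GPU via Vulkan, which is what makes this usable on macOS containers.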
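And for the third change, a sketch of choosing the bind address based on whether the process is containerized. The detection heuristic (Podman's `/run/.containerenv`, Docker's `/.dockerenv`) is a common convention rather than RamaLama's confirmed logic, and the port is illustrative:

```python
import os

def in_container() -> bool:
    # Podman creates /run/.containerenv and Docker creates /.dockerenv;
    # checking for either is a common containerization heuristic.
    return os.path.exists("/run/.containerenv") or os.path.exists("/.dockerenv")

# Loopback inside the container's network namespace is unreachable from
# the host, so bind to all interfaces when containerized.
host = "0.0.0.0" if in_container() else "127.0.0.1"
server_args = ["--host", host, "--port", "8080"]  # port is illustrative
```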
slp commented 4 weeks ago
This one supersedes #235
rhatdan commented 4 weeks ago
LGTM