Open sij1nk opened 3 months ago
Hi. Thanks for contributing. For now, llama.cpp models in openllm can only be deployed on macOS; there are some hard-coded platform-specific parameters. I suggest using vLLM on Linux or even WSL2, since it is faster and more mature for production usage.
We are adding a new feature to make it easier to tweak parameters. Maybe you can help us find a set of configuration values to run vLLM models on a 4 GB GPU after it is released.
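For a 4 GB card, something along these lines might be a reasonable starting point once that lands. This is only a rough sketch, not a tested recipe: the Phi-3 checkpoint named below is an assumption, its fp16 weights alone exceed 4 GB (so in practice you would likely need a 4-bit quantized variant via `--quantization`), and the right context-length and memory values depend on the machine.

```bash
# Hypothetical low-VRAM settings: cap the context length and leave some headroom
# for the KV cache; a quantized checkpoint would likely still be required on 4 GB.
python -m vllm.entrypoints.openai.api_server \
  --model microsoft/Phi-3-mini-4k-instruct \
  --max-model-len 2048 \
  --gpu-memory-utilization 0.90 \
  --dtype half
```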
Thank you! I will look into vLLM some time later.
Describe the bug
Hi!

I tried to run an LLM locally using `openllm`, and `phi3:3.8b-ggml-q4` happens to be the only model which I am able to run locally according to openllm, so I ran `openllm run phi3:3.8b-ggml-q4`, which failed (logs are attached).

The failure happens during a CMake build step, which is unable to find `FOUNDATION_LIBRARY` (line 116 in the logs). I looked into what this library provides and ended up in ggerganov/llama.cpp. However, I assume `GGML_METAL` should not be set, as I'm running on a Linux (well, Windows 10 + WSL2) system with an NVIDIA GPU. My attempts at disabling the Metal build (sketched below) were not successful; the command failed with the same error as before.

On the same machine, but on Windows, `openllm run phi3:3.8b-ggml-q4` fails on the same CMake build with the same error. Curiously, `openllm hello` does not recognize the model as locally runnable there, as it did on WSL, but I did not investigate why.

Please let me know if I should raise this issue in ggerganov/llama.cpp instead.
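For reference, the kind of workaround I attempted looked roughly like this. It is only a sketch: it assumes openllm builds llama-cpp-python from source under the hood and picks up `CMAKE_ARGS`, which I have not verified, and the exact CMake option names differ between llama.cpp versions (e.g. `LLAMA_METAL` vs `GGML_METAL`).

```bash
# Attempt to force a non-Metal, CUDA build of the llama.cpp backend before running openllm.
# Whether the openllm build step honors CMAKE_ARGS at all is an assumption on my part.
CMAKE_ARGS="-DGGML_METAL=OFF -DGGML_CUDA=ON" \
  pip install --force-reinstall --no-cache-dir llama-cpp-python
openllm run phi3:3.8b-ggml-q4
```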
Thanks!
To reproduce
openllm run phi3:3.8b-ggml-q4
Logs
gist
Environment
bentoml env:
transformers-cli env:
openllm -v:
System information (Optional)
memory: 32GB
platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
architecture: x86-64
cpu: Intel Core i7-11850H @ 2.50GHz
gpu: NVIDIA RTX A2000 Laptop GPU 4GB