abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

GPU Utilization zero when evaluating GGUFs through lm-evaluation #1510

Open MuhammadBinUsman03 opened 3 weeks ago

MuhammadBinUsman03 commented 3 weeks ago

I'm enabling GPU acceleration during installation as suggested in the README:

```
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install 'llama-cpp-python[server]'
```

Then I start the local server:

```
python3 -m llama_cpp.server --model models/7B/llama-model.gguf --n_gpu_layers -1
```

and finally run lm-evaluation against it:

```
lm_eval --model gguf --model_args base_url=http://localhost:8000 --tasks multimedqa
```

…this just runs the evaluation without using the GPU at all.
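One quick sanity check (a minimal sketch, assuming the low-level bindings are importable as shown) is to ask the installed wheel whether GPU offload was compiled in at all. Note that recent llama.cpp versions renamed the build flag from `LLAMA_CUBLAS` to `GGML_CUDA`, so an outdated `CMAKE_ARGS` value can be silently ignored and produce a CPU-only build:

```python
# Minimal sketch: verify that this llama-cpp-python build supports GPU offload.
# llama_supports_gpu_offload() is exposed by the low-level llama.cpp bindings;
# if it prints False, the wheel was built CPU-only and --n_gpu_layers is a no-op.
import llama_cpp

print("GPU offload supported:", llama_cpp.llama_supports_gpu_offload())
```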

MuhammadBinUsman03 commented 3 weeks ago

On 1x A100 (RunPod).

chunfenri commented 3 weeks ago

Same here: inference runs on CPU only, but it occupies about 500 MB of GPU memory.
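For what it's worth, one way to see whether any layers are actually being offloaded (a sketch; the model path is the placeholder from the original report) is to load the model directly with verbose logging and watch the startup output for the `llm_load_tensors: offloaded N/N layers to GPU` line:

```python
from llama_cpp import Llama

# With a CUDA-enabled build, the verbose startup log should include a line like
# "llm_load_tensors: offloaded 33/33 layers to GPU". With a CPU-only build the
# layers stay on the host, even though a small CUDA context may still be
# created, which could explain the ~500 MB of GPU memory seen here.
llm = Llama(
    model_path="models/7B/llama-model.gguf",  # placeholder path from the issue
    n_gpu_layers=-1,  # request offload of all layers
    verbose=True,
)
```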