I'm enabling GPU acceleration during installation as suggested in the README:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install 'llama-cpp-python[server]'
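One thing worth checking first: newer llama.cpp releases renamed the CUDA CMake option, so on recent llama-cpp-python versions the LLAMA_CUBLAS flag may be ignored and pip may also reuse a previously built CPU-only wheel from its cache. A hedged sketch of a clean CUDA rebuild, assuming a recent version where the flag is named GGML_CUDA:

```shell
# On newer llama-cpp-python versions the CUDA flag is GGML_CUDA
# (LLAMA_CUBLAS is deprecated); --force-reinstall --no-cache-dir makes
# pip rebuild the wheel instead of reusing a cached CPU-only build.
CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 \
  pip install --force-reinstall --no-cache-dir 'llama-cpp-python[server]'

# Sanity check: should print True for a CUDA-enabled build
# (assumption: llama_supports_gpu_offload is exposed by the bindings).
python3 -c "from llama_cpp import llama_supports_gpu_offload; print(llama_supports_gpu_offload())"
```

If the sanity check prints False, the wheel was built without GPU support and no server flag will change that.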
Then I start the local server with:
python3 -m llama_cpp.server --model models/7B/llama-model.gguf --n_gpu_layers -1
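Once the server is up, a quick reachability check before pointing lm_eval at it (a minimal sketch; /v1/models is the OpenAI-compatible listing route the server exposes):

```shell
# Print the HTTP status of the server's OpenAI-compatible model listing;
# a healthy server returns 200, an unreachable one prints 000.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/v1/models
```

This only confirms the server is answering; whether layers were offloaded is reported in the server's own startup log.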
and finally running the evaluation with lm-evaluation-harness:
lm_eval --model gguf --model_args base_url=http://localhost:8000 --tasks multimedqa
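To see whether the evaluation actually touches the GPU, it helps to watch utilization in a second terminal while lm_eval runs (assumes an NVIDIA card with nvidia-smi on PATH):

```shell
# Poll GPU utilization and memory once per second while the eval runs;
# a CUDA build serving requests should show non-zero utilization and the
# model weights resident in GPU memory.
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1
```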
However, this just runs the evaluation without using the GPU at all.