Hi All,

First of all, thank you for this excellent tool, which makes it very easy to run LLM models without any hassle.
I am aware that the main purpose of localllm is to eliminate the dependency on GPUs and run models on the CPU. However, I wanted to know if there is an option to offload layers to the GPU.
Machine: Compute Engine instance on GCP
OS: Ubuntu 22.04 LTS
GPU: Tesla T4
The steps I followed so far are given below:
1. Installed the NVIDIA driver on the compute engine and confirmed with `nvidia-smi` that the Tesla T4 is visible.
2. Assuming localllm does not directly provide an option to enable the GPU (I may be wrong here), I cloned the llama-cpp-python repository and updated `n_gpu_layers` to 4 in `llama_cpp/server/settings.py`.
3. Built the package by running `pip install -e .`; the complete step is given here. (The rebuild command I used is sketched after this list.)
4. Killed localllm and started it again.
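For step 3, the exact command I ran was a plain editable install. My understanding (please correct me if wrong) is that a plain `pip install -e .` produces a CPU-only build, and that enabling the CUDA backend needs extra CMake flags, roughly as in the sketch below. The flag name is an assumption that depends on the llama-cpp-python version: older releases use `LLAMA_CUBLAS`, newer ones use `GGML_CUDA`.

```bash
# Rebuild llama-cpp-python from the cloned repo with the CUDA backend on.
# Older releases:  -DLLAMA_CUBLAS=on
# Newer releases:  -DGGML_CUDA=on
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
    pip install -e . --force-reinstall --no-cache-dir
```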
However, I still see that the GPU is not being utilized; the minimal check I used is sketched below.
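As a sanity check, I loaded the model directly with verbose logging, roughly as in this sketch (the model path is a placeholder for my local GGUF file):

```python
from llama_cpp import Llama

# Placeholder path -- substitute the actual GGUF model file.
llm = Llama(
    model_path="/path/to/model.gguf",
    n_gpu_layers=4,   # same value set in settings.py
    verbose=True,     # llama.cpp prints load/offload details to stderr
)

# With a CUDA-enabled build, the startup log includes a line such as
# "offloaded 4/NN layers to GPU" and nvidia-smi shows memory in use;
# with a CPU-only build, no such line appears.
print(llm("Hello", max_tokens=8)["choices"][0]["text"])
```

In my case the log shows no offload line, which is why I suspect the build itself.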
Are the above steps correct, or did I miss anything here?
Thank you,
KK