Hi All,

First of all, thank you for this excellent tool, which makes it very easy to run LLM models without any hassle.
I am aware that the main purpose of localllm is to eliminate the dependency on GPUs and run models on the CPU. However, I wanted to know if there is an option to offload layers to the GPU.
Machine: Compute Engine instance on GCP
OS: Ubuntu 22.04 LTS
GPU: Tesla T4
The steps I followed so far are given below:
1. Installed the NVIDIA driver on the compute engine and confirmed with `nvidia-smi` that the Tesla T4 is visible.
2. Assuming localllm does not directly provide an option to enable the GPU (I may be wrong here), I cloned the llama-cpp-python repository and updated `n_gpu_layers` to 4 in `llama_cpp/server/settings.py`.
3. Built the package by running `pip install -e .`; the complete step is given here. (The rebuild command I used is sketched after this list.)
4. Killed localllm and started it again.
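For step 3, the exact command I ran was a plain editable install. My understanding (please correct me if wrong) is that a plain `pip install -e .` produces a CPU-only build, and that enabling the CUDA backend needs extra CMake flags, roughly as in the sketch below. The flag name is an assumption that depends on the llama-cpp-python version: older releases use `LLAMA_CUBLAS`, newer ones use `GGML_CUDA`.

```bash
# Rebuild llama-cpp-python from the cloned repo with the CUDA backend on.
# Older releases:  -DLLAMA_CUBLAS=on
# Newer releases:  -DGGML_CUDA=on
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
    pip install -e . --force-reinstall --no-cache-dir
```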
However, I still see that the GPU is not being utilized; the minimal check I used is sketched below.
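As a sanity check, I loaded the model directly with verbose logging, roughly as in this sketch (the model path is a placeholder for my local GGUF file):

```python
from llama_cpp import Llama

# Placeholder path -- substitute the actual GGUF model file.
llm = Llama(
    model_path="/path/to/model.gguf",
    n_gpu_layers=4,   # same value set in settings.py
    verbose=True,     # llama.cpp prints load/offload details to stderr
)

# With a CUDA-enabled build, the startup log includes a line such as
# "offloaded 4/NN layers to GPU" and nvidia-smi shows memory in use;
# with a CPU-only build, no such line appears.
print(llm("Hello", max_tokens=8)["choices"][0]["text"])
```

In my case the log shows no offload line, which is why I suspect the build itself.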
Are the above steps correct, or did I miss anything here?
Thank you,
KK