GoogleCloudPlatform / localllm


Option to enable the GPU #22

Open kk2491 opened 4 months ago

kk2491 commented 4 months ago

Hi All,

First of all, thank you for this excellent tool, which makes it very easy to run LLM models without any hassle.

I am aware that the main purpose of localllm is to eliminate the dependency on GPUs and run the models on the CPU. However, I wanted to know whether there is an option to offload some of the model layers to the GPU.
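For context, when running the llama-cpp-python server directly (outside localllm), my understanding from its README is that layer offload is controlled with a flag roughly like the sketch below; the model path and layer count here are only placeholders, so please correct me if this is not the right mechanism.

```bash
# Sketch: start the llama-cpp-python OpenAI-compatible server directly and ask
# it to offload 4 transformer layers to the GPU. The model path is a placeholder.
python3 -m llama_cpp.server --model /path/to/model.gguf --n_gpu_layers 4
```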

Machine: Compute Engine instance on GCP
OS: Ubuntu 22.04 LTS
GPU: Tesla T4

The steps I have followed so far are given below:

  1. Installed the NVIDIA driver on the Compute Engine instance; the nvidia-smi output is shown below.
    [screenshot of nvidia-smi output]
  2. Assuming localllm does not directly provide an option to enable the GPU (I may be wrong here), I cloned the llama-cpp-python repository and updated n_gpu_layers to 4 in llama_cpp/server/settings.py (see the sketch after this list).
  3. Built the package by running pip install -e .; the complete step is given here.
  4. Killed localllm and started it again.
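To make steps 2 and 3 concrete, this is roughly what I did. The repository URL, the cuBLAS build flags, and the layer count are from the llama-cpp-python README as I understood it, so treat this as an illustrative sketch; if the GPU build flag below is wrong, that may well be my problem.

```bash
# Rough reproduction of my steps 2-3 (paths and values are illustrative).
git clone --recursive https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python

# Step 2: changed the n_gpu_layers default from 0 to 4 in
# llama_cpp/server/settings.py (edited by hand).

# Step 3: rebuilt from source. I believe the underlying llama.cpp has to be
# compiled with cuBLAS support for any layers to actually go to the GPU:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install -e .
```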

However, I still see that the GPU is not being utilized.
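In case it is relevant, this is how I am checking utilization, with a prompt running against the model in another terminal:

```bash
# Watch GPU memory and compute utilization once per second while the model
# is processing a request.
watch -n 1 nvidia-smi
```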

Are the above steps correct, or did I miss anything here?

Thank you,
KK

bobcatfish commented 4 months ago

Thanks for the question @kk2491, I'll take a look and see what I can find!