Run LLMs locally on Cloud Workstations.
This repository includes a Dockerfile that can be used to create a custom base image for a Cloud Workstation environment that includes the `llm` tool.
To get started, you'll need a GCP project and the `gcloud` CLI installed.
Set environment variables
Set the `PROJECT_ID` and `PROJECT_NUM` environment variables from your GCP project. You must modify the values.

```shell
export PROJECT_ID=<project-id>
export PROJECT_NUM=<project-num>
```
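If you don't have the project number handy, it can be derived from the project ID with the standard `gcloud projects describe` command (an optional shortcut, not a step from the original guide):

```shell
# Look up the project number for the current PROJECT_ID
export PROJECT_NUM=$(gcloud projects describe "$PROJECT_ID" --format='value(projectNumber)')
```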
Set other needed environment variables. You can modify the values.

```shell
export REGION=us-central1
export LOCALLLM_REGISTRY=localllm-registry
export LOCALLLM_IMAGE_NAME=localllm
export LOCALLLM_CLUSTER=localllm-cluster
export LOCALLLM_WORKSTATION=localllm-workstation
export LOCALLLM_PORT=8000
```
Set the default project.

```shell
gcloud config set project $PROJECT_ID
```
Enable needed services.

```shell
gcloud services enable \
  cloudbuild.googleapis.com \
  workstations.googleapis.com \
  container.googleapis.com \
  containeranalysis.googleapis.com \
  containerscanning.googleapis.com \
  artifactregistry.googleapis.com
```
Create an Artifact Registry repository for Docker images.

```shell
gcloud artifacts repositories create $LOCALLLM_REGISTRY \
  --location=$REGION \
  --repository-format=docker
```
Build and push the image to Artifact Registry using Cloud Build. Details are in `cloudbuild.yaml`.

```shell
gcloud builds submit . \
  --substitutions=_IMAGE_REGISTRY=$LOCALLLM_REGISTRY,_IMAGE_NAME=$LOCALLLM_IMAGE_NAME
```
Configure a Cloud Workstation cluster. Wait for this to complete before moving forward; it can take up to 20 minutes.

```shell
gcloud workstations clusters create $LOCALLLM_CLUSTER \
  --region=$REGION
```
Create a Cloud Workstation configuration. We suggest using a machine type of e2-standard-32, which has 32 vCPUs (16 cores) and 128 GB of memory.

```shell
gcloud beta workstations configs create $LOCALLLM_WORKSTATION \
  --region=$REGION \
  --cluster=$LOCALLLM_CLUSTER \
  --machine-type=e2-standard-32 \
  --container-custom-image=us-central1-docker.pkg.dev/${PROJECT_ID}/${LOCALLLM_REGISTRY}/${LOCALLLM_IMAGE_NAME}:latest
```
Create a Cloud Workstation.

```shell
gcloud workstations create $LOCALLLM_WORKSTATION \
  --cluster=$LOCALLLM_CLUSTER \
  --config=$LOCALLLM_WORKSTATION \
  --region=$REGION
```
Grant access to the default Cloud Workstation service account.

```shell
gcloud artifacts repositories add-iam-policy-binding $LOCALLLM_REGISTRY \
  --location=$REGION \
  --member=serviceAccount:service-$PROJECT_NUM@gcp-sa-workstationsvm.iam.gserviceaccount.com \
  --role=roles/artifactregistry.reader
```
Start the workstation.

```shell
gcloud workstations start $LOCALLLM_WORKSTATION \
  --cluster=$LOCALLLM_CLUSTER \
  --config=$LOCALLLM_WORKSTATION \
  --region=$REGION
```
Connect to the workstation using SSH. Alternatively, you can connect to the workstation interactively in the browser.

```shell
gcloud workstations ssh $LOCALLLM_WORKSTATION \
  --cluster=$LOCALLLM_CLUSTER \
  --config=$LOCALLLM_WORKSTATION \
  --region=$REGION
```
Start serving the default model from the repo.

```shell
local-llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF $LOCALLLM_PORT
```
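If you want to verify from another terminal that the port is accepting connections before moving on, a small Python check like the following can help (a sketch; the host and port are whatever you passed to `local-llm run`, not values mandated by this guide):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds, or give up after timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the model server is listening.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # server not up yet; retry shortly
    return False
```

For example, `wait_for_port("localhost", 8000)` returns `True` once the server is listening on port 8000.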
Get the hostname of the workstation using:

```shell
gcloud workstations describe $LOCALLLM_WORKSTATION \
  --cluster=$LOCALLLM_CLUSTER \
  --config=$LOCALLLM_WORKSTATION \
  --region=$REGION
```
Interact with the model by visiting the live OpenAPI documentation page: `https://$LOCALLLM_PORT-$LLM_HOSTNAME/docs`, where `$LLM_HOSTNAME` is the hostname returned by the command above.
> [!NOTE]
> The command is now `local-llm`; however, the original command (`llm`) is still supported inside the Cloud Workstations image.
Assumes that models are downloaded to `~/.cache/huggingface/hub/`. This is the default cache path used by the Hugging Face Hub library, and only `.gguf` files are supported.
If you're using models from TheBloke and you don't specify a filename, we'll attempt to use the model with 4-bit medium quantization; alternatively, you can specify a filename explicitly.
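That default can be illustrated with a short sketch. The helper below is hypothetical (not part of the local-llm codebase); it simply prefers a `Q4_K_M` file, the usual naming for 4-bit medium quantization in TheBloke's repos:

```python
from typing import List, Optional

def pick_default_gguf(filenames: List[str]) -> Optional[str]:
    """Pick a 4-bit medium (Q4_K_M) .gguf file, mirroring the documented default."""
    for name in filenames:
        if name.endswith(".gguf") and "Q4_K_M" in name:
            return name
    return None  # no 4-bit medium file; pass --filename explicitly instead

files = [
    "llama-2-13b-ensemble-v5.Q2_K.gguf",
    "llama-2-13b-ensemble-v5.Q4_K_M.gguf",
    "llama-2-13b-ensemble-v5.Q8_0.gguf",
]
print(pick_default_gguf(files))  # → llama-2-13b-ensemble-v5.Q4_K_M.gguf
```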
List downloaded models.

```shell
local-llm list
```
List running models.

```shell
local-llm ps
```
Start serving models.

Start serving the default model from the repo, downloading it if it is not present.

```shell
local-llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF 8000
```

Start serving a specific model, downloading it if it is not present.

```shell
local-llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF --filename llama-2-13b-ensemble-v5.Q4_K_S.gguf 8000
```
Stop serving models.

Stop serving all models from the repo.

```shell
local-llm kill TheBloke/Llama-2-13B-Ensemble-v5-GGUF
```

Stop serving a specific model.

```shell
local-llm kill TheBloke/Llama-2-13B-Ensemble-v5-GGUF --filename llama-2-13b-ensemble-v5.Q4_K_S.gguf
```
Download models.

```shell
local-llm pull TheBloke/Llama-2-13B-Ensemble-v5-GGUF
local-llm pull TheBloke/Llama-2-13B-Ensemble-v5-GGUF --filename llama-2-13b-ensemble-v5.Q4_K_S.gguf
```
Remove models.

Remove all models downloaded from the repo.

```shell
local-llm rm TheBloke/Llama-2-13B-Ensemble-v5-GGUF
```

Remove a specific model from the repo.

```shell
local-llm rm TheBloke/Llama-2-13B-Ensemble-v5-GGUF --filename llama-2-13b-ensemble-v5.Q4_K_S.gguf
```
Install the tools.

```shell
# Install the tools
pip3 install openai
pip3 install ./local-llm/.
```
Download and run a model.

```shell
local-llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF 8000
```
Try out a query. The default query is for a haiku about cats.

```shell
python3 querylocal.py
```
Interact with the OpenAPI interface via the `/docs` endpoint. For the above, visit http://localhost:8000/docs.
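`querylocal.py` itself isn't reproduced here; as a sketch, a query against the served model might be built like this, assuming the server exposes an OpenAI-compatible `/v1/completions` endpoint (the endpoint path and payload fields are assumptions, not confirmed by this guide):

```python
import json
import urllib.request

def build_completion_request(host: str, port: int, prompt: str) -> urllib.request.Request:
    """Build a POST request for an (assumed) OpenAI-compatible completions endpoint."""
    url = f"http://{host}:{port}/v1/completions"  # assumed path
    payload = {"prompt": prompt, "max_tokens": 64}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_completion_request("localhost", 8000, "Write a haiku about cats")
# urllib.request.urlopen(req) would send the query once the model is serving
```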
To assist with debugging, you can configure model startup to write logs to a file by providing a YAML Python logging configuration file:

```shell
local-llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF 8000 --log-config <some config file>
```
To run locally using the bundled log config (`log_config.yaml`):

```shell
sudo touch /var/log/local-llm.log
sudo chown user:user /var/log/local-llm.log # use your user and group

# provide the log config manually
local-llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF 8000 --log-config local-llm/log_config.yaml

# or use an environment variable so you don't have to pass the argument
export LOG_CONFIG=$(pip show local-llm | grep Location | awk '{print $2}')/log_config.yaml
local-llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF 8000
```
You can follow the logs with:

```shell
tail -f /var/log/local-llm.log
```
If you are running multiple models, the logs from each will be written to the same file and interleaved.
If running from Cloud Workstations, logs from running models will be written to `/var/log/local-llm.log` (`log_config.yaml` is provided by default via the `LOG_CONFIG` environment variable within the image).
This project imports freely available LLMs and makes them available from Cloud Workstations. We recommend independently verifying any content generated by the models. We do not assume any responsibility or liability for the use or interpretation of generated content.