ClusterCreator automates the creation and maintenance of fully functional Kubernetes (K8S) clusters of any size on Proxmox. Leveraging Terraform/OpenTofu and Ansible, it facilitates complex setups, including decoupled etcd clusters, diverse worker node configurations, and optional integration with Unifi networks and VLANs.
Having a virtualized K8S cluster lets you not only simulate a cloud environment but also scale and customize the cluster to your needs: adding or removing nodes and disks, managing backups and snapshots of the virtual machine disks, customizing node class types, and controlling state.
Watch a step-by-step demo on my blog.
Before proceeding, ensure you have the following:
ClusterCreator requires access to the Proxmox cluster. Execute the following commands on your Proxmox server to create a datacenter user:
pveum user add terraform@pve -comment "Terraform User"
pveum role add TerraformRole -privs "Datastore.Allocate Datastore.AllocateSpace Datastore.AllocateTemplate Datastore.Audit Pool.Allocate Pool.Audit Sys.Audit Sys.Console Sys.Modify SDN.Use VM.Allocate VM.Audit VM.Clone VM.Config.CDROM VM.Config.Cloudinit VM.Config.CPU VM.Config.Disk VM.Config.HWType VM.Config.Memory VM.Config.Network VM.Config.Options VM.Migrate VM.Monitor VM.PowerMgmt User.Modify Mapping.Use"
pveum aclmod / -user terraform@pve -role TerraformRole
sudo pveum user token add terraform@pve provider --privsep=0
For additional documentation, see Proxmox API Token Authentication.
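Proxmox identifies an API token by the string `USER@REALM!TOKENID`, and clients usually pass it together with the secret as `USER@REALM!TOKENID=SECRET`. The helper below is a minimal sketch of assembling that string from the values printed by the token command above. The function name `format_api_token` is hypothetical, and whether secrets.tf expects the combined string or separate fields is an assumption to check against secrets.tf.example.

```shell
# Build a Proxmox API token string of the form USER@REALM!TOKENID=SECRET.
# format_api_token is a hypothetical helper, not part of ClusterCreator.
format_api_token() {
  pve_user="$1"      # e.g. terraform@pve
  pve_token_id="$2"  # e.g. provider
  pve_secret="$3"    # the value printed once by `pveum user token add`
  if [ -z "$pve_user" ] || [ -z "$pve_token_id" ] || [ -z "$pve_secret" ]; then
    echo "usage: format_api_token <user@realm> <token_id> <secret>" >&2
    return 2
  fi
  # Single-quoted format string keeps '!' literal in every shell mode.
  printf '%s!%s=%s\n' "$pve_user" "$pve_token_id" "$pve_secret"
}
```

For example, `format_api_token terraform@pve provider "$SECRET"` prints the token in the combined form.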
Copy secrets.tf.example to secrets.tf and edit it. These secrets are used by Tofu to interact with Proxmox and Unifi.
cp secrets.tf.example secrets.tf
Copy .env.example to .env and edit it. These secrets are used in bash scripts for VM operations.
cp .env.example .env
Note: There may be overlapping configurations between secrets.tf and .env.
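Because both files must exist before any script or Tofu command runs, a quick pre-flight check can save a confusing failure later. The sketch below only tests for the two file names mentioned above; the function name `missing_config_files` is hypothetical and it does not validate the contents.

```shell
# Print the name of each secrets file that is missing from the repo directory.
# Prints nothing when both secrets.tf and .env are present.
missing_config_files() {
  repo_dir="${1:-.}"
  for cfg_name in secrets.tf .env; do
    [ -f "$repo_dir/$cfg_name" ] || printf '%s\n' "$cfg_name"
  done
}
```

Running `missing_config_files .` from the repository root lists whatever still needs to be copied from its `.example` counterpart.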
Customize the following configuration files to suit your environment:
k8s.env: Specify Kubernetes versions and template VM configurations.
vars.tf: Define non-sensitive variables for Tofu.
clusters.tf: Configure cluster specifications. Update the username to your own.
main.tf: Manage VM, VLAN, and pool resources with Tofu.
Run the create_template.sh script to generate a cloud-init ready VM template for Tofu.
./create_template.sh
What It Does:
Outcome: A VM template that installs all required packages and configurations, ready for cloud-init.
Initialize Tofu modules. This step is required only once.
tofu init
Create a dedicated workspace for your cluster.
tofu workspace new <cluster_name>
Purpose: Ensures Tofu commands are scoped to the specified cluster. Switch between workspaces using:
tofu workspace select <cluster_name>
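Creating a workspace on first use and selecting it afterwards can be folded into one step. The sketch below relies on `tofu workspace select`, the OpenTofu subcommand for changing workspaces, and falls back to `tofu workspace new` when the selection fails. The function name `ensure_workspace` is hypothetical, and it assumes the workspace is named after the cluster.

```shell
# Select the workspace for a cluster, creating it on first use.
# Assumes the workspace name equals the cluster name.
ensure_workspace() {
  ws_name="$1"
  if [ -z "$ws_name" ]; then
    echo "usage: ensure_workspace <cluster_name>" >&2
    return 2
  fi
  # `select` fails when the workspace does not exist yet; fall back to `new`.
  tofu workspace select "$ws_name" 2>/dev/null || tofu workspace new "$ws_name"
}
```

Hiding the error output of `select` keeps the first run quiet, at the cost of also hiding unrelated failures, so drop the redirection when debugging.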
Apply the Tofu configuration to create VMs and related resources.
tofu apply [--auto-approve] [-var="template_vm_id=<vm_id>"]
Functionality:
Generates cluster_config.json for Ansible.
Default template_vm_id: 9000
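When the template lives at a different VM ID, the `-var` argument is easy to mistype. The sketch below builds that argument and falls back to the documented default of 9000. The function name `template_var_arg` is hypothetical, and the numeric check is an assumption about what a valid ID looks like.

```shell
# Print the -var argument for `tofu apply`, defaulting to template VM ID 9000.
template_var_arg() {
  tpl_vm_id="${1:-9000}"
  case "$tpl_vm_id" in
    *[!0-9]*)
      echo "template VM ID must be a positive integer: $tpl_vm_id" >&2
      return 1
      ;;
  esac
  printf '%s\n' "-var=template_vm_id=$tpl_vm_id"
}
```

It can be used as `tofu apply "$(template_var_arg 9001)"`.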
Run the Ansible playbooks to set up Kubernetes.
./install_k8s.sh --cluster_name <CLUSTER_NAME> [-a/--add-nodes]
Options:
--add-nodes: Adds new nodes to an existing cluster.
Includes:
Note: Avoid using --add-nodes for setting up or editing a decoupled etcd cluster.
Configure your kubeconfig to interact with the clusters:
export KUBECONFIG=~/.kube/config:~/.kube/alpha.yml:~/.kube/beta.yml:~/.kube/gamma.yml
Tip: Add the export command to your shell's configuration file (~/.bashrc or ~/.zshrc) for persistence.
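Listing every cluster file by hand gets tedious as clusters come and go. The sketch below builds the same colon-separated value from whatever `.yml` files exist in the kubeconfig directory. It assumes each cluster's kubeconfig is stored as `~/.kube/<cluster_name>.yml`, as in the export line above; the function name `build_kubeconfig` is hypothetical.

```shell
# Print a KUBECONFIG value: the default config followed by every *.yml file
# found in the given directory (default: ~/.kube).
build_kubeconfig() {
  kube_dir="${1:-$HOME/.kube}"
  kube_path="$kube_dir/config"
  for kube_file in "$kube_dir"/*.yml; do
    [ -e "$kube_file" ] || continue  # the glob matched nothing
    kube_path="$kube_path:$kube_file"
  done
  printf '%s\n' "$kube_path"
}
```

In a shell configuration file this becomes `export KUBECONFIG="$(build_kubeconfig)"`, which picks up new clusters the next time a shell starts.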
Use tools like kubectx or kubie to switch between contexts.
Remove a node from the cluster:
./remove_node.sh -n/--cluster-name <CLUSTER_NAME> -h/--hostname <NODE_HOSTNAME> -t/--timeout <TIMEOUT_SECONDS> [-d/--delete]
Options:
--delete: Deletes and resets the node for fresh re-commissioning.
Note: Not applicable for decoupled etcd nodes.
Reset the Kubernetes cluster:
./uninstall_k8s.sh -n/--cluster_name <CLUSTER_NAME> [-h/--single-hostname <HOSTNAME_TO_RESET>]
Options:
--single-hostname: Resets a specific node. Without this, all nodes are reset, and the cluster is deleted.
Remove VMs, pools, and VLANs:
tofu destroy [--auto-approve] [--target='proxmox_virtual_environment_vm.node["<vm_name>"]']
Options:
--target: Specifies particular VMs to destroy.
Manage VM power states:
./powerctl_pool.sh [--start|--shutdown|--pause|--resume|--hibernate|--stop] <POOL_NAME> [--timeout <timeout_in_seconds>]
Requirements: QEMU Guest Agent must be running on VMs.
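Since the power script takes exactly one action flag, a thin wrapper can reject typos before anything reaches Proxmox. The sketch below accepts only the six actions listed in the usage line above. The function name `power_pool` and the `POWERCTL` override variable are hypothetical conveniences, not part of ClusterCreator.

```shell
# Path to the power script; override POWERCTL to point elsewhere.
POWERCTL="${POWERCTL:-./powerctl_pool.sh}"

# Validate the action, then call the power script for a pool.
# Usage: power_pool <action> <pool_name> [timeout_seconds]
power_pool() {
  pw_action="$1"
  pw_pool="$2"
  pw_timeout="${3:-}"
  case "$pw_action" in
    start|shutdown|pause|resume|hibernate|stop) ;;
    *) echo "unknown action: $pw_action" >&2; return 2 ;;
  esac
  if [ -z "$pw_pool" ]; then
    echo "pool name required" >&2
    return 2
  fi
  if [ -n "$pw_timeout" ]; then
    "$POWERCTL" "--$pw_action" "$pw_pool" --timeout "$pw_timeout"
  else
    "$POWERCTL" "--$pw_action" "$pw_pool"
  fi
}
```

For example, `power_pool shutdown <POOL_NAME> 120` expands to the `--shutdown` form with a 120 second timeout.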
Execute bash commands on specified Ansible host groups:
./run_command_on_host_group.sh [-n/--cluster-name <CLUSTER_NAME>] [-g/--group <GROUP_NAME>] [-c/--command '<command>']
Example:
./run_command_on_host_group.sh -n mycluster -g all -c 'sudo apt update'
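To run the same command across several clusters, the script can be called in a loop. The sketch below stops at the first cluster where the command fails. The function name `run_on_clusters` and the `RUNNER` override variable are hypothetical, and the cluster names in the usage note are placeholders.

```shell
# Path to the host-group script; override RUNNER to point elsewhere.
RUNNER="${RUNNER:-./run_command_on_host_group.sh}"

# Run one command on the same host group in each listed cluster.
# Usage: run_on_clusters <group> '<command>' <cluster_name>...
run_on_clusters() {
  rc_group="$1"
  rc_cmd="$2"
  shift 2
  for rc_cluster in "$@"; do
    # Stop at the first failure so later clusters are not touched.
    "$RUNNER" -n "$rc_cluster" -g "$rc_group" -c "$rc_cmd" || return 1
  done
}
```

For example, `run_on_clusters all 'sudo apt update' alpha beta` runs the update on every host of both clusters in turn.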
A minimal cluster resembling Minikube or Kind.
alpha
Note: With no worker nodes, the control plane is left untainted so it can run workloads.
Expand with additional worker nodes for diverse workloads.
beta
general
Note: etcd nodes are utilized by control plane nodes but are not explicitly shown.
A robust setup with multiple control and etcd nodes, including GPU workers.
gamma
Leverage OpenTofu and Ansible to create highly dynamic cluster configurations:
Configure IPv4 and IPv6 support:
ipv6.enabled = false
ipv6.enabled = true
ipv6.dual_stack = false
ipv6.enabled = true
ipv6.dual_stack = true
Note: IPv6-only clusters are not supported due to complexity and external dependencies (e.g., GitHub Container Registry lacks IPv6).
Tip: The HA kube-vip API server can utilize an IPv6 address without enabling dual-stack.
Define custom worker classes in clusters.tf to meet specific workload requirements:
GPU Workers:
clusters.tf
Storage Workers:
Database Workers:
FedRAMP Workers:
Backup Workers:
Common Issues:
Proxmox Clone Failures: Proxmox may struggle with cloning identical templates repeatedly.
Solution: With larger cluster sizes, run tofu apply multiple times.
Configuration Conflicts: Errors related to existing configurations or unresponsive VMs.
Solution: Run ./uninstall_k8s.sh to reset VMs if necessary.
Workaround: For persistent issues, create brand-new VMs to ensure a clean environment.
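Re-running tofu apply by hand after clone failures can be wrapped in a small retry loop. The sketch below is a generic helper, not part of ClusterCreator: the function name `retry`, the `RETRY_DELAY` variable, and its 10 second default are all assumptions to tune for your Proxmox cluster.

```shell
# Run a command until it succeeds or the attempt limit is reached.
# Usage: retry <max_attempts> <command> [args...]
retry() {
  rt_max="$1"
  shift
  rt_attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$rt_attempt" -ge "$rt_max" ]; then
      echo "giving up after $rt_attempt attempts: $*" >&2
      return 1
    fi
    rt_attempt=$((rt_attempt + 1))
    # Pause between attempts (seconds); RETRY_DELAY is a hypothetical knob.
    sleep "${RETRY_DELAY:-10}"
  done
}
```

For example, `retry 3 tofu apply --auto-approve` tries the apply up to three times and returns a non-zero status if every attempt fails.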