christensenjairus / ClusterCreator

Terraform & Ansible K8S on Proxmox

ClusterCreator: Terraform & Ansible K8S Bootstrapping on Proxmox

ClusterCreator Overview

Table of Contents

  1. Introduction
  2. Features
  3. Prerequisites
  4. Installation
  5. Usage
  6. Examples
  7. Advanced Configurations
  8. Troubleshooting
  9. Final Product
  10. Additional Resources

Introduction

ClusterCreator automates the creation and maintenance of fully functional Kubernetes (K8S) clusters of any size on Proxmox. Leveraging Terraform/OpenTofu and Ansible, it facilitates complex setups, including decoupled etcd clusters, diverse worker node configurations, and optional integration with Unifi networks and VLANs.

A virtualized K8S cluster lets you simulate a cloud environment and also scale and customize the cluster to your needs: add or remove nodes and disks, manage backups and snapshots of the virtual machine disks, customize node class types, and control state.

Watch a step-by-step demo on my blog.


Features


Prerequisites

Before proceeding, ensure you have the following:


Installation

1. Add Proxmox Cluster User

ClusterCreator requires access to the Proxmox cluster. Execute the following commands on your Proxmox server to create a datacenter user:

1. Add a Proxmox User:

pveum user add terraform@pve -comment "Terraform User"

2. Add a Custom Role for Terraform with Required Permissions:

pveum role add TerraformRole -privs "Datastore.Allocate Datastore.AllocateSpace Datastore.AllocateTemplate Datastore.Audit Pool.Allocate Pool.Audit Sys.Audit Sys.Console Sys.Modify SDN.Use VM.Allocate VM.Audit VM.Clone VM.Config.CDROM VM.Config.Cloudinit VM.Config.CPU VM.Config.Disk VM.Config.HWType VM.Config.Memory VM.Config.Network VM.Config.Options VM.Migrate VM.Monitor VM.PowerMgmt User.Modify Mapping.Use"

3. Assign the Role to the User at the Datacenter Level:

pveum aclmod / -user terraform@pve -role TerraformRole

4. Create an API Token for the User:

sudo pveum user token add terraform@pve provider --privsep=0

For additional documentation, see Proxmox API Token Authentication.
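Once the token exists, it can be worth confirming that it authenticates before handing it to Tofu. The sketch below builds the Authorization header that Proxmox expects for API tokens. The function name pve_auth_header and the PVE_TOKEN_SECRET variable are illustrative and not part of ClusterCreator; the secret is the value printed by the token command above.

```shell
# Build the Authorization header Proxmox expects for API tokens.
# Format: PVEAPIToken=<user>@<realm>!<token_id>=<secret>
# (pve_auth_header is a hypothetical helper, not part of this repo.)
pve_auth_header() {
  user="$1"
  token_id="$2"
  secret="$3"
  printf 'Authorization: PVEAPIToken=%s!%s=%s' "$user" "$token_id" "$secret"
}

# Example call with placeholders (replace the host and export the secret first):
#   curl -sk -H "$(pve_auth_header terraform@pve provider "$PVE_TOKEN_SECRET")" \
#        "https://<proxmox-host>:8006/api2/json/version"
```

A JSON response containing the Proxmox version suggests the token is valid; an authentication error usually points to a wrong token ID or secret.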

2. Configure Secrets Files

For Tofu (Terraform)

Copy secrets.tf.example to secrets.tf and edit the copy. These secrets are used by Tofu to interact with Proxmox and Unifi.

cp secrets.tf.example secrets.tf

Environment Variables For Bash

Copy .env.example to .env and edit the copy. These secrets are used in bash scripts for VM operations.

cp .env.example .env

Note: Some settings may appear in both secrets.tf and .env. Keep the values consistent between the two files.
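Because both files hold credentials, one option is to wrap the copy step so that it never overwrites a file you have already edited and so that the result is readable only by your user. This is a minimal sketch; init_secret_file is a hypothetical helper, not a script shipped with the repository.

```shell
# Copy an example file to its real name without overwriting existing edits,
# then restrict it to the current user since it will hold credentials.
# (init_secret_file is a hypothetical helper, not part of this repo.)
init_secret_file() {
  example="$1"
  target="$2"
  if [ ! -e "$target" ]; then
    cp "$example" "$target" || return 1
  fi
  chmod 600 "$target"
}

# Usage, from the repository root:
#   init_secret_file secrets.tf.example secrets.tf
#   init_secret_file .env.example .env
```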

3. Edit Configuration Files

Customize the following configuration files to suit your environment:


Usage

1. Create a VM Template

Run the create_template.sh script to generate a cloud-init ready VM template for Tofu.

./create_template.sh

What It Does:

Outcome: A VM template with all required packages and configuration already installed, ready for cloud-init.

2. Initialize Tofu

Initialize Tofu modules. This step is required only once.

tofu init

3. Create Tofu Workspace

Create a dedicated workspace for your cluster.

tofu workspace new <cluster_name>

Purpose: Ensures Tofu commands are scoped to the specified cluster. Switch between workspaces using:

tofu workspace select <cluster_name>
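If you script cluster setup, the "select if it exists, otherwise create" pattern can be folded into one function. The sketch below uses only the workspace select and workspace new subcommands; the name ensure_workspace is illustrative and not part of ClusterCreator.

```shell
# Select the workspace for a cluster, creating it if it does not exist yet.
# "tofu workspace new" also switches to the workspace it creates.
# (ensure_workspace is a hypothetical helper, not part of this repo.)
ensure_workspace() {
  name="$1"
  if ! tofu workspace select "$name" >/dev/null 2>&1; then
    tofu workspace new "$name"
  fi
}

# Usage:
#   ensure_workspace alpha
```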

4. Create VMs with Tofu

Apply the Tofu configuration to create VMs and related resources.

tofu apply [--auto-approve] [-var="template_vm_id=<vm_id>"]

Functionality:

Default template_vm_id: 9000
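Since every apply acts on whichever workspace is currently selected, a small guard can help avoid applying one cluster's configuration to another cluster's state. The following is a sketch under the assumption that the workspace name matches the cluster name, as in step 3; apply_cluster is a hypothetical wrapper, not part of ClusterCreator.

```shell
# Refuse to apply unless the selected workspace matches the intended cluster.
# The second argument overrides the template VM ID (default 9000, as above).
# (apply_cluster is a hypothetical wrapper, not part of this repo.)
apply_cluster() {
  cluster="$1"
  template_id="${2:-9000}"
  current="$(tofu workspace show)" || return 1
  if [ "$current" != "$cluster" ]; then
    echo "workspace is '$current', expected '$cluster'" >&2
    return 1
  fi
  tofu apply -var="template_vm_id=$template_id"
}

# Usage:
#   apply_cluster alpha          # uses template 9000
#   apply_cluster alpha 9001     # uses a different template
```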

5. Install Kubernetes with Ansible

Run the Ansible playbooks to set up Kubernetes.

./install_k8s.sh --cluster_name <CLUSTER_NAME> [-a/--add-nodes]

Options:

Includes:

Note: Avoid using --add-nodes for setting up or editing a decoupled etcd cluster.

6. Manage Kubernetes Clusters

Kubeconfig Files

Configure your kubeconfig to interact with the clusters:

export KUBECONFIG=~/.kube/config:~/.kube/alpha.yml:~/.kube/beta.yml:~/.kube/gamma.yml

Tip: Add the export command to your shell's configuration file (~/.bashrc or ~/.zshrc) for persistence.

Use tools like kubectx or kubie to switch between contexts.
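Rather than listing every cluster file by hand, the KUBECONFIG value can be assembled from whatever is present in the directory. This sketch assumes the per-cluster files live in ~/.kube and end in .yml, as in the export above; build_kubeconfig is a hypothetical helper.

```shell
# Join ~/.kube/config and every *.yml file in the same directory into a
# colon-separated list suitable for KUBECONFIG.
# (build_kubeconfig is a hypothetical helper, not part of this repo.)
build_kubeconfig() {
  dir="${1:-$HOME/.kube}"
  result=""
  if [ -f "$dir/config" ]; then
    result="$dir/config"
  fi
  for f in "$dir"/*.yml; do
    if [ -f "$f" ]; then
      result="${result:+$result:}$f"
    fi
  done
  printf '%s\n' "$result"
}

# Usage, for example in ~/.bashrc or ~/.zshrc:
#   export KUBECONFIG="$(build_kubeconfig)"
```

New cluster files are picked up the next time a shell starts, so the export line does not need editing when a cluster is added.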

Drain or Remove a Node

Remove a node from the cluster:

./remove_node.sh -n/--cluster-name <CLUSTER_NAME> -h/--hostname <NODE_HOSTNAME> -t/--timeout <TIMEOUT_SECONDS> [-d/--delete]

Options:

Note: Not applicable for decoupled etcd nodes.

Uninstall Kubernetes

Reset the Kubernetes cluster:

./uninstall_k8s.sh -n/--cluster_name <CLUSTER_NAME> [-h/--single-hostname <HOSTNAME_TO_RESET>]

Options:

Destroy VMs with Tofu

Remove VMs, pools, and VLANs:

tofu destroy [--auto-approve] [--target='proxmox_virtual_environment_vm.node["<vm_name>"]']

Options:

Power Control

Manage VM power states:

./powerctl_pool.sh [--start|--shutdown|--pause|--resume|--hibernate|--stop] <POOL_NAME> [--timeout <timeout_in_seconds>]

Requirements: QEMU Guest Agent must be running on VMs.
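To check the guest agent requirement ahead of time, you can ping the agent from a Proxmox node with the qm tool. The sketch below prints the IDs of VMs whose agent does not respond. It is meant to be run as root on the Proxmox host; agents_missing is a hypothetical helper, and the VM IDs in the usage line are placeholders.

```shell
# Print the ID of every VM whose QEMU Guest Agent does not answer a ping.
# Run on a Proxmox node; "qm agent <vmid> ping" exits non-zero on failure.
# (agents_missing is a hypothetical helper, not part of this repo.)
agents_missing() {
  for vmid in "$@"; do
    if ! qm agent "$vmid" ping >/dev/null 2>&1; then
      echo "$vmid"
    fi
  done
}

# Usage (placeholder VM IDs):
#   agents_missing 101 102 103
```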

Run Commands on Host Groups

Execute bash commands on specified Ansible host groups:

./run_command_on_host_group.sh [-n/--cluster-name <CLUSTER_NAME>] [-g/--group <GROUP_NAME>] [-c/--command '<command>']

Example:

./run_command_on_host_group.sh -n mycluster -g all -c 'sudo apt update'

Examples

Alpha Cluster: Single Node

A minimal cluster resembling Minikube or Kind.

Note: With no worker nodes, the control plane is left untainted so that it can run workloads.
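One way to confirm this on a running cluster is to list each node's taint keys. On a kubeadm-style cluster, a tainted control plane normally carries the key node-role.kubernetes.io/control-plane; an untainted node shows no taints. The function name below is illustrative, and the sketch assumes kubectl is pointed at the cluster in question.

```shell
# List every node with the keys of its taints.
# A control plane that can run workloads shows <none> in the TAINTS column.
# (show_node_taints is a hypothetical helper, not part of this repo.)
show_node_taints() {
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Usage:
#   show_node_taints
```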

Beta Cluster: Multiple General Workers

Expand with additional worker nodes for diverse workloads.

Note: etcd nodes are utilized by control plane nodes but are not explicitly shown.

Gamma Cluster: Highly Available Control Plane with Decoupled etcd

A robust setup with multiple control and etcd nodes, including GPU workers.


Advanced Configurations

Dynamic Configurations

Leverage OpenTofu and Ansible to create highly dynamic cluster configurations:

Dual Stack Networking

Configure IPv4 and IPv6 support:

  1. IPv6 Disabled:
  2. IPv6 Enabled, Single Stack:
  3. IPv6 Enabled, Dual Stack:

Note: IPv6-only clusters are not supported due to complexity and external dependencies (e.g., GitHub Container Registry lacks IPv6).

Tip: The HA kube-vip API server can utilize an IPv6 address without enabling dual-stack.
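After a cluster is up, a quick way to see which mode it ended up in is to look at the pod CIDRs assigned to a node. The sketch below classifies a list of CIDRs by IP family. It assumes the node's spec.podCIDRs field reflects the cluster's pod networking, which depends on the CNI and its IPAM mode; stack_mode is a hypothetical helper.

```shell
# Classify a comma- or space-separated list of pod CIDRs by IP family.
# (stack_mode is a hypothetical helper, not part of this repo.)
stack_mode() {
  has_v4=no
  has_v6=no
  for cidr in $(printf '%s' "$1" | tr ',' ' '); do
    case "$cidr" in
      *:*) has_v6=yes ;;   # IPv6 addresses contain colons
      *.*) has_v4=yes ;;   # IPv4 addresses contain dots
    esac
  done
  case "$has_v4/$has_v6" in
    yes/yes) echo dual-stack ;;
    yes/no)  echo ipv4-only ;;
    no/yes)  echo ipv6-only ;;
    *)       echo unknown ;;
  esac
}

# Usage (replace <node> with a real node name):
#   stack_mode "$(kubectl get node <node> -o jsonpath='{.spec.podCIDRs[*]}')"
```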

Custom Worker Types

Define custom worker classes in clusters.tf to meet specific workload requirements:


Troubleshooting

Installation Errors

Common Issues:

Workaround: For persistent issues, create brand-new VMs to ensure a clean environment.


Final Product

Proxmox Pools with VMs Managed by Tofu

image

Unifi Network with VLAN Managed by Tofu

image

Gamma Cluster Example in K9s

image

Additional Resources