Experimental: This project is experimental and a work in progress. Use at your own risk, and do not expect thorough support!
This project deploys EKS Anywhere (EKS-A) on Bare Metal on Equinix Metal using the minimum hardware requirements.
See https://aws.amazon.com/blogs/containers/getting-started-with-eks-anywhere-on-bare-metal/ for more information about EKS-A on Bare Metal.
A guided, step-by-step manual installation workshop is available at https://equinix-labs.github.io/eks-anywhere-on-equinix-metal-workshop/. If you want to learn more about how EKS-A is installed on Equinix Metal, to better understand how and where you can adapt changes for your environment, we recommend following the manual workshop.
In the examples/lab directory, you can find a Terraform module to facilitate EKS-A on Bare Metal lab environments.
EKS-A requires UEFI booting, which is supported by the following Equinix Metal On Demand plans:
With your Equinix Metal account, a project, and a User API token, you can use Terraform v1+ to install a proof-of-concept demonstration environment for EKS-A on Bare Metal.
Enter the examples/deploy directory:
$ cd examples/deploy
Create a terraform.tfvars file in the root of this project with metal_api_token and project_id defined. These are the required variables needed to run terraform apply. See variables.tf for additional settings that you may wish to customize.
# terraform.tfvars
metal_api_token="...your Metal User API Token here..."
project_id="...your Metal Project ID here..."
Note: Project API Tokens cannot be used to access some of the Gateway features used by this project. A User API Token is required.
Terraform will create an Equinix Metal VLAN, Metal Gateway, IP Reservation, and Equinix Metal servers to act as the EKS-A Admin node and worker devices. Terraform will also create the initial hardware.csv with the details of each server and register it with the eks-anywhere CLI to create the cluster. The worker nodes will be provisioned through Tinkerbell to act as a control-plane node and a worker node.
Once complete, you'll see the following output:
$ terraform apply
... (~12m later)
Apply complete! Resources: 19 added, 0 changed, 0 destroyed.
Outputs:
eksa_admin_ip = "203.0.113.3"
eksa_admin_ssh_key = "/Users/username/.ssh/my-eksa-cluster-xed"
eksa_admin_ssh_user = "root"
eksa_nodes_sos = tomap({
"eksa-node-cp-001" = "b0e1426d-4d9e-4d01-bd5c-54065df61d67@sos.sv15.platformequinix.com"
"eksa-node-worker-001" = "84ffa9c7-84ce-46eb-97ff-2ae310fbb360@sos.sv15.platformequinix.com"
})
SSH into the EKS-A Admin node and follow the EKS-A on Bare Metal instructions to continue within the Kubernetes environment.
ssh -i $(terraform output -json | jq -r .eksa_admin_ssh_key.value) root@$(terraform output -json | jq -r .eksa_admin_ip.value)
root@eksa-admin:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
eksa-node-cp-001 Ready control-plane,master 7m56s v1.22.10-eks-7dc61e8
eksa-node-worker-001 Ready <none> 5m30s v1.22.10-eks-7dc61e8
This section is an example of adding a new node of the exact same type as the previous nodes to the cluster. For example, if you used the project defaults, you'll want to add an m3.small.x86 as the new node. For simplicity, this example only adds a new worker node. Adding control-plane nodes is possible, but requires thinking through how many nodes are added, as well as labeling them as type=cp instead of type=worker.
NEW_HOSTNAME="your new hostname"
POOL_ADMIN="IP address of your admin machine"
metal device create --plan m3.small.x86 --metro da --hostname $NEW_HOSTNAME \
  --ipxe-script-url http://$POOL_ADMIN/ipxe/ --operating-system custom_ipxe
Make note of the device's UUID; you can use metal device get to list your devices.
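If you'd rather not copy the UUID by hand, a jq filter can capture it for you. This is just a sketch, assuming the hostname you chose above is unique within the project:

```shell
# Look up the new device's UUID by matching on the hostname set earlier.
DEVICE_ID=$(metal device get -o json \
  | jq -r --arg h "$NEW_HOSTNAME" '.[] | select(.hostname == $h) | .id')
echo "$DEVICE_ID"
```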
DEVICE_ID="UUID you noted above"
BOND0_PORT=$(metal devices get -i $DEVICE_ID -o json |
jq -r '.network_ports [] | select(.name == "bond0") | .id')
ETH0_PORT=$(metal devices get -i $DEVICE_ID -o json |
jq -r '.network_ports [] | select(.name == "eth0") | .id')
VLAN_ID="Your VLAN ID, likely 1000"
metal port convert -i $BOND0_PORT --layer2 --bonded=false --force
metal port vlan -i $ETH0_PORT -a $VLAN_ID
Put the following in a new CSV file, hardware2.csv:
hostname,mac,ip_address,gateway,netmask,nameservers,disk,labels
<HOSTNAME>,<MAC_ADDRESS>,<IP>,<GATEWAY>,<NETMASK>,8.8.8.8|8.8.4.4,/dev/sda,type=worker
Get your machine deployment group name:
kubectl get machinedeployments -n eksa-system
Generate the Kubernetes YAML from your hardware2.csv file:
eksctl anywhere generate hardware -z hardware2.csv > cluster-scale.yaml
Edit cluster-scale.yaml and remove the two bmc items.
Use the MachineDeployment group name along with the CSV file to scale the cluster:
kubectl apply -f cluster-scale.yaml
kubectl scale machinedeployments -n eksa-system <Your MachineDeployment Group Name> --replicas 1
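Once applied, you can watch the scale-out from eksa-admin. This is a sketch; the Machine resources live in the eksa-system namespace queried earlier, and provisioning can take several minutes:

```shell
# The new machine should appear here first...
kubectl get machines -n eksa-system
# ...and the node registers once Tinkerbell finishes provisioning.
kubectl get nodes
```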
This section covers the basic steps to connect your cluster to EKS with the EKS Connector. There are many more details (including prerequisites such as IAM permissions) in the EKS Connector documentation.
Connect to the eksa-admin host.
ssh -i $(terraform output -json | jq -r .eksa_admin_ssh_key.value) root@$(terraform output -json | jq -r .eksa_admin_ip.value)
Follow the AWS documentation and set the environment variables with your authentication info for AWS. For example:
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export AWS_DEFAULT_REGION=us-west-2
Now use eksctl to register the cluster:
eksctl register cluster --name my-cluster --provider my-provider --region region-code
If it succeeds, the output will show several .yaml files that were created and need to be applied to the cluster. For example, at the time of writing, applying those files would be done like so:
kubectl apply -f eks-connector.yaml,eks-connector-clusterrole.yaml,eks-connector-console-dashboard-full-access-group.yaml
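To confirm the connector came up, you can check its pods. The agent is deployed into the eks-connector namespace according to the EKS Connector documentation; adjust the namespace if your setup differs:

```shell
# The eks-connector agent pods should reach Running status.
kubectl get pods -n eks-connector
```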
Even more information can be found in the eksctl documentation.
Note: This section serves as manual instructions for installing EKS-A on Bare Metal on Equinix Metal. The Terraform install above performs all of these steps for you. These instructions offer a step-by-step install with copy+paste commands that simplify the process. Refer to the open issues, and please open an issue if you encounter something not represented there.
The steps below align with the EKS-A on Bare Metal instructions. While the steps below are intended to be complete, follow along with the EKS-A install guide for best results.
No open issues are currently blocking. If you run into something unexpected, check the open issues and open a new issue reporting your experience.
The following tools will be needed on your local development environment where you will be running most of the commands in this guide.
Create an EKS-A Admin machine using the metal-cli:
Create an API Key and register it with the Metal CLI:
metal init
metal device create --plan=m3.small.x86 --metro=da --hostname eksa-admin --operating-system ubuntu_20_04
Create a VLAN:
metal vlan create --metro da --description eks-anywhere --vxlan 1000
Create a Public IP Reservation (16 addresses):
metal ip request --metro da --type public_ipv4 --quantity 16 --tags eksa
These variables will be used in executable snippets in later steps to refer to specific addresses within the pool. The correct IP reservation is identified by the "eksa" tag; exactly one reservation is expected to carry it.
# Capture the ID, Network, Gateway, and Netmask using jq
VLAN_ID=$(metal vlan list -o json | jq -r '.virtual_networks | .[] | select(.vxlan == 1000) | .id')
POOL_ID=$(metal ip list -o json | jq -r '.[] | select(.tags | contains(["eksa"]))? | .id')
POOL_NW=$(metal ip list -o json | jq -r '.[] | select(.tags | contains(["eksa"]))? | .network')
POOL_GW=$(metal ip list -o json | jq -r '.[] | select(.tags | contains(["eksa"]))? | .gateway')
POOL_NM=$(metal ip list -o json | jq -r '.[] | select(.tags | contains(["eksa"]))? | .netmask')
# POOL_ADMIN will be assigned to eksa-admin within the VLAN
POOL_ADMIN=$(python3 -c 'import ipaddress; print(str(ipaddress.IPv4Address("'${POOL_GW}'")+1))')
# PUB_ADMIN is the provisioned IPv4 public address of eksa-admin, which we can use with ssh
PUB_ADMIN=$(metal devices list -o json | jq -r '.[] | select(.hostname=="eksa-admin") | .ip_addresses [] | select(contains({"public":true,"address_family":4})) | .address')
# PORT_ADMIN is the bond0 port of the eksa-admin machine
PORT_ADMIN=$(metal devices list -o json | jq -r '.[] | select(.hostname=="eksa-admin") | .network_ports [] | select(.name == "bond0") | .id')
# POOL_VIP is the floating IPv4 public address assigned to the current lead kubernetes control plane
POOL_VIP=$(python3 -c 'import ipaddress; print(str(ipaddress.ip_network("'${POOL_NW}'/'${POOL_NM}'").broadcast_address-1))')
TINK_VIP=$(python3 -c 'import ipaddress; print(str(ipaddress.ip_network("'${POOL_NW}'/'${POOL_NM}'").broadcast_address-2))')
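As a worked example of the address arithmetic above (illustrative values only; your reservation's network, netmask, and gateway will differ), a /28 pool with gateway 198.51.100.1 yields:

```shell
# Illustrative values for a /28 reservation; your pool will differ.
POOL_NW=198.51.100.0
POOL_NM=255.255.255.240
POOL_GW=198.51.100.1

python3 -c 'import ipaddress; print(ipaddress.IPv4Address("'${POOL_GW}'")+1)'                                # 198.51.100.2  -> POOL_ADMIN (gateway + 1)
python3 -c 'import ipaddress; print(ipaddress.ip_network("'${POOL_NW}'/'${POOL_NM}'").broadcast_address-1)'  # 198.51.100.14 -> POOL_VIP (broadcast - 1)
python3 -c 'import ipaddress; print(ipaddress.ip_network("'${POOL_NW}'/'${POOL_NM}'").broadcast_address-2)'  # 198.51.100.13 -> TINK_VIP (broadcast - 2)
```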
Create a Metal Gateway
metal gateway create --ip-reservation-id $POOL_ID --virtual-network $VLAN_ID
Create Tinkerbell worker nodes eksa-node-001 and eksa-node-002 with the Custom iPXE operating system pointed at http://{eks-a-public-address}. These nodes will be provisioned as EKS-A control-plane or worker nodes.
for a in {1..2}; do
metal device create --plan m3.small.x86 --metro da --hostname eksa-node-00$a \
--ipxe-script-url http://$POOL_ADMIN/ipxe/ --operating-system custom_ipxe
done
Note that the ipxe-script-url doesn't actually get used in this process; we're setting it only because it's a requirement for using the custom_ipxe operating system type.
Add the VLAN to the eksa-admin bond0 port:
metal port vlan -i $PORT_ADMIN -a $VLAN_ID
Configure the Layer 2 VLAN network on eksa-admin with this snippet:
ssh root@$PUB_ADMIN tee -a /etc/network/interfaces << EOS
auto bond0.1000
iface bond0.1000 inet static
pre-up sleep 5
address $POOL_ADMIN
netmask $POOL_NM
vlan-raw-device bond0
EOS
Activate the Layer 2 VLAN network with this command:
ssh root@$PUB_ADMIN systemctl restart networking
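As an optional sanity check (a sketch using the variables captured earlier), confirm the VLAN subinterface came up and the gateway answers:

```shell
# bond0.1000 should hold $POOL_ADMIN, and the Metal Gateway should reply to ping.
ssh root@$PUB_ADMIN "ip -br addr show bond0.1000 && ping -c 1 $POOL_GW"
```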
Convert the eksa-node-* network ports to Layer2-Unbonded and attach them to the VLAN.
node_ids=$(metal devices list -o json | jq -r '.[] | select(.hostname | startswith("eksa-node")) | .id')
i=1 # We will increment "i" for the eksa-node-* nodes. "1" represents the eksa-admin node.
for id in $(echo $node_ids); do
let i++
BOND0_PORT=$(metal devices get -i $id -o json | jq -r '.network_ports [] | select(.name == "bond0") | .id')
ETH0_PORT=$(metal devices get -i $id -o json | jq -r '.network_ports [] | select(.name == "eth0") | .id')
metal port convert -i $BOND0_PORT --layer2 --bonded=false --force
metal port vlan -i $ETH0_PORT -a $VLAN_ID
done
Capture the MAC addresses and create the hardware.csv file, to be placed on eksa-admin in /root/ (run these commands on the host with the metal CLI on it):
Create the CSV Header:
echo hostname,vendor,mac,ip_address,gateway,netmask,nameservers,disk,labels > hardware.csv
Use metal and jq to grab the hardware MAC addresses and add them to the hardware.csv:
node_ids=$(metal devices list -o json | jq -r '.[] | select(.hostname | startswith("eksa-node")) | .id')
i=1 # We will increment "i" for the eksa-node-* nodes. "1" represents the eksa-admin node.
for id in $(echo $node_ids); do
# Configure only the first node as a control-plane node
if [ "$i" = 1 ]; then TYPE=cp; else TYPE=worker; fi; # for an HA control plane, change 1 to 3
NODENAME="eksa-node-00$i"
let i++
MAC=$(metal device get -i $id -o json | jq -r '.network_ports | .[] | select(.name == "eth0") | .data.mac')
IP=$(python3 -c 'import ipaddress; print(str(ipaddress.IPv4Address("'${POOL_GW}'")+'$i'))')
echo "$NODENAME,Equinix,${MAC},${IP},${POOL_GW},${POOL_NM},8.8.8.8|8.8.4.4,/dev/sda,type=${TYPE}" >> hardware.csv
done
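With two nodes and illustrative values (the same hypothetical /28 pool with gateway 198.51.100.1; the MAC addresses here are made up), the resulting hardware.csv would look something like:

```
hostname,vendor,mac,ip_address,gateway,netmask,nameservers,disk,labels
eksa-node-001,Equinix,aa:bb:cc:dd:ee:01,198.51.100.3,198.51.100.1,255.255.255.240,8.8.8.8|8.8.4.4,/dev/sda,type=cp
eksa-node-002,Equinix,aa:bb:cc:dd:ee:02,198.51.100.4,198.51.100.1,255.255.255.240,8.8.8.8|8.8.4.4,/dev/sda,type=worker
```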
The BMC fields are omitted because Equinix Metal does not expose the BMC of nodes. EKS Anywhere will skip BMC steps with this configuration.
Copy hardware.csv to eksa-admin:
scp hardware.csv root@$PUB_ADMIN:/root
We've now prepared all of the variables and configuration that the eksa-admin machine needs.
Log in to eksa-admin with the LC_POOL_ADMIN, LC_POOL_VIP, and LC_TINK_VIP variables defined:
# SSH into eksa-admin. The special args and environment settings are just tricks to plumb $POOL_ADMIN, $POOL_VIP, and $TINK_VIP into the eksa-admin environment.
LC_POOL_ADMIN=$POOL_ADMIN LC_POOL_VIP=$POOL_VIP LC_TINK_VIP=$TINK_VIP ssh -o SendEnv=LC_POOL_ADMIN,LC_POOL_VIP,LC_TINK_VIP root@$PUB_ADMIN
Note: The remaining steps assume you have logged into eksa-admin with the SSH command shown above.
Install eksctl and the eksctl-anywhere plugin on eksa-admin:
curl "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" \
--silent --location \
| tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin/
export EKSA_RELEASE="0.14.3" OS="$(uname -s | tr A-Z a-z)" RELEASE_NUMBER=30
curl "https://anywhere-assets.eks.amazonaws.com/releases/eks-a/${RELEASE_NUMBER}/artifacts/eks-a/v${EKSA_RELEASE}/${OS}/amd64/eksctl-anywhere-v${EKSA_RELEASE}-${OS}-amd64.tar.gz" \
--silent --location \
| tar xz ./eksctl-anywhere
sudo mv ./eksctl-anywhere /usr/local/bin/
Install kubectl on eksa-admin:
snap install kubectl --channel=1.25 --classic
Version 1.25 matches the version used in the eks-anywhere repository.
Install Docker. Run the Docker install script:
curl -fsSL https://get.docker.com -o get-docker.sh
chmod +x get-docker.sh
./get-docker.sh
Alternatively, follow the instructions from https://docs.docker.com/engine/install/ubuntu/.
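Either way, a quick sanity check that Docker installed and the daemon is running:

```shell
# Print the client version and run a throwaway container against the daemon.
docker --version
docker run --rm hello-world
```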
Create EKS-A Cluster config:
export TINKERBELL_HOST_IP=$LC_TINK_VIP
export CLUSTER_NAME="${USER}-${RANDOM}"
export TINKERBELL_PROVIDER=true
eksctl anywhere generate clusterconfig $CLUSTER_NAME --provider tinkerbell > $CLUSTER_NAME.yaml
Note: The remaining steps assume you have defined the variables set above.
Install yq
snap install yq
Generate a public SSH key and store it in a variable called 'SSH_PUBLIC_KEY'
ssh-keygen -t rsa
export SSH_PUBLIC_KEY=$(cat /root/.ssh/id_rsa.pub)
Run the yq command below to make the necessary changes to the $CLUSTER_NAME.yaml file.
yq eval -i '
(select(.kind == "Cluster") | .spec.controlPlaneConfiguration.endpoint.host) = env(LC_POOL_VIP) |
(select(.kind == "TinkerbellDatacenterConfig") | .spec.tinkerbellIP) = env(LC_TINK_VIP) |
(select(.kind == "TinkerbellMachineConfig") | (.spec.users[] | select(.name == "ec2-user")).sshAuthorizedKeys) = [env(SSH_PUBLIC_KEY)] |
(select(.kind == "TinkerbellMachineConfig" and .metadata.name == env(CLUSTER_NAME) + "-cp" ) | .spec.hardwareSelector.type) = "cp" |
(select(.kind == "TinkerbellMachineConfig" and .metadata.name == env(CLUSTER_NAME)) | .spec.hardwareSelector.type) = "worker" |
(select(.kind == "TinkerbellMachineConfig") | .spec.templateRef.kind) = "TinkerbellTemplateConfig" |
(select(.kind == "TinkerbellMachineConfig") | .spec.templateRef.name) = env(CLUSTER_NAME)
' $CLUSTER_NAME.yaml
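To spot-check that the edits landed (assuming yq v4, as installed above):

```shell
# Should print the control-plane VIP and Tinkerbell VIP set by the yq edit.
yq eval 'select(.kind == "Cluster") | .spec.controlPlaneConfiguration.endpoint.host' $CLUSTER_NAME.yaml
yq eval 'select(.kind == "TinkerbellDatacenterConfig") | .spec.tinkerbellIP' $CLUSTER_NAME.yaml
```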
Append the following to the $CLUSTER_NAME.yaml file.
cat << EOF >> $CLUSTER_NAME.yaml
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellTemplateConfig
metadata:
name: ${CLUSTER_NAME}
spec:
template:
global_timeout: 6000
id: ""
name: ${CLUSTER_NAME}
tasks:
- actions:
- environment:
COMPRESSED: "true"
DEST_DISK: /dev/sda
IMG_URL: https://anywhere-assets.eks.amazonaws.com/releases/bundles/29/artifacts/raw/1-25/bottlerocket-v1.25.6-eks-d-1-25-7-eks-a-29-amd64.img.gz
image: public.ecr.aws/eks-anywhere/tinkerbell/hub/image2disk:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-29
name: stream-image
timeout: 600
- environment:
CONTENTS: |
# Version is required, it will change as we support
# additional settings
version = 1
# "eno1" is the interface name
# Users may turn on dhcp4 and dhcp6 via boolean
[enp1s0f0np0]
dhcp4 = true
dhcp6 = false
# Define this interface as the "primary" interface
# for the system. This IP is what kubelet will use
# as the node IP. If none of the interfaces has
# "primary" set, we choose the first interface in
# the file
primary = true
DEST_DISK: /dev/sda12
DEST_PATH: /net.toml
DIRMODE: "0755"
FS_TYPE: ext4
GID: "0"
MODE: "0644"
UID: "0"
image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-29
name: write-netplan
pid: host
timeout: 90
- environment:
BOOTCONFIG_CONTENTS: |
kernel {
console = "ttyS1,115200n8"
}
DEST_DISK: /dev/sda12
DEST_PATH: /bootconfig.data
DIRMODE: "0700"
FS_TYPE: ext4
GID: "0"
MODE: "0644"
UID: "0"
image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-29
name: write-bootconfig
pid: host
timeout: 90
- environment:
DEST_DISK: /dev/sda12
DEST_PATH: /user-data.toml
DIRMODE: "0700"
FS_TYPE: ext4
GID: "0"
HEGEL_URLS: http://${LC_POOL_ADMIN}:50061,http://${LC_TINK_VIP}:50061
MODE: "0644"
UID: "0"
image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-29
name: write-user-data
pid: host
timeout: 90
- image: public.ecr.aws/eks-anywhere/tinkerbell/hub/reboot:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-29
name: reboot-image
pid: host
timeout: 90
volumes:
- /worker:/worker
name: ${CLUSTER_NAME}
volumes:
- /dev:/dev
- /dev/console:/dev/console
- /lib/firmware:/lib/firmware:ro
worker: '{{.device_1}}'
version: "0.1"
EOF
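Optionally, confirm the file still parses and the expected documents are present (again assuming yq v4):

```shell
# Each document's kind should be listed, including the appended TinkerbellTemplateConfig.
yq eval-all '.kind' $CLUSTER_NAME.yaml
```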
Create an EKS-A cluster. Double-check that $LC_POOL_ADMIN and $CLUSTER_NAME are set correctly before running this (they were passed through SSH or otherwise defined in previous steps). Otherwise, set them manually!
eksctl anywhere create cluster --filename $CLUSTER_NAME.yaml \
--hardware-csv hardware.csv --tinkerbell-bootstrap-ip $LC_POOL_ADMIN
When the eksctl anywhere command above indicates that it is Creating new workload cluster, reboot the two nodes. This forces them to attempt an iPXE boot from the Tinkerbell stack that the eksctl anywhere command creates.
Note that this must be done without interrupting the eksctl anywhere create cluster command.
Option 1 - You can use this command to automate it, but you'll need to be back on the original host:
node_ids=$(metal devices list -o json | jq -r '.[] | select(.hostname | startswith("eksa-node")) | .id')
for id in $(echo $node_ids); do
metal device reboot -i $id
done
Option 2 - Instead of rebooting the nodes from the host, you can force the iPXE boot from your local machine by accessing each node's SOS console. You can retrieve the UUID and facility code of each node using the metal CLI, the UI console, or the Equinix Metal API. By default, any existing SSH key in the project can be used to log in.
ssh {node-uuid}@sos.{facility-code}.platformequinix.com -i </path/to/ssh-key>
If the whole process is successful, you will see log messages like the ones below.
Installing networking on workload cluster
Creating EKS-A namespace
Installing cluster-api providers on workload cluster
Installing EKS-A secrets on workload cluster
Installing resources on management cluster
Moving cluster management from bootstrap to workload cluster
Installing EKS-A custom components (CRD and controller) on workload cluster
Installing EKS-D components on workload cluster
Creating EKS-A CRDs instances on workload cluster
Installing GitOps Toolkit on workload cluster
GitOps field not specified, bootstrap flux skipped
Writing cluster config file
Deleting bootstrap cluster
:tada: Cluster created!
--------------------------------------------------------------------------------------
The Amazon EKS Anywhere Curated Packages are only available to customers with the
Amazon EKS Anywhere Enterprise Subscription
--------------------------------------------------------------------------------------
Enabling curated packages on the cluster
Installing helm chart on cluster {"chart": "eks-anywhere-packages", "version": "0.2.30-eks-a-29"}
To verify whether the nodes are deployed properly:
LC_POOL_ADMIN=$POOL_ADMIN LC_POOL_VIP=$POOL_VIP LC_TINK_VIP=$TINK_VIP ssh -o SendEnv=LC_POOL_ADMIN,LC_POOL_VIP,LC_TINK_VIP root@$PUB_ADMIN
cp <CLUSTER_NAME Directory>/<CLUSTER_NAME>-eks-a-cluster.kubeconfig /root/.kube/config
kubectl get nodes -A
kubectl get pods -A