aws / eks-anywhere

Run Amazon EKS on your own infrastructure 🚀
https://anywhere.eks.amazonaws.com
Apache License 2.0

Unable to deploy an EKS Anywhere simple cluster (Bare Metal) - hangs on "Creating new workload" step #7443

Open · afikmirc opened this issue 10 months ago

afikmirc commented 10 months ago

Hello everyone!

I'm trying to run a very simple EKS Anywhere cluster, and it hangs during creation at the "Creating new workload cluster" step.

My cluster YAML file:

apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: bm-anywhere-cluster
spec:
  clusterNetwork:
    cniConfig:
      cilium: {}
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    services:
      cidrBlocks:
      - 10.96.0.0/12
  controlPlaneConfiguration:
    count: 1
    endpoint:
      host: "192.168.20.199"
    machineGroupRef:
      kind: TinkerbellMachineConfig
      name: bm-anywhere-cluster-cp
  datacenterRef:
    kind: TinkerbellDatacenterConfig
    name: bm-anywhere-cluster
  kubernetesVersion: "1.28"
  managementCluster:
    name: bm-anywhere-cluster
  workerNodeGroupConfigurations:
  - count: 1
    machineGroupRef:
      kind: TinkerbellMachineConfig
      name: bm-anywhere-cluster
    name: md-0

---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellDatacenterConfig
metadata:
  name: bm-anywhere-cluster
spec:
  tinkerbellIP: "192.168.20.200"

---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellMachineConfig
metadata:
  name: bm-anywhere-cluster-cp
spec:
  hardwareSelector:
    type: control-plan
  osFamily: bottlerocket
  templateRef: {}
  users:
  - name: ec2-user
    sshAuthorizedKeys:
    - ssh-rsa ssh-rsa <hidden string>

---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellMachineConfig
metadata:
  name: bm-anywhere-cluster
spec:
  hardwareSelector:
    type: worker
  osFamily: bottlerocket
  templateRef: {}
  users:
  - name: ec2-user
    sshAuthorizedKeys:
    - ssh-rsa <hidden string>

---

An explanation of the IPs and CIDR blocks I provided in the file:

192.168.0.0/16 - there is no existing subnet like this on my network; I just left it as the default.

10.96.0.0/12 - there is no existing subnet like this on my network; I just left it as the default.

spec.controlPlaneConfiguration.endpoint.host - 192.168.20.199: this is an available IP on my network's existing subnet (this subnet has internet access).

tinkerbellIP - 192.168.20.200: this is an available IP on my network's existing subnet (this subnet has internet access).

When I create the EKS Anywhere cluster, it hangs:

eksctl anywhere create cluster -v=9 -f bm-anywhere-cluster.yaml -z hardware.csv

......
......
......

2024-01-31T12:26:32.523+0200    V0      Creating new workload cluster
2024-01-31T12:26:32.524+0200    V5      Adding extraArgs        {"tls-cipher-suites": "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"}
2024-01-31T12:26:32.525+0200    V5      Adding extraArgs        {"tls-cipher-suites": "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"}
2024-01-31T12:26:32.525+0200    V5      Retrier:        {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2024-01-31T12:26:32.525+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1706696683359269153 kubectl apply -f - --namespace eksa-system --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-01-31T12:26:32.904+0200    V5      Retry execution successful      {"retries": 1, "duration": "378.235373ms"}
2024-01-31T12:26:32.904+0200    V3      Waiting for control plane to be available
2024-01-31T12:26:32.904+0200    V5      Retrier:        {"timeout": "1h0m0s", "backoffFactor": null}
2024-01-31T12:26:32.904+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1706696683359269153 kubectl wait --timeout 3600.00s --for=condition=ControlPlaneInitialized clusters.cluster.x-k8s.io/bm-anywhere-cluster --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig -n eksa-system"}
2024-01-31T13:26:33.096+0200    V9      docker  {"stderr": "error: timed out waiting for the condition on clusters/bm-anywhere-cluster\n"}
2024-01-31T13:26:33.096+0200    V5      Error happened during retry     {"error": "executing wait: error: timed out waiting for the condition on clusters/bm-anywhere-cluster\n", "retries": 1}
2024-01-31T13:26:33.096+0200    V5      Execution aborted by retry policy
2024-01-31T13:26:33.096+0200    V4      Task finished   {"task_name": "workload-cluster-init", "duration": "1h0m0.573792207s"}
2024-01-31T13:26:33.096+0200    V4      ----------------------------------
2024-01-31T13:26:33.096+0200    V4      Task start      {"task_name": "collect-cluster-diagnostics"}
2024-01-31T13:26:33.096+0200    V0      collecting cluster diagnostics
2024-01-31T13:26:33.096+0200    V0      collecting management cluster diagnostics
2024-01-31T13:26:33.107+0200    V3      bundle config written   {"path": "bm-anywhere-cluster/generated/bootstrap-cluster-2024-01-31T13:26:33+02:00-bundle.yaml"}
2024-01-31T13:26:33.107+0200    V1      creating temporary namespace for diagnostic collector   {"namespace": "eksa-diagnostics"}
2024-01-31T13:26:33.107+0200    V5      Retrier:        {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2024-01-31T13:26:33.107+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1706696683359269153 kubectl create namespace eksa-diagnostics --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-01-31T13:26:33.263+0200    V5      Retry execution successful      {"retries": 1, "duration": "155.703042ms"}
2024-01-31T13:26:33.263+0200    V1      creating temporary ClusterRole and RoleBinding for diagnostic collector
2024-01-31T13:26:33.263+0200    V5      Retrier:        {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2024-01-31T13:26:33.263+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1706696683359269153 kubectl apply -f - --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-01-31T13:26:33.510+0200    V5      Retry execution successful      {"retries": 1, "duration": "246.957482ms"}
2024-01-31T13:26:33.510+0200    V0      ⏳ Collecting support bundle from cluster, this can take a while        {"cluster": "bootstrap-cluster", "bundle": "bm-anywhere-cluster/generated/bootstrap-cluster-2024-01-31T13:26:33+02:00-bundle.yaml", "since": "2024-01-31T10:26:33.107+0200", "kubeconfig": "bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-01-31T13:26:33.510+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1706696683359269153 support-bundle bm-anywhere-cluster/generated/bootstrap-cluster-2024-01-31T13:26:33+02:00-bundle.yaml --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig --interactive=false --since-time 2024-01-31T10:26:33.107500907+02:00"}
2024-01-31T13:26:40.926+0200    V0      Support bundle archive created  {"path": "support-bundle-2024-01-31T11_26_33.tar.gz"}
2024-01-31T13:26:40.926+0200    V0      Analyzing support bundle        {"bundle": "bm-anywhere-cluster/generated/bootstrap-cluster-2024-01-31T13:26:33+02:00-bundle.yaml", "archive": "support-bundle-2024-01-31T11_26_33.tar.gz"}
2024-01-31T13:26:40.926+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1706696683359269153 support-bundle analyze bm-anywhere-cluster/generated/bootstrap-cluster-2024-01-31T13:26:33+02:00-bundle.yaml --bundle support-bundle-2024-01-31T11_26_33.tar.gz --output json"}
2024-01-31T13:26:41.211+0200    V0      Analysis output generated       {"path": "bm-anywhere-cluster/generated/bootstrap-cluster-2024-01-31T13:26:41+02:00-analysis.yaml"}
2024-01-31T13:26:41.211+0200    V1      cleaning up temporary roles for diagnostic collectors
2024-01-31T13:26:41.211+0200    V5      Retrier:        {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2024-01-31T13:26:41.211+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1706696683359269153 kubectl delete -f - --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-01-31T13:26:41.374+0200    V5      Retry execution successful      {"retries": 1, "duration": "163.248996ms"}
2024-01-31T13:26:41.374+0200    V1      cleaning up temporary namespace  for diagnostic collectors      {"namespace": "eksa-diagnostics"}
2024-01-31T13:26:41.374+0200    V5      Retrier:        {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2024-01-31T13:26:41.374+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1706696683359269153 kubectl delete namespace eksa-diagnostics --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-01-31T13:26:46.905+0200    V5      Retry execution successful      {"retries": 1, "duration": "5.530672342s"}
2024-01-31T13:26:46.905+0200    V0      collecting workload cluster diagnostics
2024-01-31T13:26:46.905+0200    V4      Task finished   {"task_name": "collect-cluster-diagnostics", "duration": "13.808803162s"}
2024-01-31T13:26:46.905+0200    V4      ----------------------------------
2024-01-31T13:26:46.905+0200    V4      Saving checkpoint       {"file": "bm-anywhere-cluster-checkpoint.yaml"}
2024-01-31T13:26:46.906+0200    V4      Tasks completed {"duration": "1h2m1.226833399s"}
2024-01-31T13:26:46.906+0200    V3      Cleaning up long running container      {"name": "eksa_1706696683359269153"}
2024-01-31T13:26:46.906+0200    V6      Executing command       {"cmd": "/usr/bin/docker rm -f -v eksa_1706696683359269153"}
Error: waiting for control plane to be ready: executing wait: executing wait: error: timed out waiting for the condition on clusters/bm-anywhere-cluster

The Boots container logs show:

docker logs -f boots
{"level":"info","ts":1706696738.415058,"caller":"boots/main.go:119","msg":"starting","service":"github.com/tinkerbell/boots","pkg":"main","version":"8fd5c38"}
{"level":"info","ts":1706696738.4281049,"caller":"boots/main.go:186","msg":"serving iPXE binaries from local HTTP server","service":"github.com/tinkerbell/boots","pkg":"main","addr":"192.168.20.116/ipxe/"}
{"level":"info","ts":1706696738.4281282,"caller":"boots/main.go:128","msg":"serving syslog","service":"github.com/tinkerbell/boots","pkg":"main","addr":"192.168.20.116:514"}
{"level":"info","ts":1706696738.4281378,"caller":"boots/main.go:205","msg":"serving dhcp","service":"github.com/tinkerbell/boots","pkg":"main","addr":"0.0.0.0:67"}
{"level":"info","ts":1706696738.4281623,"caller":"boots/main.go:212","msg":"serving http","service":"github.com/tinkerbell/boots","pkg":"main","addr":"192.168.20.116:80"}
{"level":"info","ts":1706696738.4282918,"logger":"github.com/tinkerbell/ipxedust","caller":"ipxedust@v0.0.0-20230118215055-b00d1b371ddf/ipxedust.go:194","msg":"serving iPXE binaries via TFTP","service":"github.com/tinkerbell/boots","addr":"0.0.0.0:69","timeout":5,"singlePortEnabled":true}
{"level":"info","ts":1706699115.5887766,"caller":"dhcp4-go@v0.0.0-20190402165401-39c137f31ad3/handler.go:105","msg":"","service":"github.com/tinkerbell/boots","pkg":"dhcp","pkg":"dhcp","event":"recv","mac":"00:50:b6:ec:9a:53","via":"0.0.0.0","iface":"eno2","xid":"\"84:13:be:6b\"","type":"DHCPREQUEST"}
{"level":"info","ts":1706699115.5889626,"caller":"boots/dhcp.go:88","msg":"parsed option82/circuitid","service":"github.com/tinkerbell/boots","pkg":"main","mac":"00:50:b6:ec:9a:53","circuitID":""}
{"level":"error","ts":1706699115.5893145,"caller":"boots/dhcp.go:101","msg":"retrieved job is empty","service":"github.com/tinkerbell/boots","pkg":"main","type":"DHCPREQUEST","mac":"00:50:b6:ec:9a:53","error":"discover from dhcp message: no hardware found","errorVerbose":"no hardware found\ngithub.com/tinkerbell/boots/client/kubernetes.(*Finder).ByMAC\n\tgithub.com/tinkerbell/boots/client/kubernetes/hardware_finder.go:96\ngithub.com/tinkerbell/boots/job.(*Creator).CreateFromDHCP\n\tgithub.com/tinkerbell/boots/job/job.go:107\nmain.dhcpHandler.serve\n\tgithub.com/tinkerbell/boots/cmd/boots/dhcp.go:99\nmain.dhcpHandler.ServeDHCP.func1\n\tgithub.com/tinkerbell/boots/cmd/boots/dhcp.go:60\ngithub.com/gammazero/workerpool.(*WorkerPool).dispatch.func1\n\tgithub.com/gammazero/workerpool@v0.0.0-20200311205957-7b00833861c6/workerpool.go:169\nruntime.goexit\n\truntime/asm_amd64.s:1571\ndiscover from dhcp message"}
ahreehong commented 10 months ago

Hello, can you verify whether this hardware selector type should really be control-plan and not control-plane?

spec:
  hardwareSelector:
    type: control-plan

Could you also run kubectl get hardware -A --show-labels and kubectl get workflows -A and post the output here?

afikmirc commented 10 months ago

Hi ahreehong, thank you for the response!

It was control-plan in both the YAML file and the hardware.csv file, and I changed it to control-plane in both. However, I'm still getting the same error when creating the cluster.

This is the output you asked about:

$ kubectl get hardware -A --show-labels
NAMESPACE     NAME   STATE   LABELS
eksa-system   u3             type=control-plane,v1alpha1.tinkerbell.org/ownerName=bm-anywhere-cluster-control-plane-template-1706778218833-zd49r,v1alpha1.tinkerbell.org/ownerNamespace=eksa-system
eksa-system   u4             type=worker

$ kubectl get workflows -A
NAMESPACE     NAME                                                             TEMPLATE                                                         STATE
eksa-system   bm-anywhere-cluster-control-plane-template-1706778218833-zd49r   bm-anywhere-cluster-control-plane-template-1706778218833-zd49r   STATE_PENDING

My hardware.csv file is:
$ cat hardware.csv
hostname,mac,ip_address,gateway,netmask,nameservers,labels,disk
u4,<hidden mac>,192.168.20.122,192.168.20.254,255.255.255.0,8.8.8.8,type=worker,/dev/sda1
u3,<hidden mac>,192.168.20.149,192.168.20.254,255.255.255.0,8.8.8.8,type=control-plane,/dev/sda1

Any suggestions?

terry-hasegawa commented 10 months ago

Hi @afikmirc, your Boots log only contains one MAC address (00:xxx:53). However, I think this MAC address is not included in hardware.csv (hence "no hardware found").

Is this MAC address for the control plane? If yes, you should add it to hardware.csv. Is the control plane server PXE booting? You need to PXE boot it manually, outside of eks-a, because hardware.csv does not contain BMC information.
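
If your machines had BMCs, you could also add BMC columns to hardware.csv so that eks-a can power-cycle and network boot them itself. A rough sketch of the extra columns (column names as in the EKS Anywhere bare metal docs; values are placeholders based on your existing rows):

hostname,mac,ip_address,gateway,netmask,nameservers,labels,disk,bmc_ip,bmc_username,bmc_password
u3,<hidden mac>,192.168.20.149,192.168.20.254,255.255.255.0,8.8.8.8,type=control-plane,/dev/sda1,<bmc ip>,<bmc user>,<bmc password>

Without BMC information, each machine has to be set to network (PXE) boot and powered on by hand while the create command is waiting.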

afikmirc commented 10 months ago

Hi @terry-hasegawa, thank you for the response!

Yes, the MACs in the hardware file were wrong. We have since changed the hardware.csv and the config file.

I'll post the content of both here along with some logs. Please let us know how we can gather more logs (whether by running commands or by looking at log files).

hardware.csv:

hostname,mac,ip_address,gateway,netmask,nameservers,labels,disk
u4,<hidden mac>,192.168.55.122,192.168.55.254,255.255.255.0,8.8.8.8,node=worker,/dev/sda2
u3,<hidden mac>,192.168.55.149,192.168.55.254,255.255.255.0,8.8.8.8,node=cp-machine,/dev/nvme0n1p2

YAML config

$ cat bm-anywhere-cluster.yaml
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: bm-anywhere-cluster
spec:
  clusterNetwork:
    cniConfig:
      cilium: {}
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    services:
      cidrBlocks:
      - 10.96.0.0/12
  controlPlaneConfiguration:
    count: 1
    endpoint:
      host: "192.168.55.199"
    machineGroupRef:
      kind: TinkerbellMachineConfig
      name: bm-anywhere-cluster-cp
  datacenterRef:
    kind: TinkerbellDatacenterConfig
    name: bm-anywhere-cluster
  kubernetesVersion: "1.28"
  managementCluster:
    name: bm-anywhere-cluster
  workerNodeGroupConfigurations:
  - count: 1
    machineGroupRef:
      kind: TinkerbellMachineConfig
      name: bm-anywhere-cluster
    name: md-0

---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellDatacenterConfig
metadata:
  name: bm-anywhere-cluster
spec:
  tinkerbellIP: "192.168.55.200"

---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellMachineConfig
metadata:
  name: bm-anywhere-cluster-cp
spec:
  hardwareSelector:
    node: cp-machine
  osFamily: bottlerocket
  templateRef: {}
  users:
  - name: user
    sshAuthorizedKeys:
    - ssh-rsa <hidden string>

---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellMachineConfig
metadata:
  name: bm-anywhere-cluster
spec:
  hardwareSelector:
    node: worker
  osFamily: bottlerocket
  templateRef: {}
  users:
  - name: user
    sshAuthorizedKeys:
    - ssh-rsa <hidden string>
---

Creating the cluster with verbosity 9:

$ eksctl anywhere create cluster -v=9 -f bm-anywhere-cluster.yaml -z hardware.csv
2024-02-06T16:21:48.353+0200    V0      Warning: The recommended number of control plane nodes is 3 or 5
2024-02-06T16:21:48.353+0200    V6      Executing command       {"cmd": "/usr/bin/docker version --format {{.Client.Version}}"}
2024-02-06T16:21:48.367+0200    V6      Executing command       {"cmd": "/usr/bin/docker info --format '{{json .MemTotal}}'"}
2024-02-06T16:21:48.528+0200    V4      Reading bundles manifest        {"url": "https://anywhere-assets.eks.amazonaws.com/releases/bundles/57/manifest.yaml"}
2024-02-06T16:21:48.574+0200    V4      Using CAPI provider versions    {"Core Cluster API": "v1.5.2+0f93bc6", "Kubeadm Bootstrap": "v1.5.2+ff11797", "Kubeadm Control Plane": "v1.5.2+72f2d97", "External etcd Bootstrap": "v1.0.10+d1f944b", "External etcd Controller": "v1.0.16+d73980e", "Cluster API Provider Tinkerbell": "v0.4.0+2823545"}
2024-02-06T16:21:48.714+0200    V0      Warning: The recommended number of control plane nodes is 3 or 5
2024-02-06T16:21:48.714+0200    V5      Retrier:        {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2024-02-06T16:21:48.714+0200    V2      Pulling docker image    {"image": "public.ecr.aws/eks-anywhere/cli-tools:v0.18.5-eks-a-57"}
2024-02-06T16:21:48.714+0200    V6      Executing command       {"cmd": "/usr/bin/docker pull public.ecr.aws/eks-anywhere/cli-tools:v0.18.5-eks-a-57"}
2024-02-06T16:21:50.577+0200    V5      Retry execution successful      {"retries": 1, "duration": "1.862730351s"}
2024-02-06T16:21:50.577+0200    V3      Initializing long running container     {"name": "eksa_1707229308714880618", "image": "public.ecr.aws/eks-anywhere/cli-tools:v0.18.5-eks-a-57"}
2024-02-06T16:21:50.578+0200    V6      Executing command       {"cmd": "/usr/bin/docker run -d --name eksa_1707229308714880618 --network host -w /home/user/original-guide/4 -v /var/run/docker.sock:/var/run/docker.sock -v /home/user/original-guide/4:/home/user/original-guide/4 -v /home/user/original-guide/4:/home/user/original-guide/4 --entrypoint sleep public.ecr.aws/eks-anywhere/cli-tools:v0.18.5-eks-a-57 infinity"}
2024-02-06T16:21:50.824+0200    V4      Inferring local Tinkerbell Bootstrap IP from environment
2024-02-06T16:21:50.824+0200    V4      Tinkerbell IP   {"tinkerbell-ip": "192.168.55.116"}
2024-02-06T16:21:50.824+0200    V4      Task start      {"task_name": "setup-validate"}
2024-02-06T16:21:50.824+0200    V0      Performing setup and validations
2024-02-06T16:21:50.824+0200    V6      Executing command       {"cmd": "/usr/bin/docker container inspect boots"}
2024-02-06T16:21:50.835+0200    V9      docker  {"stderr": "Error response from daemon: No such container: boots\n"}
2024-02-06T16:21:51.838+0200    V0      ✅ Tinkerbell Provider setup is valid
2024-02-06T16:21:51.838+0200    V0      ✅ Validate OS is compatible with registry mirror configuration
2024-02-06T16:21:51.838+0200    V0      ✅ Validate certificate for registry mirror
2024-02-06T16:21:51.838+0200    V0      ✅ Validate authentication for git provider
2024-02-06T16:21:51.838+0200    V0      ✅ Validate cluster's eksaVersion matches EKS-A version
2024-02-06T16:21:51.838+0200    V4      Task finished   {"task_name": "setup-validate", "duration": "1.014177661s"}
2024-02-06T16:21:51.838+0200    V4      ----------------------------------
2024-02-06T16:21:51.838+0200    V4      Task start      {"task_name": "bootstrap-cluster-init"}
2024-02-06T16:21:51.838+0200    V0      Creating new bootstrap cluster
2024-02-06T16:21:51.839+0200    V4      Creating kind cluster   {"name": "bm-anywhere-cluster-eks-a-cluster", "kubeconfig": "bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-02-06T16:21:51.839+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kind create cluster --name bm-anywhere-cluster-eks-a-cluster --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig --image public.ecr.aws/eks-anywhere/kubernetes-sigs/kind/node:v1.28.4-eks-d-1-28-13-eks-a-57 --config bm-anywhere-cluster/generated/kind_tmp.yaml"}
2024-02-06T16:22:05.857+0200    V5      Retrier:        {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2024-02-06T16:22:05.857+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kubectl get namespace eksa-system --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-02-06T16:22:06.022+0200    V9      docker  {"stderr": "Error from server (NotFound): namespaces \"eksa-system\" not found\n"}
2024-02-06T16:22:06.022+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kubectl create namespace eksa-system --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-02-06T16:22:06.169+0200    V5      Retry execution successful      {"retries": 1, "duration": "312.378827ms"}
2024-02-06T16:22:06.169+0200    V0      Provider specific pre-capi-install-setup on bootstrap cluster
2024-02-06T16:22:06.169+0200    V4      Installing Tinkerbell stack on bootstrap cluster
2024-02-06T16:22:06.169+0200    V6      Installing Tinkerbell helm chart
2024-02-06T16:22:06.169+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i -e HELM_EXPERIMENTAL_OCI=1 -e NO_PROXY= -e HTTPS_PROXY= -e HTTP_PROXY= eksa_1707229308714880618 helm upgrade --install tinkerbell-chart oci://public.ecr.aws/eks-anywhere/tinkerbell/tinkerbell-chart --version 0.2.4-eks-a-57 --values bm-anywhere-cluster/generated/tinkerbell-chart-overrides.yaml --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig --wait"}
2024-02-06T16:22:40.527+0200    V6      Executing command       {"cmd": "/usr/bin/docker run -d -i -v /home/user/original-guide/4/bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig:/kubeconfig --network host -e PUBLIC_IP=192.168.55.116 -e PUBLIC_SYSLOG_IP=192.168.55.116 -e BOOTS_KUBE_NAMESPACE=eksa-system -e DATA_MODEL_VERSION=kubernetes -e TINKERBELL_TLS=false -e TINKERBELL_GRPC_AUTHORITY=192.168.55.116:42113 -e BOOTS_EXTRA_KERNEL_ARGS=tink_worker_image=public.ecr.aws/eks-anywhere/tinkerbell/tink/tink-worker:v0.8.0-eks-a-57 --name boots public.ecr.aws/eks-anywhere/tinkerbell/boots:v0.8.1-eks-a-57 -kubeconfig /kubeconfig -dhcp-addr 0.0.0.0:67 -osie-path-override https://anywhere-assets.eks.amazonaws.com/releases/bundles/57/artifacts/hook/9d54933a03f2f4c06322969b06caa18702d17f66"}
2024-02-06T16:22:41.903+0200    V0      Installing cluster-api providers on bootstrap cluster
2024-02-06T16:22:49.086+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i -e TINKERBELL_IP=IGNORED -e KUBEADM_BOOTSTRAP_TOKEN_TTL=120m eksa_1707229308714880618 clusterctl init --core cluster-api:v1.5.2+0f93bc6 --bootstrap kubeadm:v1.5.2+ff11797 --control-plane kubeadm:v1.5.2+72f2d97 --infrastructure tinkerbell:v0.4.0+2823545 --config bm-anywhere-cluster/generated/clusterctl_tmp.yaml --bootstrap etcdadm-bootstrap:v1.0.10+d1f944b --bootstrap etcdadm-controller:v1.0.16+d73980e --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-02-06T16:23:08.679+0200    V5      Retrier:        {"timeout": "30m0s", "backoffFactor": null}
2024-02-06T16:23:08.679+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kubectl wait --timeout 1800.00s --for=condition=Available deployments/capi-kubeadm-bootstrap-controller-manager --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig -n capi-kubeadm-bootstrap-system"}
2024-02-06T16:23:20.954+0200    V5      Retry execution successful      {"retries": 1, "duration": "12.274894734s"}
2024-02-06T16:23:20.954+0200    V5      Retrier:        {"timeout": "30m0s", "backoffFactor": null}
2024-02-06T16:23:20.954+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kubectl wait --timeout 1800.00s --for=condition=Available deployments/capi-kubeadm-control-plane-controller-manager --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig -n capi-kubeadm-control-plane-system"}
2024-02-06T16:23:29.005+0200    V5      Retry execution successful      {"retries": 1, "duration": "8.051081472s"}
2024-02-06T16:23:29.005+0200    V5      Retrier:        {"timeout": "30m0s", "backoffFactor": null}
2024-02-06T16:23:29.005+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kubectl wait --timeout 1800.00s --for=condition=Available deployments/capi-controller-manager --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig -n capi-system"}
2024-02-06T16:23:29.264+0200    V5      Retry execution successful      {"retries": 1, "duration": "259.294211ms"}
2024-02-06T16:23:29.265+0200    V5      Retrier:        {"timeout": "30m0s", "backoffFactor": null}
2024-02-06T16:23:29.265+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kubectl wait --timeout 1800.00s --for=condition=Available deployments/cert-manager --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig -n cert-manager"}
2024-02-06T16:23:29.558+0200    V5      Retry execution successful      {"retries": 1, "duration": "293.641ms"}
2024-02-06T16:23:29.558+0200    V5      Retrier:        {"timeout": "30m0s", "backoffFactor": null}
2024-02-06T16:23:29.558+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kubectl wait --timeout 1800.00s --for=condition=Available deployments/cert-manager-cainjector --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig -n cert-manager"}
2024-02-06T16:23:29.840+0200    V5      Retry execution successful      {"retries": 1, "duration": "281.816935ms"}
2024-02-06T16:23:29.840+0200    V5      Retrier:        {"timeout": "30m0s", "backoffFactor": null}
2024-02-06T16:23:29.840+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kubectl wait --timeout 1800.00s --for=condition=Available deployments/cert-manager-webhook --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig -n cert-manager"}
2024-02-06T16:23:30.085+0200    V5      Retry execution successful      {"retries": 1, "duration": "244.775925ms"}
2024-02-06T16:23:30.085+0200    V5      Retrier:        {"timeout": "30m0s", "backoffFactor": null}
2024-02-06T16:23:30.085+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kubectl wait --timeout 1800.00s --for=condition=Available deployments/capt-controller-manager --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig -n capt-system"}
2024-02-06T16:23:35.036+0200    V5      Retry execution successful      {"retries": 1, "duration": "4.950310288s"}
2024-02-06T16:23:35.036+0200    V0      Provider specific post-setup
2024-02-06T16:23:35.038+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kubectl apply -f - --force --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-02-06T16:23:35.853+0200    V4      Task finished   {"task_name": "bootstrap-cluster-init", "duration": "1m44.01464722s"}
2024-02-06T16:23:35.853+0200    V4      ----------------------------------
2024-02-06T16:23:35.853+0200    V4      Task start      {"task_name": "workload-cluster-init"}
2024-02-06T16:23:35.853+0200    V0      Creating new workload cluster
2024-02-06T16:23:35.854+0200    V5      Adding extraArgs        {"tls-cipher-suites": "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"}
2024-02-06T16:23:35.855+0200    V5      Adding extraArgs        {"tls-cipher-suites": "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"}
2024-02-06T16:23:35.855+0200    V5      Retrier:        {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2024-02-06T16:23:35.855+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kubectl apply -f - --namespace eksa-system --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-02-06T16:23:36.166+0200    V5      Retry execution successful      {"retries": 1, "duration": "310.193335ms"}
2024-02-06T16:23:36.166+0200    V3      Waiting for control plane to be available
2024-02-06T16:23:36.166+0200    V5      Retrier:        {"timeout": "1h0m0s", "backoffFactor": null}
2024-02-06T16:23:36.166+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kubectl wait --timeout 3600.00s --for=condition=ControlPlaneInitialized clusters.cluster.x-k8s.io/bm-anywhere-cluster --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig -n eksa-system"}
2024-02-06T17:23:36.344+0200    V9      docker  {"stderr": "error: timed out waiting for the condition on clusters/bm-anywhere-cluster\n"}
2024-02-06T17:23:36.344+0200    V5      Error happened during retry     {"error": "executing wait: error: timed out waiting for the condition on clusters/bm-anywhere-cluster\n", "retries": 1}
2024-02-06T17:23:36.344+0200    V5      Execution aborted by retry policy
2024-02-06T17:23:36.344+0200    V4      Task finished   {"task_name": "workload-cluster-init", "duration": "1h0m0.491246572s"}
2024-02-06T17:23:36.344+0200    V4      ----------------------------------
2024-02-06T17:23:36.344+0200    V4      Task start      {"task_name": "collect-cluster-diagnostics"}
2024-02-06T17:23:36.344+0200    V0      collecting cluster diagnostics
2024-02-06T17:23:36.344+0200    V0      collecting management cluster diagnostics
2024-02-06T17:23:36.356+0200    V3      bundle config written   {"path": "bm-anywhere-cluster/generated/bootstrap-cluster-2024-02-06T17:23:36+02:00-bundle.yaml"}
2024-02-06T17:23:36.356+0200    V1      creating temporary namespace for diagnostic collector   {"namespace": "eksa-diagnostics"}
2024-02-06T17:23:36.356+0200    V5      Retrier:        {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2024-02-06T17:23:36.356+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kubectl create namespace eksa-diagnostics --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-02-06T17:23:36.927+0200    V5      Retry execution successful      {"retries": 1, "duration": "571.578018ms"}
2024-02-06T17:23:36.927+0200    V1      creating temporary ClusterRole and RoleBinding for diagnostic collector
2024-02-06T17:23:36.927+0200    V5      Retrier:        {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2024-02-06T17:23:36.927+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kubectl apply -f - --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-02-06T17:23:37.569+0200    V5      Retry execution successful      {"retries": 1, "duration": "641.636662ms"}
2024-02-06T17:23:37.569+0200    V0      ⏳ Collecting support bundle from cluster, this can take a while        {"cluster": "bootstrap-cluster", "bundle": "bm-anywhere-cluster/generated/bootstrap-cluster-2024-02-06T17:23:36+02:00-bundle.yaml", "since": "2024-02-06T14:23:36.356+0200", "kubeconfig": "bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-02-06T17:23:37.569+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 support-bundle bm-anywhere-cluster/generated/bootstrap-cluster-2024-02-06T17:23:36+02:00-bundle.yaml --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig --interactive=false --since-time 2024-02-06T14:23:36.356076127+02:00"}
2024-02-06T17:23:56.446+0200    V0      Support bundle archive created  {"path": "support-bundle-2024-02-06T15_23_38.tar.gz"}
2024-02-06T17:23:56.446+0200    V0      Analyzing support bundle        {"bundle": "bm-anywhere-cluster/generated/bootstrap-cluster-2024-02-06T17:23:36+02:00-bundle.yaml", "archive": "support-bundle-2024-02-06T15_23_38.tar.gz"}
2024-02-06T17:23:56.446+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 support-bundle analyze bm-anywhere-cluster/generated/bootstrap-cluster-2024-02-06T17:23:36+02:00-bundle.yaml --bundle support-bundle-2024-02-06T15_23_38.tar.gz --output json"}
2024-02-06T17:23:57.602+0200    V0      Analysis output generated       {"path": "bm-anywhere-cluster/generated/bootstrap-cluster-2024-02-06T17:23:57+02:00-analysis.yaml"}
2024-02-06T17:23:57.602+0200    V1      cleaning up temporary roles for diagnostic collectors
2024-02-06T17:23:57.602+0200    V5      Retrier:        {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2024-02-06T17:23:57.602+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kubectl delete -f - --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-02-06T17:23:58.151+0200    V5      Retry execution successful      {"retries": 1, "duration": "549.615569ms"}
2024-02-06T17:23:58.152+0200    V1      cleaning up temporary namespace  for diagnostic collectors      {"namespace": "eksa-diagnostics"}
2024-02-06T17:23:58.152+0200    V5      Retrier:        {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2024-02-06T17:23:58.152+0200    V6      Executing command       {"cmd": "/usr/bin/docker exec -i eksa_1707229308714880618 kubectl delete namespace eksa-diagnostics --kubeconfig bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig"}
2024-02-06T17:24:04.266+0200    V5      Retry execution successful      {"retries": 1, "duration": "6.114617963s"}
2024-02-06T17:24:04.266+0200    V0      collecting workload cluster diagnostics
2024-02-06T17:24:04.266+0200    V4      Task finished   {"task_name": "collect-cluster-diagnostics", "duration": "27.922365477s"}
2024-02-06T17:24:04.266+0200    V4      ----------------------------------
2024-02-06T17:24:04.266+0200    V4      Saving checkpoint       {"file": "bm-anywhere-cluster-checkpoint.yaml"}
2024-02-06T17:24:04.267+0200    V4      Tasks completed {"duration": "1h2m13.443118168s"}
2024-02-06T17:24:04.267+0200    V3      Cleaning up long running container      {"name": "eksa_1707229308714880618"}
2024-02-06T17:24:04.267+0200    V6      Executing command       {"cmd": "/usr/bin/docker rm -f -v eksa_1707229308714880618"}
Error: waiting for control plane to be ready: executing wait: executing wait: error: timed out waiting for the condition on clusters/bm-anywhere-cluster
$ docker logs boots
{"level":"info","ts":1707229362.0011525,"caller":"boots/main.go:119","msg":"starting","service":"github.com/tinkerbell/boots","pkg":"main","version":"8fd5c38"}
{"level":"info","ts":1707229362.0142574,"caller":"boots/main.go:186","msg":"serving iPXE binaries from local HTTP server","service":"github.com/tinkerbell/boots","pkg":"main","addr":"192.168.55.116/ipxe/"}
{"level":"info","ts":1707229362.0142915,"caller":"boots/main.go:205","msg":"serving dhcp","service":"github.com/tinkerbell/boots","pkg":"main","addr":"0.0.0.0:67"}
{"level":"info","ts":1707229362.0143209,"caller":"boots/main.go:212","msg":"serving http","service":"github.com/tinkerbell/boots","pkg":"main","addr":"192.168.55.116:80"}
{"level":"info","ts":1707229362.0143735,"caller":"boots/main.go:128","msg":"serving syslog","service":"github.com/tinkerbell/boots","pkg":"main","addr":"192.168.55.116:514"}
{"level":"info","ts":1707229362.0145123,"logger":"github.com/tinkerbell/ipxedust","caller":"ipxedust@v0.0.0-20230118215055-b00d1b371ddf/ipxedust.go:194","msg":"serving iPXE binaries via TFTP","service":"github.com/tinkerbell/boots","addr":"0.0.0.0:69","timeout":5,"singlePortEnabled":true}
{"level":"info","ts":1707230370.9840763,"caller":"dhcp4-go@v0.0.0-20190402165401-39c137f31ad3/handler.go:105","msg":"","service":"github.com/tinkerbell/boots","pkg":"dhcp","pkg":"dhcp","event":"recv","mac":"08:3a:88:5c:db:17","via":"0.0.0.0","iface":"eno2","xid":"\"3f:5a:29:fd\"","type":"DHCPREQUEST","secs":768,"option(81)":"AAAASVNSLVBGNFNNREdI"}
{"level":"info","ts":1707230370.9843452,"caller":"boots/dhcp.go:88","msg":"parsed option82/circuitid","service":"github.com/tinkerbell/boots","pkg":"main","mac":"08:3a:88:5c:db:17","circuitID":""}
{"level":"error","ts":1707230370.9847584,"caller":"boots/dhcp.go:101","msg":"retrieved job is empty","service":"github.com/tinkerbell/boots","pkg":"main","type":"DHCPREQUEST","mac":"08:3a:88:5c:db:17","error":"discover from dhcp message: no hardware found","errorVerbose":"no hardware found\ngithub.com/tinkerbell/boots/client/kubernetes.(*Finder).ByMAC\n\tgithub.com/tinkerbell/boots/client/kubernetes/hardware_finder.go:96\ngithub.com/tinkerbell/boots/job.(*Creator).CreateFromDHCP\n\tgithub.com/tinkerbell/boots/job/job.go:107\nmain.dhcpHandler.serve\n\tgithub.com/tinkerbell/boots/cmd/boots/dhcp.go:99\nmain.dhcpHandler.ServeDHCP.func1\n\tgithub.com/tinkerbell/boots/cmd/boots/dhcp.go:60\ngithub.com/gammazero/workerpool.(*WorkerPool).dispatch.func1\n\tgithub.com/gammazero/workerpool@v0.0.0-20200311205957-7b00833861c6/workerpool.go:169\nruntime.goexit\n\truntime/asm_amd64.s:1571\ndiscover from dhcp message"}
{"level":"info","ts":1707235000.6043785,"caller":"dhcp4-go@v0.0.0-20190402165401-39c137f31ad3/handler.go:105","msg":"","service":"github.com/tinkerbell/boots","pkg":"dhcp","pkg":"dhcp","event":"recv","mac":"08:3a:88:61:ca:d4","via":"0.0.0.0","iface":"eno2","xid":"\"75:50:3e:7f\"","type":"DHCPREQUEST","option(81)":"AAAASVNSLVBGMlI2M1dR"}
{"level":"info","ts":1707235000.6046324,"caller":"boots/dhcp.go:88","msg":"parsed option82/circuitid","service":"github.com/tinkerbell/boots","pkg":"main","mac":"08:3a:88:61:ca:d4","circuitID":""}
{"level":"error","ts":1707235000.6049747,"caller":"boots/dhcp.go:101","msg":"retrieved job is empty","service":"github.com/tinkerbell/boots","pkg":"main","type":"DHCPREQUEST","mac":"08:3a:88:61:ca:d4","error":"discover from dhcp message: no hardware found","errorVerbose":"no hardware found\ngithub.com/tinkerbell/boots/client/kubernetes.(*Finder).ByMAC\n\tgithub.com/tinkerbell/boots/client/kubernetes/hardware_finder.go:96\ngithub.com/tinkerbell/boots/job.(*Creator).CreateFromDHCP\n\tgithub.com/tinkerbell/boots/job/job.go:107\nmain.dhcpHandler.serve\n\tgithub.com/tinkerbell/boots/cmd/boots/dhcp.go:99\nmain.dhcpHandler.ServeDHCP.func1\n\tgithub.com/tinkerbell/boots/cmd/boots/dhcp.go:60\ngithub.com/gammazero/workerpool.(*WorkerPool).dispatch.func1\n\tgithub.com/gammazero/workerpool@v0.0.0-20200311205957-7b00833861c6/workerpool.go:169\nruntime.goexit\n\truntime/asm_amd64.s:1571\ndiscover from dhcp message"}
$ docker logs bm-anywhere-cluster-eks-a-cluster-control-plane
INFO: ensuring we can execute mount/umount even with userns-remap
INFO: remounting /sys read-only
INFO: making mounts shared
INFO: detected cgroup v1
WARN: cgroupns not enabled! Please use cgroup v2, or cgroup v1 with cgroupns enabled.
INFO: fixing cgroup mounts for all subsystems
INFO: removing misc controller
INFO: clearing and regenerating /etc/machine-id
Initializing machine ID from random generator.
INFO: faking /sys/class/dmi/id/product_name to be "kind"
INFO: faking /sys/class/dmi/id/product_uuid to be random
INFO: faking /sys/devices/virtual/dmi/id/product_uuid as well
INFO: setting iptables to detected mode: legacy
INFO: detected IPv4 address: 172.18.0.2
INFO: detected IPv6 address: fc00:f853:ccd:e793::2
INFO: starting init
systemd 252.16-1.amzn2023.0.1 running in system mode (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP -GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +TPM2 -BZIP2 -LZ4 +XZ -ZLIB -ZSTD +BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization docker.
Detected architecture x86-64.

Welcome to Amazon Linux 2023!

Failed to open libbpf, cgroup BPF features disabled: Operation not supported
Queued start job for default target graphical.target.
[  OK  ] Created slice kubelet.slic… used to run Kubernetes / Kubelet.
[  OK  ] Created slice system-modpr…lice - Slice /system/modprobe.
[  OK  ] Created slice user.slice - User and Session Slice.
[  OK  ] Started systemd-ask-passwo…quests to Console Directory Watch.
[  OK  ] Started systemd-ask-passwo… Requests to Wall Directory Watch.
[  OK  ] Reached target local-fs.target - Local File Systems.
[  OK  ] Reached target network-online.target - Network is Online.
[  OK  ] Reached target paths.target - Path Units.
[  OK  ] Reached target slices.target - Slice Units.
[  OK  ] Reached target swap.target - Swaps.
[  OK  ] Listening on systemd-initc… initctl Compatibility Named Pipe.
[  OK  ] Listening on systemd-journ…socket - Journal Audit Socket.
[  OK  ] Listening on systemd-journ…t - Journal Socket (/dev/log).
[  OK  ] Listening on systemd-journald.socket - Journal Socket.
[  OK  ] Listening on systemd-userd…0m - User Database Manager Socket.
         Mounting dev-hugepages.mount - Huge Pages File System...
         Mounting sys-kernel-debug.… - Kernel Debug File System...
         Mounting sys-kernel-tracin… - Kernel Trace File System...
         Starting ldconfig.service - Rebuild Dynamic Linker Cache...
         Starting modprobe@configfs…m - Load Kernel Module configfs...
         Starting modprobe@drm.service - Load Kernel Module drm...
         Starting modprobe@fuse.ser…e - Load Kernel Module fuse...
         Starting nfs-convert.servi…ss NFS configuration convertion...
         Starting systemd-journald.service - Journal Service...
         Starting systemd-network-g… units from Kernel command line...
         Starting systemd-sysusers.…rvice - Create System Users...
[  OK  ] Mounted dev-hugepages.mount - Huge Pages File System.
[  OK  ] Mounted sys-kernel-debug.m…nt - Kernel Debug File System.
[  OK  ] Mounted sys-kernel-tracing…nt - Kernel Trace File System.
[  OK  ] Finished ldconfig.service - Rebuild Dynamic Linker Cache.
modprobe@configfs.service: Deactivated successfully.
[  OK  ] Finished modprobe@configfs…[0m - Load Kernel Module configfs.
modprobe@drm.service: Deactivated successfully.
[  OK  ] Finished modprobe@drm.service - Load Kernel Module drm.
modprobe@fuse.service: Deactivated successfully.
[  OK  ] Finished modprobe@fuse.service - Load Kernel Module fuse.
nfs-convert.service: Deactivated successfully.
[  OK  ] Finished nfs-convert.servi…cess NFS configuration convertion.
[  OK  ] Finished systemd-network-g…rk units from Kernel command line.
[  OK  ] Reached target network-pre…get - Preparation for Network.
         Mounting sys-fs-fuse-conne… - FUSE Control File System...
         Mounting sys-kernel-config…ernel Configuration File System...
         Starting rpc-statd-notify.…- Notify NFS peers of a restart...
[  OK  ] Started systemd-journald.service - Journal Service.
[FAILED] Failed to start systemd-sy…service - Create System Users.
See 'systemctl status systemd-sysusers.service' for details.
[  OK  ] Mounted sys-fs-fuse-connec…nt - FUSE Control File System.
[  OK  ] Mounted sys-kernel-config.… Kernel Configuration File System.
[  OK  ] Started rpc-statd-notify.s…m - Notify NFS peers of a restart.
         Starting systemd-journal-f…h Journal to Persistent Storage...
[  OK  ] Finished systemd-journal-f…ush Journal to Persistent Storage.
         Starting systemd-tmpfiles-… Volatile Files and Directories...
[  OK  ] Finished systemd-tmpfiles-…te Volatile Files and Directories.
         Starting systemd-journal-c…e - Rebuild Journal Catalog...
         Starting systemd-update-ut…rd System Boot/Shutdown in UTMP...
[  OK  ] Finished systemd-update-ut…cord System Boot/Shutdown in UTMP.
[  OK  ] Finished systemd-journal-c…ice - Rebuild Journal Catalog.
         Starting systemd-update-do…rvice - Update is Completed...
[  OK  ] Finished systemd-update-do…service - Update is Completed.
[  OK  ] Reached target sysinit.target - System Initialization.
[  OK  ] Started systemd-tmpfiles-c… Cleanup of Temporary Directories.
[  OK  ] Reached target timers.target - Timer Units.
[  OK  ] Listening on dbus.socket- D-Bus System Message Bus Socket.
[  OK  ] Reached target sockets.target - Socket Units.
[  OK  ] Reached target basic.target - Basic System.
         Starting gssproxy.service - GSSAPI Proxy Daemon...
         Starting systemd-logind.se…ice - User Login Management...
         Starting undo-mount-hacks.…ice - Undo KIND mount hacks...
[  OK  ] Started gssproxy.service - GSSAPI Proxy Daemon.
[  OK  ] Finished undo-mount-hacks.…rvice - Undo KIND mount hacks.
         Starting containerd.servic… - containerd container runtime...
[  OK  ] Started containerd.service…0m - containerd container runtime.
         Starting dbus-broker.servi… - D-Bus System Message Bus...
[  OK  ] Started dbus-broker.service - D-Bus System Message Bus.
[  OK  ] Started systemd-logind.service - User Login Management.
         Mounting var-lib-nfs-rpc_p…ount - RPC Pipe File System...
[  OK  ] Mounted var-lib-nfs-rpc_pi….mount - RPC Pipe File System.
[  OK  ] Reached target rpc_pipefs.target.
[  OK  ] Reached target nfs-client.target - NFS client services.
[  OK  ] Reached target remote-fs-p…eparation for Remote File Systems.
[  OK  ] Reached target remote-fs.target - Remote File Systems.
         Starting systemd-user-sess…vice - Permit User Sessions...
[  OK  ] Finished systemd-user-sess…ervice - Permit User Sessions.
[  OK  ] Started console-getty.service - Console Getty.
[  OK  ] Reached target getty.target - Login Prompts.
[  OK  ] Reached target multi-user.target - Multi-User System.
[  OK  ] Reached target graphical.target - Graphical Interface.
         Starting systemd-update-ut… Record Runlevel Change in UTMP...
[  OK  ] Finished systemd-update-ut… - Record Runlevel Change in UTMP.

Amazon Linux 2023
Kernel 5.15.0-92-generic on an x86_64 (-)

We are also wondering how the admin machine connects to the worker and control-plane machines and re-images them to Bottlerocket. Is it over SSH? (We already configured the private key on the admin server and the certificates on the worker and the control plane.)

Another question: is it okay that the worker and control-plane machines currently have Ubuntu installed?

Anything you know or can suggest would help :)

Thank you!

terry-hasegawa commented 9 months ago

Hi @afikmirc, the "Creating new workload cluster" step has timed out:

2024-02-06T16:23:35.853+0200 V0 Creating new workload cluster
2024-02-06T17:23:36.344+0200 V9 docker {"stderr": "error: timed out waiting for the condition on clusters/bm-anywhere-cluster\n"}

Does your Boots log contain your expected MAC addresses? If not, you have the following problem.

I'm sharing my Boots logs for your reference. My expected MAC address (ec:2a:72:46:2e:c4) is included in my Boots log.

In the "Creating new workload cluster" step, the admin machine and the control plane communicate via PXE boot (DHCP, HTTP). The existing OS is then wiped and the new OS is installed.

According to your Boots log, the procedure stops at step 5. 0207boots.txt [eks-a diagram]
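
While eksctl anywhere create cluster is waiting on the "Creating new workload cluster" step, you can also watch the Tinkerbell state from the bootstrap kind cluster. A sketch, assuming the kubeconfig path shown in your own logs above:

export KUBECONFIG=bm-anywhere-cluster/generated/bm-anywhere-cluster.kind.kubeconfig
kubectl get hardware -n eksa-system --show-labels     # hardware objects and the labels the selectors match on
kubectl get workflows -n eksa-system                  # should leave STATE_PENDING once a node actually PXE boots
kubectl get machines.cluster.x-k8s.io -n eksa-system  # CAPI machine objects for the control plane and workers
docker logs -f boots                                  # DHCP/TFTP/HTTP requests from the node MACs should show up here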

afikmirc commented 9 months ago

Hi @terry-hasegawa , thank you for the response.

Does your Boots log have your expected MAC address? - No, the two MACs (worker and control-plane) from the hardware.csv file are missing. We use layer 2 connectivity between the admin machine, the worker, and the control-plane machines (subnet 192.168.55.0/24, all connected to the same switch). We enabled PXE on all three machines. We don't have BMCs.

PXE configuration: [screenshot]

Should we reboot the machines (worker and control-plane) manually during cluster creation? Edit: we tried rebooting the machines during creation (so PXE would kick in) and nothing was logged under "docker logs boots".

Thanks

terry-hasegawa commented 9 months ago

Hi @afikmirc, does the workload machine actually PXE boot? The PXE boot procedure starts with the workload machine sending a DHCP Discover. If it is not sending one, then the problem is on the workload machine side. After that, Boots receives the DHCP Discover, records it, and replies.
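
If nothing at all appears in the Boots log when a laptop boots, a packet capture on the admin machine can confirm whether the DHCP Discover even reaches it. A rough sketch, assuming eno2 is the interface on the 192.168.55.0/24 network (as in your Boots log):

sudo tcpdump -i eno2 -en 'port 67 or port 68 or port 69'   # DHCP (67/68) and TFTP (69) traffic during PXE boot, with source MACs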

afikmirc commented 9 months ago

Hi @terry-hasegawa, all the machines (including the worker) have the BIOS configuration I posted in the previous comment (PXE boot enabled in the BIOS); this is all we configured for PXE boot. Could you please tell us what else is required for PXE boot? Any suggestion would help. Thanks!

terry-hasegawa commented 9 months ago

Hi @afikmirc, I use Dell servers. For Dell servers, to boot with PXE, enable PXE boot in the NIC configuration, then enable PXE boot in the boot options. Do you have access to the server's console screen? You may be able to check the PXE boot status there.

afikmirc commented 9 months ago

Hi @terry-hasegawa, we're using 3 Dell Latitude laptops for the deployment, with Ubuntu 20.04 installed. Is there anything I can do to check PXE from the OS or the BIOS? Unfortunately, we only have laptops for this lab and no physical servers are available, but our understanding is that any PC or laptop is supported by EKS Anywhere.

What do you think?

Thanks!

terry-hasegawa commented 9 months ago

Hi @afikmirc, you can see the PXE settings in the BIOS config and the PXE boot status on the laptop screen.

I found a video of the PXE boot screen for an XPS on YouTube: https://www.youtube.com/watch?v=FrFlLTDHKsM&t=105s

I will check PXE boot using my Latitude 7520 with a USB-Ethernet adapter.

Did you check these articles? You need to change some BIOS configs and select PXE boot with the F12 key.

BIOS Settings to Allow PXE Boot on Dell Latitude Laptops https://www.dell.com/support/kbdoc/en-us/000131551/bios-settings-to-allow-pxe-boot-on-newer-model-dell-latitude-laptops

Cannot PXE Boot to a Dell Type-C to Ethernet Dongle https://www.dell.com/support/kbdoc/en-us/000148936/cannot-pxe-boot-to-a-dell-type-c-to-ethernet-dongle

Ethernet Connectivity lost during PXE booting using a USB Type-C to Ethernet adapter https://www.dell.com/support/kbdoc/en-us/000144650/ethernet-connectivity-lost-during-pxe-booting-using-a-usb-type-c-to-ethernet-adapter