aws / eks-anywhere

Run Amazon EKS on your own infrastructure 🚀
https://anywhere.eks.amazonaws.com
Apache License 2.0

Bare Metal: write-netplan workflow failure when using Ubuntu #6636

Open Swalloow opened 1 year ago

Swalloow commented 1 year ago

What happened: EKS Anywhere's write-netplan action failed because the DEST_DISK value is incorrect: it is set to /dev/xvda, without the partition number.

What you expected to happen: EKS Anywhere's write-netplan action should also succeed when using Ubuntu.

How to reproduce it (as minimally and precisely as possible): Follow the EKS Anywhere bare metal cluster creation instructions with an Ubuntu 20.04 image.

Anything else we need to know?: It succeeds in EKS Anywhere v0.11.4 with the same settings. The states on success and failure are as follows:

# Success v0.11.4
Environment:         
  DEST_DISK:       /dev/xvda2
  DEST_PATH:       /etc/netplan/config.yaml
  DIRMODE:         0755
  FS_TYPE:         ext4
  GID:             0
  MODE:            0644
  STATIC_NETPLAN:  true
  UID:             0
Image:             public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-18
Name:              write-netplan
Pid:               host
Seconds:           4
Started At:        2023-09-01T06:22:27Z
Status:            STATE_SUCCESS
Timeout:           90

# Failed v0.17.0
Environment:
  DEST_DISK:       /dev/xvda
  DEST_PATH:       /etc/netplan/config.yaml
  DIRMODE:         0755
  FS_TYPE:         ext4
  GID:             0
  MODE:            0644
  STATIC_NETPLAN:  true
  UID:             0
image:             public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:404dab73a8a7f33e973c6e71782f07e82b125da9-eks-a-45
name:              write-netplan
pid:               host
seconds:           4
startedAt:         2023-09-01T05:42:47Z
status:            STATE_FAILED
timeout:           90


chrisdoherty4 commented 1 year ago

Hi @Swalloow. It looks like you're missing the partition number in the DEST_DISK parameter in your v0.17.0 example, is that right?

Swalloow commented 1 year ago

Yes, but from v0.11.4 to v0.14.6 it ran with the correct partition number without a custom template. From v0.15.0 to v0.17.0, do I need to use TinkerbellTemplateConfig to configure the partition number?

chrisdoherty4 commented 1 year ago

Understood. xvd devices aren't supported upstream. I've submitted a patch (https://github.com/tinkerbell/tink/pull/786) to the upstream project, and we will consider patching it in EKS-A as well.
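
For readers following along, here is a minimal sketch of how a partition device path can be derived from a whole-disk path such as /dev/xvda. It is not the upstream tink code or the contents of the linked patch; the function name `format_partition` is hypothetical. It only applies the Linux kernel naming convention, where a `p` separator is inserted when the disk name ends in a digit (for example nvme0n1) and the number is appended directly otherwise (sda, vda, xvda).

```python
def format_partition(disk: str, partition: int) -> str:
    """Return the partition device path for a whole-disk device path.

    Hypothetical helper for illustration only; it mirrors the Linux
    kernel naming convention, not any specific tink or EKS-A function.
    """
    if not disk:
        raise ValueError("disk path must not be empty")
    if partition < 1:
        raise ValueError("partition numbers start at 1")
    # Disk names ending in a digit (nvme0n1, mmcblk0) need a 'p'
    # separator so the partition number stays unambiguous.
    if disk[-1].isdigit():
        return f"{disk}p{partition}"
    # Names ending in a letter (sda, vda, xvda) take the number directly.
    return f"{disk}{partition}"


if __name__ == "__main__":
    # The failing v0.17.0 run had DEST_DISK=/dev/xvda; the successful
    # v0.11.4 run had /dev/xvda2, i.e. partition 2 of the same disk.
    print(format_partition("/dev/xvda", 2))
```

With the values from this issue, partition 2 of /dev/xvda gives the DEST_DISK value seen in the successful v0.11.4 run.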

chrisdoherty4 commented 1 year ago

@d8660091 FYI.

chrisdoherty4 commented 12 months ago

Depending on the upstream release, this may be fixed when we upgrade the stack in aws/eks-anywhere-internal#1952.

ndeksa commented 11 months ago

Someone needs to validate this on the latest EKS-A BM release (@ndeksa, @sp1999).