amazonlinux / amazon-linux-2023

Amazon Linux 2023
https://aws.amazon.com/linux/amazon-linux-2023/
Other
532 stars 40 forks source link

[Bug] - Custom cloud init hack for userdata is broken for AL2023 #401

Closed quanah closed 1 year ago

quanah commented 1 year ago

Describe the bug Amazon applies a patch to cloud init (0005-Decode-userdata-if-it-is-base64-encoded.patch) to the cloud-init software stack. However it does not work correctly on AmazonLinux 2023 leading to instance failure if the userdata is base64 encoded as described in the Amazon documentation.

To Reproduce Steps to reproduce the behavior:

Use base64 encoded userdata as describe in the above documentation. Optionally gzip it (but not necessary for reproduction)

Expected behavior userdata is base64 decoded and is usable.

Instead you get the error:

__init__.py[WARNING]: Uhandled non-multipart (text/x-not-multipart) userdata: 'b'H4sIAAAA...

The 'b' here indicates an issue with python incorrectly treating this data as bytes instead of a string and seems to be the source of the problem.

Screenshots If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

Additional context

I've also reported this as Case 13251716281 with Amazon

nmeyerhans commented 1 year ago

Confirmed. We're working on a fix.

Note, though, that in many common cases you actually should not be base64 encoding your userdata. The encoding should only happen to the input to the RunInstances API. If you're using the AWS CLI, the userdata encoding is performed automatically for you, and if you are base64 encoding the data yourself, then it's effectively doubly encoded. Similarly, if you're launching an instance using the web UI, then it assumes that the given data is not base64 encoded and performs the encoding for you, though there's a toggle to bypass this feature if you're already providing base64 encoded input. The double encoding is the scenario that the patch implicated in this issue is intended to address. If you're running into issues caused by this patch, my recommendation would be to avoid the redundant base64 encoding.

quanah commented 1 year ago

We use terraform to generate the base64 encoding, as well as to gzip the data. We are not using the AWS CLI or web UI at all.

quanah commented 1 year ago

Terraform documentation here

nmeyerhans commented 1 year ago

I'm not sure the Amazon Linux patch is specifically related to your issue here. Terraform works with userdata encoding without any Amazon Linux specific customization to cloud-init and is used widely among multiple different Linux distros on EC2. I'll see if I can repro your issue, though.

quanah commented 1 year ago

Yes, it works fine with Amazon Linux 2. it broke immediately with AL2023. The data was clearly correctly base64 encoded. Please read over https://github.com/canonical/cloud-init/issues/4239 for more detail.

nmeyerhans commented 1 year ago

I'm not able to reproduce the issue you're describing using terraform. I've tested both AL2023 (which has the patch you suspect as being the issue) and Debian 12, which does not have that patch. The results are the same in both cases, that cloud-init is able to correctly process the given userdata. The terraform code I've used when trying to repro this is based on the template_cloudinit_config example from the terraform docs. The relevant portion is below.

I'm guessing there's more to it, so if you could provide a modified variant of the below terraform config that triggers this issue, I can debug further.

data "template_cloudinit_config" "config" {
  gzip          = true
  base64_encode = true

  # Main cloud-config configuration file.
  part {
    filename     = "init.cfg"
    content_type = "text/cloud-config"
    content      = data.template_file.script.rendered
  }

  part {
    content_type = "text/x-shellscript"
    content      = data.template_file.shell1.rendered
  }

  part {
    content_type = "text/x-shellscript"
    content      = data.template_file.shell2.rendered
  }
}

resource "aws_instance" "tf-test-server" {
  subnet_id       = data.aws_subnet.subnet.id
  security_groups = [aws_security_group.test.id]

  # AL2023
  #ami                                  = "ami-00b8975bc3de669d2"
  # Debian 12
  ami                                  = "ami-0cf467cdd983a5859"
  instance_initiated_shutdown_behavior = "terminate"
  instance_type                        = "t4g.medium"
  key_name                             = data.aws_key_pair.key_pair.key_name

  user_data_base64 = data.template_cloudinit_config.config.rendered
}
quanah commented 1 year ago

Hi, Our terraform code has been unchanged since 2019. There has never been an issue with it with Amazon Linux 2. The issue immediately occurred when moving to AL2023 (thankfully only in our nonprod environment). The terraform code that generates the correctly base64 encoded + gzip userdata is already in the bug report I linked to above.

I will note that the userdata file that is stored on disk at /var/lib/cloud/instance/user-data.txt is correct, as fetched from the AWS launch template. Its contents are identical for both AL2 and AL2023. It is not double encoded, and base 64 -d user-data.txt | gunzip extracts the correct information. It is only the cloudinit process in AL2023 that fails to recognize the content as valid.

Given the message about 'b', I would generally guess it's a python issue related to going from python2 (AL2) to python3 (AL2023) in some way (whether it's the patch doesn't account for it correctly or another issue).

Is there any way for me to obtain your updated version of the cloudinit package to test your current fix on my AL2023 instance?

nmeyerhans commented 1 year ago

Is there any way for me to obtain your updated version of the cloudinit package to test your current fix on my AL2023 instance?

Yes, I would love more testing of the proposed change. If you can send me an AWS account number via email (nmeyerha at amazon.com), I can make it available in an S3 location.

quanah commented 1 year ago

@nmeyerhans Thanks for sending me the RPM. I can confirm things work again with this version!

quanah commented 1 year ago

Ok, both found an instance that uses the alternate config. I am able to confirm it works both with and without base64 encoding, and with it gzip'd when base64 encoded. 👍

nmeyerhans commented 1 year ago

Thanks for the confirmation. I'll work on getting this merged upstream so we can drop the local patch. Will resolve this issue when the fix is available in the repos.

nmeyerhans commented 1 year ago

cloud-init PR is https://github.com/canonical/cloud-init/pull/4276

nmeyerhans commented 1 year ago

The fix for this will be included in the next scheduled AL2023 release. ETA is approximately Aug. 22.

rsmaan4u8 commented 1 year ago

Is there any new ETA for release with this fix?

stewartsmith commented 1 year ago

The cloud-init update where this came in was:

* Fri Jul 14 2023 Noah Meyerhans <nmeyerha@amazon.com> - 22.2.2-1.amzn2023.1.9
- Drop 0005-Decode-userdata-if-it-is-base64-encoded.patch
- Fix handling of doubly base64 encoded userdata

It was first released as part of the 2023.1.20230823 release, see the 2023.1.20230823 release notes for details.