The generated user-data.txt file provided to cloud-init is:
H4sIAAAAAAAA/7xXf28TuRb9f6R+h/tMRNr34pmWx+6KQUEbaBYqWlo1oYAQGjmeO4mJxx5sT9LAsp995Zlkmx9tuqgVf7STxNfnHp977pXnhVYOlaP9WYEx5KV0omDGRbm4xPQpDHSpUmZmbXJydNJ9fvr2zWHn/AMJ/Dd6gcYKrWI4CPd3gp2A0uWgnWCBfShsoa1wVSxzjvFRjso9hUxIVCzHNnHZhBkb8mxIrvb1DVM2Q0O7iutUqGEMvw2EWwqoSDu8dBGXukwp1yoTw53gROS4Qe/BckwQTI1wmHgKNg4oFMyNYoh04SKmrBhIjPDSGVbx+my1CgD0VKGJAfkjWlo08eJDAFCgyYX1+WwMzf3H+/vNAIDXRGNofiMDxsdlkQxKPkZHYiJTVlClVWF0SutF0iKeXYKXLhlp67w2i8iadjjfEI4lM4qFCh1pkVTZ5KtWPnaxjuVBWO8rVmMXuElhMBOXC3gr2QRJi0g9HAo1vIHmfHUp7gpFD21EWsRgKmyCKi20UB7AYCEFZzWbajVUQxyVT0IspwchZ3yEIcvZV63Y1IZc56RFLMrMiqHCNOFoXMLVCtHrZVjfZJmyJP5I/hveqorUnMnbwj75FGYiOG5UZ5XVjWkW2ze2khZxWYJqUpeQepiS+44h35t3a625m+/QW5fUjlBKy40o3E3N9Z9oIFQ0YHYUBHhZaOPgrNN/1W74/3FUWhNVIvuwRcCr05NuOzJauyA46fY7h51+J+mfvu6+aTd2eWkk0J4F+h7O3vaBjJwr4ig6+PVJ+OiXx+H8GUnm0LqIFSJyeoyKAH0F5D1lU0t9g+boWMoco9UqdU5Si1yr1Mbw//19shd03vWS8+7Lo9OrtFsQYmisciUVy+3s0pliueCRUNYxxZGKFJUTbhalmpe+ZPAnfP4CocGh0Aqo2QuCB/ASHVywUjoomLVTbVLIjM6h1zsJ2NQCpfN40rg6BAFrcxiiowUzLEeHBij1ZgASecstBhydeGi6gCZA6VS4EU2Rm1nhKh5Ul64oHfgJCJR+KdHMoHm2AA4vmCyxeRXnDQPwDP6Kwgo98eiLenfe9I6eH3eTi87b435y1un13p2eHyZ/HB1326sb5mf31oDCiAlzCGOcQaYNCOXQKCahf9wDrvO8VIIzTzc4PuycJeenp/3kRed190O7sbui0pJIt2rkpKU+PR3jbKsy1YkXyqwJA4+eQZTiJFKllP9U1E8BNOBnlMg8dbSwKzJgatYCkYHSPsb/yYzWA60KtjAVUsIAYYgKDXM4t4MbIRRGT0SKaa0ZZxUCG+gJ7tWy9LrnF93zpH/cS150z/t30eZqblUyeW73JdE61TtWcY3pPdZyQ9POfavK7lPX4AF00hR8jsQ7JHHSJr6jdg1+KYXBdC8QGXwEmm27BMGnALzd/G0IgDO3NbgaaZQyM6yalzTW+pNAM4T/wbcNVjE0xjj73oRnELm8iMoi9W5Pli9jAPmkYnttwC03OZQWKwzkIw3Nb9+b90N2S9JMrBShHgIVhq8zRBs/exabv9a2gF1deYHJednmdNeanMAnePjw2uXqRDeu1m4my+X+t+5Y8scPOaQ61k3H2KjL+knmGOwmlPl5lkq4VoAYGv7Ruq4MdYU3l+p0fmf14Ta/3smxS569wbU/RcCfoeBWHTJRddIPz6rFnaeQbDbQegyUQ3UhBUqrfdRvbP++DZCK1Xz+LcACdZBiIfWsVT/qGb4aucgaznIZzAu5hdH95MnExpsDpTvB3wEAAP//XtY/9/MPAAA=
Note that GitHub seems to have broken the lines up oddly.
Note that after changing my terraform code for my Amazon EC2 launch instance to disable both base64 encoding AND gzip compression (disabling gzip alone still failed), my instance comes up correctly.
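For anyone hitting the same thing: the blob above can be checked locally by replicating the decode chain cloud-init would need to perform. A minimal Python sketch, assuming the blob is saved as user-data.txt (the join strips the line breaks GitHub inserted):

import base64
import gzip
from pathlib import Path

# Read the raw blob as bytes and strip any embedded whitespace.
raw = b"".join(Path("user-data.txt").read_bytes().split())

# validate=True raises binascii.Error on non-base64 input instead of
# silently skipping bad characters.
decoded = base64.b64decode(raw, validate=True)

# A gzip stream starts with the magic bytes 1f 8b.
assert decoded[:2] == b"\x1f\x8b"

print(gzip.decompress(decoded).decode("utf-8")[:300])

If the payload is intact, this prints the start of the MIME multipart document that terraform rendered, which is the content cloud-init never manages to reach here.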
@quanah Is it possible to provide a distilled version of the terraform procedure that was used to deploy and reproduce this problem? It would help us provide better documentation for interacting with terraform-provided user-data.
@blackboxsw I'll paste in the terraform code that generated it, but this is clearly a bug: cloud-init should be able to decode the base64 data without issue.
{
  "module": "module.ldap_conf_service.module.scaling_service.module.base_service_asg",
  "mode": "data",
  "type": "template_cloudinit_config",
  "name": "config_single",
  "provider": "provider[\"registry.terraform.io/hashicorp/template\"]",
  "instances": [
    {
      "index_key": 0,
      "schema_version": 0,
      "attributes": {
        "base64_encode": true,
        "gzip": true,
        "id": "2935117093",
        "part": [
          {
"content": "#cloud-config\n\nwrite_files:\n- path: /opt/ansible/extravars.json\n owner: ec2-user:ec2-user\n permissions: '0400'\n content: '{\"dns_zone\":\"nonprod.eu1.ldap-cp.COMPANY.net\",\"hostname_prefix\":\"ldap-conf\",\"local_dns_name\":\"local.nonprod.eu1.ldap-cp.COMPANY.net\",\"master_repl_hosts\":[{\"host\":\"ldap-master-az1.nonprod.eu1.ldap-cp.COMPANY.net\",\"internal\":true,\"sid\":10},{\"host\":\"ldap-master-az2.nonprod.eu1.ldap-cp.COMPANY.net\",\"internal\":true,\"sid\":20},{\"host\":\"ldap-master-az3.nonprod.eu1.ldap-cp.COMPANY.net\",\"internal\":true,\"sid\":30}],\"selfsigned_cert_cn\":\"ldap-config.nonprod.COMPANY.net\",\"selfsigned_cert_sans\":[\"*.nonprod.eu1.ldap-cp.COMPANY.net\",\"local.nonprod.eu1.ldap-cp.COMPANY.net\"],\"service_hostname\":\"ldap-conf.nonprod.eu1.ldap-cp.COMPANY.net\",\"service_name\":\"ldap-conf\",\"tf_env\":\"non-production\"}'\n",
"content_type": "text/cloud-config",
"filename": "tfvars.cfg",
"merge_type": ""
},
{
"content": "#!/bin/bash\n\nexport PATH=$PATH:/usr/local/bin\nexport HOME=/root\n\nMETADATA_TOKEN=$(curl -Ss -X PUT \"http://169.254.169.254/latest/api/token\" -H \"X-aws-ec2-metadata-token-ttl-seconds: 300\")\nAWS_REGION=$(curl -H \"X-aws-ec2-metadata-token: $METADATA_TOKEN\" -Ss http://169.254.169.254/latest/dynamic/instance-identity/document | jq .region -r)\n\n# Get Vault password from SSM\naws --region \"$AWS_REGION\" ssm get-parameter --name \"/ldap/xxxxxxxx\" --with-decryption --output json --query 'Parameter.Value' --output text \u003e ~/.vault_pass\nexport ANSIBLE_VAULT_PASSWORD_FILE=xxxxxxxx\n\n# Get root private key for internal TLS communication\nLDAP_ROOT_CAKEY=$(aws --region $AWS_REGION ssm get-parameter --name \"/ldap/tls-root-key\" --with-decryption --output text --query Parameter.Value 2\u003e /dev/null)\n\n# Get server certificates (if any, if not set self-signed certs will be generated from the provided root ca set above)\nLDAP_SERVER_TLS_CERT=$(aws --region $AWS_REGION ssm get-parameter --name \"/ldap/ldap-conf/tls-cert\" --with-decryption --output text --query Parameter.Value 2\u003e /dev/null)\nLDAP_SERVER_TLS_KEY=$(aws --region $AWS_REGION ssm get-parameter --name \"/ldap/ldap-conf/tls-key\" --with-decryption --output text --query Parameter.Value 2\u003e /dev/null)\nLDAP_SERVER_TLS_CACERT=$(aws --region $AWS_REGION ssm get-parameter --name \"/ldap/ldap-conf/tls-cacert\" --with-decryption --output text --query Parameter.Value 2\u003e /dev/null)\n\n# Add ldap_root_tls_key (required)\nif [ -f /opt/ansible/extravars.json ]\n then\n cat /opt/ansible/extravars.json | jq --arg key \"$LDAP_ROOT_CAKEY\" '. + {ldap_root_tls_key: $key}' \u003e /tmp/updated_vars.json\n mv -f /tmp/updated_vars.json /opt/ansible/extravars.json\n else\n echo '{}' | jq --arg key \"$LDAP_ROOT_CAKEY\" '. + {ldap_root_tls_key: $key}' \u003e /opt/ansible/extravars.json\nfi\n\n# Add ldap_server_tls_cert / ldap_server_tls_key / ldap_server_tls_cacert (optional)\nif [ \"$LDAP_SERVER_TLS_CERT\" ] \u0026\u0026 [ \"$LDAP_SERVER_TLS_KEY\" ] \u0026\u0026 [ \"$LDAP_SERVER_TLS_CACERT\" ]\n then\n if [ -f /opt/ansible/extravars.json ]\n then\n cat /opt/ansible/extravars.json | jq --arg cert \"$LDAP_SERVER_TLS_CERT\" --arg key \"$LDAP_SERVER_TLS_KEY\" --arg cacert \"$LDAP_SERVER_TLS_CACERT\" '. + {ldap_server_tls_cert: $cert, ldap_server_tls_key: $key, ldap_server_tls_cacert: $cacert}' \u003e /tmp/updated_vars.json\n mv -f /tmp/updated_vars.json /opt/ansible/extravars.json\n else\n echo '{}' | jq --arg cert \"$LDAP_SERVER_TLS_CERT\" --arg key \"$LDAP_SERVER_TLS_KEY\" --arg cacert \"$LDAP_SERVER_TLS_CACERT\" '. + {ldap_server_tls_cert: $cert, ldap_server_tls_key: $key, ldap_server_tls_cacert: $cacert}' \u003e /opt/ansible/extravars.json\n fi\nfi\n\nif [ -f /opt/ansible/extravars.json ]\n then\n ansible-playbook -c local --extra-vars=@/opt/ansible/extravars.json -i /opt/ansible/hosts -t deploy,deploy-conf /opt/ansible/playbook.yml\nelse\n ansible-playbook -c local -i /opt/ansible/hosts -t deploy,deploy-conf /opt/ansible/playbook.yml\nfi\n",
"content_type": "text/x-shellscript",
"filename": "ansible.cfg",
"merge_type": ""
}
],
"rendered": "H4sIAAAAAAA......"
},
"sensitive_attributes": []
}
]
},
That's what terraform generated (I cut out most of the "rendered" bit; the full value is in https://github.com/canonical/cloud-init/issues/4239#issuecomment-1629663436).
And this is the terraform code that rendered the above.
locals {
  service_hostname = var.create_multiple ? {
    for subnet_id, suffix in var.subnet_suffix_mapping :
    subnet_id => "${var.ansible_vars["hostname_prefix"]}-${suffix}.${trimsuffix(var.ansible_vars["dns_zone"], ".")}"
  } : {
    "all" = "${var.ansible_vars["hostname_prefix"]}.${trimsuffix(var.ansible_vars["dns_zone"], ".")}"
  }

  tfvar_parts = var.create_multiple ? {
    for subnet_id, suffix in var.subnet_suffix_mapping :
    subnet_id => {
      filename     = "tfvars.cfg"
      content_type = "text/cloud-config"
      content = templatefile("${path.module}/files/write_file.tpl", {
        path = "/opt/ansible/extravars.json"
        data = merge(var.ansible_vars, { subnet_suffix = suffix, service_hostname = local.service_hostname[subnet_id] })
      })
    }
  } : length(var.ansible_vars) > 0 ? {
    "all" = {
      filename     = "tfvars.cfg"
      content_type = "text/cloud-config"
      content = templatefile("${path.module}/files/write_file.tpl", {
        path = "/opt/ansible/extravars.json"
        data = merge(var.ansible_vars, { service_hostname = local.service_hostname["all"] })
      })
    }
  } : {}

  ansible_part = {
    filename     = "ansible.cfg"
    content_type = "text/x-shellscript"
    content      = templatefile("${path.module}/files/run_ansible.tpl", { deploy_tags = var.ansible_tags, service_name = var.ansible_vars["service_name"] })
  }
}

data "template_cloudinit_config" "config_multiple" {
  for_each      = var.create_multiple ? local.tfvar_parts : {}
  gzip          = true
  base64_encode = true

  dynamic "part" {
    for_each = concat([local.tfvar_parts[each.key]], [local.ansible_part])
    content {
      filename     = part.value["filename"]
      content_type = part.value["content_type"]
      content      = part.value["content"]
    }
  }
}

data "template_cloudinit_config" "config_single" {
  count         = var.create_multiple ? 0 : 1
  gzip          = true
  base64_encode = true

  dynamic "part" {
    for_each = concat(values(local.tfvar_parts), [local.ansible_part])
    content {
      filename     = part.value["filename"]
      content_type = part.value["content_type"]
      content      = part.value["content"]
    }
  }
}
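For context on what those two data sources emit: template_cloudinit_config assembles the part blocks into a MIME multipart document and then, with the flags above, gzips and base64-encodes the result. A rough Python sketch of that pipeline (the part bodies here are placeholders, not the real templates):

import base64
import gzip
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Placeholder stand-ins for the two "part" blocks above.
parts = [
    ("tfvars.cfg", "text/cloud-config", "#cloud-config\n..."),
    ("ansible.cfg", "text/x-shellscript", "#!/bin/bash\n..."),
]

msg = MIMEMultipart()
for filename, content_type, body in parts:
    sub = MIMEText(body, content_type.split("/", 1)[1])
    sub.add_header("Content-Disposition", "attachment", filename=filename)
    msg.attach(sub)

# gzip = true, then base64_encode = true:
user_data = base64.b64encode(gzip.compress(msg.as_bytes()))

Upstream cloud-init knows how to undo the gzip layer itself, but, as this thread establishes, not an extra base64 layer that survives all the way to the instance.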
It appears the fix for https://github.com/canonical/cloud-init/issues/3712 was only applied to DataSourceHetzner
Correct. That fix was working around a quirk caused by Hetzner's expectation that users pass UTF-8 strings, which broke gzipped user-data, a supported user-data format. That fix arguably should have been made by the cloud provider, but since they refused to do so, cloud-init provided a workaround.
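For reference, that workaround amounts to a "decode it only if it really is base64" fallback applied before the payload is processed. A minimal sketch of the shape of that helper (not cloud-init's exact code):

import base64
import binascii

def maybe_b64decode(data: bytes) -> bytes:
    """Return the base64-decoded payload if data is valid base64,
    otherwise return data unchanged."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return base64.b64decode(data, validate=True)
    except binascii.Error:
        return data

Because raw gzip bytes essentially never form a valid base64 string, legitimate payloads fall through untouched.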
@quanah To echo Scott's questions in the bug you referenced: Is there a reason you base64 encoded the content? Was there documentation to lead you to believe you should?
@holmanb I don't get the obsession with my data. The problem is not my data; the problem is that cloud-init is supposed to handle base64-encoded data, as documented, and it does not do so. This is what is known as a 'regression': this worked correctly, as described in the documentation, in an older release and no longer does. Moreover, the error clearly indicates the user-data is being treated as the wrong data type (the 'b' in the error message), a well-known class of issue when moving from Python 2 to Python 3.
As for documentation that cloud-init is supposed to handle base64-encoded data, there is: cloud-init's own documentation, AWS's documentation, and Terraform's documentation.
As for documentation that cloud-init is supposed to handle base64-encoded data, there is: cloud-init's own documentation
That link is not the cloud-init docs, it is the cloudbase-init docs (a cloud-init "clone") ;-)
Searching on the cloud-init docs site for "base64" only turns up the following references: (https://cloudinit.readthedocs.io/en/latest/search.html?q=base64&check_keywords=yes&area=default)
Also, that cloudbase-init doc on user-data refers to base64 only in the context of the write_files module.
ok, that's interesting... so either Amazon is using cloudbase-init or they've hacked cloud-init to support this?
Ok, I found Amazon has a custom patch: 0005-Decode-userdata-if-it-is-base64-encoded.patch. I'll close this, sorry for the noise!
ok, that's interesting... so either Amazon is using cloudbase-init or they've hacked cloud-init to support this?
As I said, the cloudbase-init docs also only mention base64 in the context of the write_files module, not for encoding user-data documents. So there is nothing in their documentation to indicate base64-encoded user-data is supported by them either.
Yeah, this is entirely an Amazon and/or Red Hat hack.
Ok, I found Amazon has a custom patch: 0005-Decode-userdata-if-it-is-base64-encoded.patch. I'll close this, sorry for the noise!
Can you provide a link to that patch? @quanah
Looking at the patch contents, this is Amazon-specific. I will follow up further with them. Again, apologies for the noise :) Hopefully this will at least help anyone else who comes across it.
Ok, I found Amazon has a custom patch: 0005-Decode-userdata-if-it-is-base64-encoded.patch. I'll close this, sorry for the noise!
Can you provide a link to that patch?
I'm not sure where they post their source. I used dnf on my Amazon Linux 2023 host to pull in the source RPM and then extracted the contents.
I'm not sure where they post their source. I used dnf on my Amazon Linux 2023 host to pull in the source RPM and then extracted the contents.
Ok, got a link to the SRPM? I can unpack it myself (even on a non-RPM system) to get at the patches.
When I tried to access it from any system outside of Amazon, I got errors; you can only talk to their dnf/yum repos from inside Amazon. I don't know that it's publicly available anywhere. :/
If you're inside Amazon, they list some of their repos here: https://docs.aws.amazon.com/linux/al2023/ug/managing-repos-os-updates.html
You might also be able to get at it via their docker image; instructions are a bit down the page here: https://github.com/amazonlinux/amazon-linux-2023
It appears the fix for #3712 was only applied to DataSourceHetzner
However, this same error now occurs when using Amazon Linux 2023. The user-data passed in is valid, and fully decodes with:
cat userdata.txt | base64 -d | gunzip
The environment uses terraform to generate the user-data, which works fine on Amazon Linux 2 (cloud-init 19.3). Amazon Linux 2023 uses cloud-init 22.2. I suspect the real issue, however, is the move from Python 2.7 in Amazon Linux 2 to Python 3.9 in Amazon Linux 2023.
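That suspicion fits the stray 'b' in the error message: in Python 3, formatting a bytes object into a string embeds its repr, prefix and quotes included. A tiny illustration of the class of bug (not the exact failing code path):

# user-data arrives from the metadata service as bytes
payload = b"H4sIAAAAAAAA..."

# Python 2: str(payload) returned the text itself.
# Python 3: str() on bytes yields the repr, b prefix included.
print(str(payload))             # b'H4sIAAAAAAAA...'

# The fix is an explicit decode at the str/bytes boundary:
print(payload.decode("ascii"))  # H4sIAAAAAAAA...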