canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

__init__.py[WARNING]: Unhandled non-multipart (text/x-not-multipart) userdata error #4239

Closed quanah closed 1 year ago

quanah commented 1 year ago

It appears the fix for #3712 was only applied to DataSourceHetzner.

However, this same error now occurs when using AmazonLinux 2023. The userdata passed in is valid, and fully decodes with:

cat userdata.txt | base64 -d | gunzip

The environment uses terraform to generate the user-data, which works fine on Amazon Linux 2 (cloud-init 19.3). Amazon Linux 2023 uses cloud-init 22.2. I suspect the real issue, however, is the move from python2.7 in Amazon Linux 2 to python3.9 in Amazon Linux 2023.
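
For reference, here is the same check as the pipeline above, done in Python (a minimal sketch; userdata.txt is the same file):

    import base64
    import gzip

    # Same check as `cat userdata.txt | base64 -d | gunzip`:
    # the payload is base64-encoded, gzip-compressed user-data.
    with open("userdata.txt", "rb") as f:
        encoded = f.read()

    decoded = gzip.decompress(base64.b64decode(encoded))
    print(decoded.decode("utf-8"))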

quanah commented 1 year ago

The generated user-data.txt file provided to cloud-init is: H4sIAAAAAAAA/7xXf28TuRb9f6R+h/tMRNr34pmWx+6KQUEbaBYqWlo1oYAQGjmeO4mJxx5sT9LAsp995Zlkmx9tuqgVf7STxNfnHp977pXnhVYOlaP9WYEx5KV0omDGRbm4xPQpDHSpUmZmbXJydNJ9fvr2zWHn/AMJ/Dd6gcYKrWI4CPd3gp2A0uWgnWCBfShsoa1wVSxzjvFRjso9hUxIVCzHNnHZhBkb8mxIrvb1DVM2Q0O7iutUqGEMvw2EWwqoSDu8dBGXukwp1yoTw53gROS4Qe/BckwQTI1wmHgKNg4oFMyNYoh04SKmrBhIjPDSGVbx+my1CgD0VKGJAfkjWlo08eJDAFCgyYX1+WwMzf3H+/vNAIDXRGNofiMDxsdlkQxKPkZHYiJTVlClVWF0SutF0iKeXYKXLhlp67w2i8iadjjfEI4lM4qFCh1pkVTZ5KtWPnaxjuVBWO8rVmMXuElhMBOXC3gr2QRJi0g9HAo1vIHmfHUp7gpFD21EWsRgKmyCKi20UB7AYCEFZzWbajVUQxyVT0IspwchZ3yEIcvZV63Y1IZc56RFLMrMiqHCNOFoXMLVCtHrZVjfZJmyJP5I/hveqorUnMnbwj75FGYiOG5UZ5XVjWkW2ze2khZxWYJqUpeQepiS+44h35t3a625m+/QW5fUjlBKy40o3E3N9Z9oIFQ0YHYUBHhZaOPgrNN/1W74/3FUWhNVIvuwRcCr05NuOzJauyA46fY7h51+J+mfvu6+aTd2eWkk0J4F+h7O3vaBjJwr4ig6+PVJ+OiXx+H8GUnm0LqIFSJyeoyKAH0F5D1lU0t9g+boWMoco9UqdU5Si1yr1Mbw//19shd03vWS8+7Lo9OrtFsQYmisciUVy+3s0pliueCRUNYxxZGKFJUTbhalmpe+ZPAnfP4CocGh0Aqo2QuCB/ASHVywUjoomLVTbVLIjM6h1zsJ2NQCpfN40rg6BAFrcxiiowUzLEeHBij1ZgASecstBhydeGi6gCZA6VS4EU2Rm1nhKh5Ul64oHfgJCJR+KdHMoHm2AA4vmCyxeRXnDQPwDP6Kwgo98eiLenfe9I6eH3eTi87b435y1un13p2eHyZ/HB1326sb5mf31oDCiAlzCGOcQaYNCOXQKCahf9wDrvO8VIIzTzc4PuycJeenp/3kRed190O7sbui0pJIt2rkpKU+PR3jbKsy1YkXyqwJA4+eQZTiJFKllP9U1E8BNOBnlMg8dbSwKzJgatYCkYHSPsb/yYzWA60KtjAVUsIAYYgKDXM4t4MbIRRGT0SKaa0ZZxUCG+gJ7tWy9LrnF93zpH/cS150z/t30eZqblUyeW73JdE61TtWcY3pPdZyQ9POfavK7lPX4AF00hR8jsQ7JHHSJr6jdg1+KYXBdC8QGXwEmm27BMGnALzd/G0IgDO3NbgaaZQyM6yalzTW+pNAM4T/wbcNVjE0xjj73oRnELm8iMoi9W5Pli9jAPmkYnttwC03OZQWKwzkIw3Nb9+b90N2S9JMrBShHgIVhq8zRBs/exabv9a2gF1deYHJednmdNeanMAnePjw2uXqRDeu1m4my+X+t+5Y8scPOaQ61k3H2KjL+knmGOwmlPl5lkq4VoAYGv7Ruq4MdYU3l+p0fmf14Ta/3smxS569wbU/RcCfoeBWHTJRddIPz6rFnaeQbDbQegyUQ3UhBUqrfdRvbP++DZCK1Xz+LcACdZBiIfWsVT/qGb4aucgaznIZzAu5hdH95MnExpsDpTvB3wEAAP//XtY/9/MPAAA=

quanah commented 1 year ago

Note that GitHub seems to have broken the lines oddly.

quanah commented 1 year ago

Note that after changing my terraform code for my Amazon EC2 launch instance to disable both base64 encoding AND gzip compression (just disabling gzip still failed), my instance comes up correctly.
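
For context, with gzip and base64_encode disabled, template_cloudinit_config emits a plain MIME multipart document, which cloud-init consumes directly. A rough Python sketch of that structure (illustrative only, not the provider's exact output; the part filenames are the ones from my config):

    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText

    # Roughly what the provider renders with gzip/base64 disabled: a plain
    # MIME multipart archive with a cloud-config part and a shellscript part.
    msg = MIMEMultipart()
    for content, subtype, filename in [
        ("#cloud-config\n", "cloud-config", "tfvars.cfg"),
        ("#!/bin/bash\n", "x-shellscript", "ansible.cfg"),
    ]:
        part = MIMEText(content, subtype)
        part.add_header("Content-Disposition", "attachment", filename=filename)
        msg.attach(part)

    print(msg.as_string())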

blackboxsw commented 1 year ago

@quanah is it possible to provide a distilled version of the terraform procedure that was used to deploy and reproduce this problem? It would help us provide better documentation for interacting with terraform-provided user-data.

quanah commented 1 year ago

@blackboxsw I'll paste in the terraform code that generated it, but this is clearly a bug. cloud-init should be able to decode the base64 data w/o issue.

        {
          "module": "module.ldap_conf_service.module.scaling_service.module.base_service_asg",
          "mode": "data",
          "type": "template_cloudinit_config",
          "name": "config_single",
          "provider": "provider[\"registry.terraform.io/hashicorp/template\"]",
          "instances": [
            {
              "index_key": 0,
              "schema_version": 0,
              "attributes": {
                "base64_encode": true,
                "gzip": true,
                "id": "2935117093",
                "part": [
                  {
                    "content": "#cloud-config\n\nwrite_files:\n- path: /opt/ansible/extravars.json\n  owner: ec2-user:ec2-user\n  permissions: '0400'\n  content: '{\"dns_zone\":\"nonprod.eu1.ldap-cp.COMPANY.net\",\"hostname_prefix\":\"ldap-conf\",\"local_dns_name\":\"local.nonprod.eu1.ldap-cp.COMPANY.net\",\"master_repl_hosts\":[{\"host\":\"ldap-master-az1.nonprod.eu1.ldap-cp.COMPANY.net\",\"internal\":true,\"sid\":10},{\"host\":\"ldap-master-az2.nonprod.eu1.ldap-cp.COMPANY.net\",\"internal\":true,\"sid\":20},{\"host\":\"ldap-master-az3.nonprod.eu1.ldap-cp.COMPANY.net\",\"internal\":true,\"sid\":30}],\"selfsigned_cert_cn\":\"ldap-config.nonprod.COMPANY.net\",\"selfsigned_cert_sans\":[\"*.nonprod.eu1.ldap-cp.COMPANY.net\",\"local.nonprod.eu1.ldap-cp.COMPANY.net\"],\"service_hostname\":\"ldap-conf.nonprod.eu1.ldap-cp.COMPANY.net\",\"service_name\":\"ldap-conf\",\"tf_env\":\"non-production\"}'\n",
                    "content_type": "text/cloud-config",
                    "filename": "tfvars.cfg",
                    "merge_type": ""
                  },
                  {
                    "content": "#!/bin/bash\n\nexport PATH=$PATH:/usr/local/bin\nexport HOME=/root\n\nMETADATA_TOKEN=$(curl -Ss -X PUT \"http://169.254.169.254/latest/api/token\" -H \"X-aws-ec2-metadata-token-ttl-seconds: 300\")\nAWS_REGION=$(curl -H \"X-aws-ec2-metadata-token: $METADATA_TOKEN\" -Ss http://169.254.169.254/latest/dynamic/instance-identity/document | jq .region -r)\n\n# Get Vault password from SSM\naws --region \"$AWS_REGION\" ssm get-parameter --name \"/ldap/xxxxxxxx\" --with-decryption --output json --query 'Parameter.Value' --output text  \u003e ~/.vault_pass\nexport ANSIBLE_VAULT_PASSWORD_FILE=xxxxxxxx\n\n# Get root private key for internal TLS communication\nLDAP_ROOT_CAKEY=$(aws --region $AWS_REGION ssm get-parameter --name \"/ldap/tls-root-key\" --with-decryption --output text --query Parameter.Value 2\u003e /dev/null)\n\n# Get server certificates (if any, if not set self-signed certs will be generated from the provided root ca set above)\nLDAP_SERVER_TLS_CERT=$(aws --region $AWS_REGION ssm get-parameter --name \"/ldap/ldap-conf/tls-cert\" --with-decryption --output text --query Parameter.Value 2\u003e /dev/null)\nLDAP_SERVER_TLS_KEY=$(aws --region $AWS_REGION ssm get-parameter --name \"/ldap/ldap-conf/tls-key\" --with-decryption --output text --query Parameter.Value 2\u003e /dev/null)\nLDAP_SERVER_TLS_CACERT=$(aws --region $AWS_REGION ssm get-parameter --name \"/ldap/ldap-conf/tls-cacert\" --with-decryption --output text --query Parameter.Value 2\u003e /dev/null)\n\n# Add ldap_root_tls_key (required)\nif [ -f /opt/ansible/extravars.json ]\n  then\n    cat /opt/ansible/extravars.json | jq --arg key \"$LDAP_ROOT_CAKEY\" '. + {ldap_root_tls_key: $key}' \u003e /tmp/updated_vars.json\n    mv -f /tmp/updated_vars.json /opt/ansible/extravars.json\n  else\n    echo '{}' | jq --arg key \"$LDAP_ROOT_CAKEY\" '. + {ldap_root_tls_key: $key}' \u003e /opt/ansible/extravars.json\nfi\n\n# Add ldap_server_tls_cert / ldap_server_tls_key / ldap_server_tls_cacert (optional)\nif [ \"$LDAP_SERVER_TLS_CERT\" ] \u0026\u0026 [ \"$LDAP_SERVER_TLS_KEY\" ] \u0026\u0026 [ \"$LDAP_SERVER_TLS_CACERT\" ]\n  then\n  if [ -f /opt/ansible/extravars.json ]\n    then\n      cat /opt/ansible/extravars.json | jq --arg cert \"$LDAP_SERVER_TLS_CERT\" --arg key \"$LDAP_SERVER_TLS_KEY\" --arg cacert \"$LDAP_SERVER_TLS_CACERT\" '. + {ldap_server_tls_cert: $cert, ldap_server_tls_key: $key, ldap_server_tls_cacert: $cacert}' \u003e /tmp/updated_vars.json\n      mv -f /tmp/updated_vars.json /opt/ansible/extravars.json\n    else\n      echo '{}' | jq --arg cert \"$LDAP_SERVER_TLS_CERT\" --arg key \"$LDAP_SERVER_TLS_KEY\" --arg cacert \"$LDAP_SERVER_TLS_CACERT\"  '. + {ldap_server_tls_cert: $cert, ldap_server_tls_key: $key, ldap_server_tls_cacert: $cacert}' \u003e /opt/ansible/extravars.json\n  fi\nfi\n\nif [ -f /opt/ansible/extravars.json ]\n  then\n    ansible-playbook -c local --extra-vars=@/opt/ansible/extravars.json -i /opt/ansible/hosts -t deploy,deploy-conf /opt/ansible/playbook.yml\nelse\n  ansible-playbook -c local -i /opt/ansible/hosts -t deploy,deploy-conf /opt/ansible/playbook.yml\nfi\n",
                    "content_type": "text/x-shellscript",
                    "filename": "ansible.cfg",
                    "merge_type": ""
                  }
                ],
                "rendered": "H4sIAAAAAAA......"
              },
              "sensitive_attributes": []
            }
          ]
        },
quanah commented 1 year ago

That's what terraform generated (I cut out most of the rendered bit; that's in https://github.com/canonical/cloud-init/issues/4239#issuecomment-1629663436).

quanah commented 1 year ago

And this is the terraform code that rendered the above.

    locals {
      service_hostname = var.create_multiple ? {
        for subnet_id, suffix in var.subnet_suffix_mapping :
        subnet_id => "${var.ansible_vars["hostname_prefix"]}-${suffix}.${trimsuffix(var.ansible_vars["dns_zone"], ".")}"
        } : {
        "all" = "${var.ansible_vars["hostname_prefix"]}.${trimsuffix(var.ansible_vars["dns_zone"], ".")}"
      }
      tfvar_parts = var.create_multiple ? {
        for subnet_id, suffix in var.subnet_suffix_mapping :
        subnet_id => {
          filename     = "tfvars.cfg"
          content_type = "text/cloud-config"
          content = templatefile("${path.module}/files/write_file.tpl", {
            path = "/opt/ansible/extravars.json"
            data = merge(var.ansible_vars, { subnet_suffix = suffix, service_hostname = local.service_hostname[subnet_id] })
          })
        }
        } : length(var.ansible_vars) > 0 ? {
        "all" = {
          filename     = "tfvars.cfg"
          content_type = "text/cloud-config"
          content = templatefile("${path.module}/files/write_file.tpl", {
            path = "/opt/ansible/extravars.json"
            data = merge(var.ansible_vars, { service_hostname = local.service_hostname["all"] })
          })
        }
      } : {}

      ansible_part = {
        filename     = "ansible.cfg"
        content_type = "text/x-shellscript"
        content      = templatefile("${path.module}/files/run_ansible.tpl", { deploy_tags = var.ansible_tags, service_name = var.ansible_vars["service_name"] })
      }
    }

    data "template_cloudinit_config" "config_multiple" {
      for_each = var.create_multiple ? local.tfvar_parts : {}

      gzip          = true
      base64_encode = true

      dynamic "part" {
        for_each = concat([local.tfvar_parts[each.key]], [local.ansible_part])
        content {
          filename     = part.value["filename"]
          content_type = part.value["content_type"]
          content      = part.value["content"]
        }
      }
    }

    data "template_cloudinit_config" "config_single" {
      count = var.create_multiple ? 0 : 1

      gzip          = true
      base64_encode = true

      dynamic "part" {
        for_each = concat(values(local.tfvar_parts), [local.ansible_part])
        content {
          filename     = part.value["filename"]
          content_type = part.value["content_type"]
          content      = part.value["content"]
        }
      }
    }
holmanb commented 1 year ago

It appears the fix for https://github.com/canonical/cloud-init/issues/3712 was only applied to DataSourceHetzner

Correct. That fix worked around a quirk caused by Hetzner's expectation that users pass utf-8 strings, which broke gzipped userdata (a supported user-data format). That fix arguably should have been made by the cloud provider, but since they refused to do so, cloud-init provided a workaround.
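
To illustrate the quirk: gzip output is arbitrary binary and is never valid utf-8 (the second byte of any gzip stream is 0x8b), so a datasource that insists on utf-8 strings cannot round-trip gzipped user-data. A minimal demonstration, not cloud-init code:

    import gzip

    # Gzip any payload; the stream starts with the magic bytes 0x1f 0x8b.
    payload = gzip.compress(b"#cloud-config\n")

    try:
        # A datasource that assumes utf-8 strings would do something like this.
        payload.decode("utf-8")
    except UnicodeDecodeError as err:
        print(f"gzipped user-data is not valid utf-8: {err}")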

holmanb commented 1 year ago

@quanah To echo Scott's questions in the bug you referenced: Is there a reason you base64 encoded the content? Was there documentation to lead you to believe you should?

quanah commented 1 year ago

@holmanb I don't get the obsession with my data. The problem is not my data; the problem is that cloud-init is supposed to handle base64 encoded data, as documented, and it does not. This is what is known as a 'regression': it worked correctly, as described in the documentation, in an older release and it no longer does. Moreover, the error clearly indicates the userdata is being treated as the wrong data type (the 'b' in the error), which is a known issue when moving from python2 to python3.
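
For anyone else hitting this: the 'b' shows up because calling str() on a python3 bytes object yields its repr, with a leading b', so any prefix check such as startswith("#cloud-config") stops matching. A minimal demonstration (illustrative only, not cloud-init's actual code):

    # str() on a bytes object yields its repr, so prefix checks that
    # worked on python2 str objects silently fail on python3 bytes.
    userdata = b"#cloud-config\n"

    as_text = str(userdata)
    print(as_text[:15])                                           # b'#cloud-config
    print(as_text.startswith("#cloud-config"))                    # False
    print(userdata.decode("utf-8").startswith("#cloud-config"))   # True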

As for documentation that cloud-init is supposed to handle base64 encoded data, there is: cloud-init's own documentation, AWS's documentation, and Terraform's documentation.

dermotbradley commented 1 year ago

As for documentation that cloud-init is supposed to handle base64 encoded data, there is: cloud-init's own documentation

That link is not the cloud-init docs, it is the cloudbase-init docs (a cloud-init "clone") ;-)

Searching on the cloud-init docs site for "base64" only turns up the following references: (https://cloudinit.readthedocs.io/en/latest/search.html?q=base64&check_keywords=yes&area=default)

Also, that cloudbase-init doc on user-data refers to base64 only in the context of the write_files module.

quanah commented 1 year ago

ok, that's interesting... so either Amazon is using cloudbase-init or they've hacked cloud-init to support this?

quanah commented 1 year ago

Ok, I found Amazon has a custom patch:

0005-Decode-userdata-if-it-is-base64-encoded.patch. I'll close this, sorry for the noise!

dermotbradley commented 1 year ago

ok, that's interesting... so either Amazon is using cloudbase-init or they've hacked cloud-init to support this?

As I said, the cloudbase-init docs also only mention base64 in the context of the write_files module, not for encoding user-data documents. So there is nothing in their documentation to indicate that base64-encoded user-data is supported by them either.

quanah commented 1 year ago

Yeah, this is entirely an Amazon and/or Red Hat hack.

dermotbradley commented 1 year ago

Ok, I found Amazon has a custom patch:

0005-Decode-userdata-if-it-is-base64-encoded.patch. I'll close this, sorry for the noise!

Can you provide a link to that patch? @quanah

quanah commented 1 year ago

Looking at the patch contents, this is Amazon-specific. I will follow up further with them. Again, apologies for the noise :) Hopefully this will at least help anyone else who comes across this.
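
For anyone curious about the general shape: a decode-if-base64 shim of the kind the patch name suggests would look roughly like the sketch below. This is a hypothetical reconstruction from the patch name alone, not the verbatim Amazon patch, and the function name is made up:

    import base64


    def maybe_b64decode(data: bytes) -> bytes:
        """Hypothetical sketch (not the verbatim Amazon patch): return the
        base64-decoded payload when data is valid base64, else data unchanged."""
        try:
            return base64.b64decode(data.strip(), validate=True)
        except ValueError:  # binascii.Error is a ValueError subclass
            return data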

quanah commented 1 year ago

Ok, I found Amazon has a custom patch: 0005-Decode-userdata-if-it-is-base64-encoded.patch. I'll close this, sorry for the noise!

Can you provide a link to that patch?

I'm not sure where they post their source. I used dnf on my Amazon Linux 2023 host to pull in the source RPM and then extracted the contents.

dermotbradley commented 1 year ago

I'm not sure where they post their source. I used dnf on my Amazon Linux 2023 host to pull in the source RPM and then extracted the contents.

Ok, got a link to the SRPM? I can unpack it myself (on a non-RPM system) to get at the patches.

quanah commented 1 year ago

When I tried to access it from any system outside of Amazon, I got errors, because you can only talk to their dnf/yum repos from inside Amazon. I don't know that it's publicly available anywhere. :/

If you're inside Amazon, they list some of their repos here: https://docs.aws.amazon.com/linux/al2023/ug/managing-repos-os-updates.html

quanah commented 1 year ago

You might be able to get access to their Docker image; instructions are a bit down the page here: https://github.com/amazonlinux/amazon-linux-2023