Open NeilW opened 1 month ago
Hello! Thank you for raising this bug. I was hoping you could provide a little more context on this issue and what you would like to see as a fix.
Looking at the snippet of logs you provided, it seems like 3 different EC2 metadata formats are tried until the 4th attempt succeeds. So I assume a much older metadata scheme was finally returned by the IMDS on that 4th attempt, and then cloud-init issues a warning later due to the lack of a network key in the metadata. Is my understanding correct?
Did cloud-init fail to configure anything or have any regression in functionality after this issue was raised by cloud-init? Or is the issue you are looking for us to solve, solely that a warning is raised (when one shouldn't be), thus causing an undesired exit code 2?
Thanks in advance!
The 4th attempt is the default metadata version of 2009-04-04, which the code attempts to obtain a 'network' key from. And that doesn't exist as the list of top-level keys above shows.
Cloud-init appears to do everything it is supposed to do, but some change in cloud-init is now raising exit code 2 in jammy for warnings, when it didn't in bionic. That is failing userdata scripts that rely upon waiting for cloud-init to complete successfully before continuing.
Really though it shouldn't be throwing a warning on that network key with the 2009-04-04 version of the metadata.
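One way to express that idea is to make the lookup aware of the metadata version. The following is only a sketch: the function names and the logger are hypothetical and are not taken from cloud-init, and the cut-off date assumes the `network` key first appears in the 2011-01-01 metadata version, as stated later in this thread.

```python
import logging

LOG = logging.getLogger("ec2_sketch")  # hypothetical logger name

# Assumption: first metadata version that exposes a "network" key,
# using the date quoted later in this thread.
FIRST_VERSION_WITH_NETWORK = "2011-01-01"


def version_has_network_key(version: str) -> bool:
    """YYYY-MM-DD strings sort chronologically, so a plain
    string comparison is enough to order metadata versions."""
    return version >= FIRST_VERSION_WITH_NETWORK


def get_network_metadata(metadata: dict, version: str):
    """Return the 'network' subtree, warning only when the
    metadata version was expected to provide one."""
    network = metadata.get("network")
    if network is None and version_has_network_key(version):
        LOG.warning("Metadata version %s has no 'network' key", version)
    return network
```

With this shape, a 2009-04-04 crawl returns `None` quietly, while a missing key on a newer layout would still be reported.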
@NeilW Thank you for the context!
That is failing userdata scripts that rely upon waiting for cloud-init to complete successfully before continuing.
Hi @NeilW, thanks for reporting this issue. I'm happy to help get this fixed; however, I don't have access to Brightbox. Are you willing and able to put together a fix for this? The current code appears to work correctly in EC2, otherwise we would see this failure in cloud-init's integration tests, which check for warnings like this.
You’ll forgive me. I couldn’t find a 2009-04-04 version of the EC2 metadata in your test suite.
Could you point me to it?
I don't think that we explicitly test a specific version of the EC2 metadata, but our integration test suite works more generally by launching an existing instance on a cloud, then cleaning the image (removing artifacts) and installing the latest version of cloud-init before booting it "clean". I would have to dig to understand which version is used in our tests, but I can tell you that we test EC2 daily, and a warning like this would have triggered a failing test in our verify_clean_boot() or verify_clean_log() utility functions, which run on many of the EC2 tests.
@NeilW are you a Brightbox developer?
That's what I understood from the code. The integration test only exercises IMDSv2 on EC2 using the 2021-03-23 version of the metadata layout. It doesn't check the other versions of the metadata, nor IMDSv1, and runs at a different time in the boot sequence (using DatasourceEC2Local rather than DatasourceEC2).
The Unit tests only cover
with the 'default metadata' in the tests referring to the 2016-09-02 version.
The question then is whether the code needs to match the tests and the min_metadata_version should really be 2016-09-02?
Brightbox is intending to update the metadata version it is issuing to the 2021-03-23 version, largely to avoid the time it would take to back port any fix to Ubuntu Noble.
I do some work for Brightbox when they ask me to, and I worked with Scott on the Brightbox bits of cloud-init back in the day.
The question then is whether the code needs to match the tests and the min_metadata_version should really be 2016-09-02?
If that is the oldest version that supports the network key, then probably yes.
Brightbox is intending to update the metadata version it is issuing to the 2021-03-23 version, largely to avoid the time it would take to back port any fix to Ubuntu Noble.
I'm guessing that this is not a new failure and was noticed due to the status 2 changes that landed in Noble?
2016-09-02 is the oldest version that cloud-init looks and tests for. The network key itself has been available since 2011-01-01 (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html).
This failure showed up in Noble with the status 2 changes. We initially thought it was the netplan failures (hence #5374). It was only after that was fixed that we realised cloud-init had changed the status for all warnings.
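For boot scripts that only need to know whether cloud-init finished, one possible workaround is to treat exit code 2 as "completed with warnings" rather than as a failure. This is a minimal sketch, not an official recommendation: the helper names are hypothetical, and it assumes that 0 means a clean run, 2 means a run that completed with warnings, and any other code is a hard failure.

```python
import subprocess


def boot_succeeded(exit_code: int, tolerate_warnings: bool = True) -> bool:
    """Interpret the exit code of `cloud-init status`.

    Assumed meanings: 0 = clean, 2 = finished but warnings were logged,
    anything else = failure.
    """
    if exit_code == 0:
        return True
    return tolerate_warnings and exit_code == 2


def wait_for_cloud_init() -> bool:
    """Block until cloud-init finishes, then report whether to continue."""
    result = subprocess.run(["cloud-init", "status", "--wait"], check=False)
    return boot_succeeded(result.returncode)
```

A script that must stay strict can call `boot_succeeded(code, tolerate_warnings=False)` and keep the current behaviour.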
Bug report
DataSourceEC2 supports a minimal EC2 metadata version of 2009-04-04
https://github.com/canonical/cloud-init/blob/654cb4414b29ab845e0fdad97b5beca8721844df/cloudinit/sources/DataSourceEc2.py#L79
but issues a warning due to the lack of a network key. There is no network key on that version of metadata. (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html)
This warning causes cloud-init status to exit with an exit code of 2, which fails many boot scripts.
There is no test for a 2009-04-04 version of the metadata in the cloud-init data source test scripts.
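A regression test for the old layout could look roughly like the sketch below. Everything in it is illustrative: `read_network` is a hypothetical stand-in that mimics the reported behaviour and is not cloud-init's code, and the metadata keys are placeholders rather than the full key list from a real instance.

```python
import logging

LOG = logging.getLogger("ec2_test_sketch")  # hypothetical logger name


def read_network(metadata: dict):
    """Stand-in for the reported behaviour: warn when 'network' is absent."""
    if "network" not in metadata:
        LOG.warning("Metadata does not contain a 'network' key")
        return None
    return metadata["network"]


class _Collector(logging.Handler):
    """Logging handler that keeps WARNING-and-above records in a list."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.records = []

    def emit(self, record):
        self.records.append(record)


def warnings_from(func, *args):
    """Run func(*args) and return the warning messages it logged."""
    collector = _Collector()
    LOG.addHandler(collector)
    try:
        func(*args)
    finally:
        LOG.removeHandler(collector)
    return [record.getMessage() for record in collector.records]


# Placeholder 2009-04-04-style metadata: note there is no "network" entry.
OLD_STYLE_METADATA = {"ami-id": "ami-00000000", "instance-id": "i-00000000"}
```

A test of this kind would assert `warnings_from(read_network, OLD_STYLE_METADATA) == []`. Against the stand-in above that assertion fails, which is the gap this report describes.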
Steps to reproduce the problem
The top-level keys for the metadata can be obtained from any EC2 machine
Environment details
cloud-init logs