canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

Different code in Ubuntu package than in this repo #5753

Open f403 opened 1 month ago

f403 commented 1 month ago

Bug report

I've been trying to use network-config with the NoCloud data source by upgrading cloud-init to 24.3.1 as mentioned in #5603, and was confused when it didn't work. The cloud-init code installed by the Ubuntu package 24.3.1-0ubuntu0~22.04.1 differs from the 24.3.1 release. Is this intentional? And what code goes into the packages?

Steps to reproduce the problem

root@somehost:~# cloud-init --version
/usr/bin/cloud-init 24.3.1-0ubuntu0~22.04.1
root@somehost:~# wget -qO /tmp/DataSourceNoCloud.py https://raw.githubusercontent.com/canonical/cloud-init/refs/tags/24.3.1/cloudinit/sources/DataSourceNoCloud.py
root@somehost:~# diff /tmp/DataSourceNoCloud.py /usr/lib/python3/dist-packages/cloudinit/sources/DataSourceNoCloud.py
193c193
<             md_seed, ud, vd, network = util.read_seeded(seedfrom, timeout=None)
---
>             md_seed, ud, vd, _ = util.read_seeded(seedfrom, timeout=None)
202d201
<             mydata["network-config"] = network
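To make the diff above concrete, here is a simplified sketch of the two code paths (`read_seeded_stub` is a hypothetical stand-in for `cloudinit.util.read_seeded`, and the `mydata` handling is condensed): upstream 24.3.1 keeps the fourth return value and stores it as network-config, while the patched Ubuntu package discards it.

```python
# Simplified sketch of the behavioral difference shown in the diff above.
# read_seeded_stub stands in for cloudinit.util.read_seeded; the real code
# fetches meta-data, user-data, vendor-data (and, upstream, network-config).

def read_seeded_stub(seedfrom, timeout=None):
    # Pretend the seed URL served all four files.
    return ("md", "ud", "vd", {"version": 2})

def nocloud_upstream(seedfrom):
    """Upstream 24.3.1: network-config from the seed is kept."""
    mydata = {}
    md_seed, ud, vd, network = read_seeded_stub(seedfrom, timeout=None)
    mydata["user-data"] = ud
    mydata["vendor-data"] = vd
    mydata["network-config"] = network
    return mydata

def nocloud_patched(seedfrom):
    """Ubuntu jammy package: the fourth return value is discarded."""
    mydata = {}
    md_seed, ud, vd, _ = read_seeded_stub(seedfrom, timeout=None)
    mydata["user-data"] = ud
    mydata["vendor-data"] = vd
    return mydata
```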

Environment details

cloud-init logs

N/A

aciba90 commented 1 month ago

Thanks, @f403, for making cloud-init better.

Yes, the downstream Ubuntu cloud-init packages add quilt patches on top of the raw cloud-init to retain stability and backwards compatibility for already released Ubuntu series.

More concretely, https://github.com/canonical/cloud-init/blob/ubuntu/jammy/debian/patches/no-nocloud-network.patch patches out the new feature included in https://github.com/canonical/cloud-init/commit/5322dca2f3da7bb9b8d6f1fac6ccb00ef33ef8ee. And the main reason to patch that out was its performance impact at boot time, I think.

More context on the SRU process on Ubuntu: https://wiki.ubuntu.com/StableReleaseUpdates and cloud-init: https://wiki.ubuntu.com/CloudinitUpdates.

f403 commented 1 month ago

Thanks, this is what I was looking for.

I wanted to argue that the delay wouldn't be significant, but it looks like a big delay is indeed there: read_seeded sets retries=10, and the subsequent call to readurl (in the http(s) case) uses its default sec_between=1, causing up to 10 additional retries with a 1-second sleep between them.
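Back-of-the-envelope, assuming retries=10 and sec_between=1 as described, and that one sleep precedes each retry:

```python
# Rough worst-case extra boot delay per missing seed file, assuming
# readurl retries `RETRIES` times with SEC_BETWEEN seconds of sleep
# before each retry (values taken from the discussion above).
RETRIES = 10
SEC_BETWEEN = 1

extra_sleep_seconds = RETRIES * SEC_BETWEEN  # total time spent sleeping
total_attempts = 1 + RETRIES                 # initial attempt + retries

print(total_attempts, extra_sleep_seconds)
```

So every file missing from the seed costs roughly ten extra seconds of boot time, multiplied by however many optional files (network-config, vendor-data, ...) are absent.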

This affects not only network-config but also other possibly-missing files, e.g. vendor-data. If the server replies with a 4XX error, I don't think trying 10 more times will change anything.

Would it be better to decrease retries to 1-2 when calling read_seeded, or to implement different handling of 4XX errors in the readurl helper? And what is the logic behind 10 retries?
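One possible shape for the 4XX handling suggested above (a hypothetical helper, not the actual readurl API): treat 4XX client errors as permanent and stop retrying immediately, while still retrying transient failures.

```python
import time

class HTTPError(Exception):
    """Minimal stand-in for an HTTP error carrying a status code."""
    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status

def fetch_with_retries(fetch, retries=10, sec_between=1):
    """Retry transient failures, but fail fast on 4XX client errors."""
    attempt = 0
    while True:
        try:
            return fetch()
        except HTTPError as e:
            # A 4XX response means the resource is simply not there;
            # further attempts will not change the answer.
            if 400 <= e.status < 500:
                raise
            attempt += 1
            if attempt > retries:
                raise
            time.sleep(sec_between)
```

With this shape, a seed URL answering 404 fails after a single request instead of eleven, while 5XX or connection errors still get the full retry budget.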

aciba90 commented 1 month ago

Thanks for proposing ways to reduce the performance impact of that feature. The team has reached a consensus to backport the feature with limited or no retries as a quilt patch.