Open ubuntu-server-builder opened 1 year ago
We hit something similar, and I think the cause is OVN :-)
In OpenStack Neutron with the ML2/OVS backend there's a mechanism called 'provisioning blocks', so by the time the VM is started you have a guarantee that DHCP for the instance is ready, which is why this worked for so many years without retries here. With OVN instead of OVS, afaiu there's no such guarantee any more, so technically the VM can be started before DHCP is ready. Thus we should add max_wait for the OpenStack data source as well, e.g. similar to https://github.com/canonical/cloud-init/blob/be7f64d7615cc3f2f14b6e998a577536464a2524/cloudinit/sources/DataSourceCloudStack.py#L81
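To make the max_wait idea concrete, here is a rough sketch of a time-bounded wait for the metadata service. It is not cloud-init's code: `wait_for_metadata`, `probe`, `sleep_time`, and the injectable `clock`/`sleep` parameters are all hypothetical names made up for illustration, and `probe` stands in for whatever check decides that a metadata URL is alive.

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_metadata(
    urls: Iterable[str],
    probe: Callable[[str], bool],
    max_wait: float = 120.0,
    sleep_time: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll candidate URLs until one answers or max_wait seconds elapse.

    All names here are hypothetical; this is not cloud-init's API.
    """
    candidates = list(urls)
    deadline = clock() + max_wait
    while True:
        for url in candidates:
            if probe(url):
                return url  # first URL that responds wins
        # Give up if another sleep would take us past the deadline.
        if clock() + sleep_time > deadline:
            return None
        sleep(sleep_time)
```

The point of bounding by wall-clock time rather than by attempt count is that the VM may simply have booted before DHCP was ready, so what matters is how long we are willing to wait, not how many requests we send.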
This bug was originally filed in Launchpad as LP: #1979049
Launchpad details
Launchpad user David Caro(dcaro) wrote on 2022-06-17T09:59:38.781185+00:00
We are having some instability in our OpenStack Wallaby systems right now, and have found that even when passing the "retries" option to the cloud-init OpenStack datasource, the code can still fail on the first attempt to detect a live metadata service, because it does not retry at that point (only when fetching the data itself).
Some logs of the error:
We think it should also retry on this first pass (we are also looking into the sources of the instability, but this would help make it more resilient).
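As a rough illustration of what retrying on the first pass could look like, here is a minimal sketch that applies an attempt count to the detection step as well. The names `detect_with_retries`, `probe`, and `delay` are hypothetical and do not come from cloud-init; `probe` is assumed to return True when the metadata service answers and to raise `OSError` on network failures.

```python
import time
from typing import Callable


def detect_with_retries(
    probe: Callable[[str], bool],
    url: str,
    retries: int = 5,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run the detection probe up to retries + 1 times before giving up.

    Hypothetical helper, not part of cloud-init.
    """
    attempts = retries + 1
    for attempt in range(attempts):
        try:
            if probe(url):
                return True
        except OSError:
            # Treat network-level errors like a failed probe and retry.
            pass
        if attempt < attempts - 1:
            sleep(delay)  # no sleep after the final attempt
    return False
```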