canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

Cloud-init should not setup ephemeral ipv4 if apply_network_config is False for OpenStack #3358

Closed ubuntu-server-builder closed 1 year ago

ubuntu-server-builder commented 1 year ago

This bug was originally filed in Launchpad as LP: #1821102

Launchpad details
affected_projects = []
assignee = None
assignee_name = None
date_closed = 2019-07-17T17:12:01.510967+00:00
date_created = 2019-03-20T23:02:10.853054+00:00
date_fix_committed = 2019-07-16T22:40:19.826047+00:00
date_fix_released = 2019-07-17T17:12:01.510967+00:00
id = 1821102
importance = high
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1821102
milestone = None
owner = andybotting
owner_name = Andy Botting
private = False
status = fix_released
submitter = andybotting
submitter_name = Andy Botting
tags = []
duplicates = []

Launchpad user Andy Botting(andybotting) wrote on 2019-03-20T23:02:10.853054+00:00

As fixed in bug #1749717, cloud-init will attempt to configure an ephemeral ipv4 address on the first interface to fetch OpenStack (and probably others) networking config via a metadata URL.

There's a couple of issues with this implementation that affect our OpenStack cloud.

Access to our metadata server on 169.254.169.254 is delivered by an additional route delivered by DHCP, which is not configured via cloud-init's dhcp.py (that is probably another bug).

Also, we needed to bump up the timeouts for accessing our metadata, as we're a largeish cloud and the defaults were way too low. We actually copied the timeout/retry values from the Ec2 Datasource.

So the result is that users are left waiting for the cloud-init-local stage to time out (2 mins in our config), because the additional route to the metadata server isn't configured.

I believe a simple fix for this situation would be to skip the ephemeral ipv4 setup if the datasource config has apply_network_config: False

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2019-03-21T14:11:10.507339+00:00

Thank you for filing a bug.

Would you be able to provide the output from 'cloud-init collect-logs' and attach the tarball?

ubuntu-server-builder commented 1 year ago

Launchpad user Andy Botting(andybotting) wrote on 2019-03-21T20:31:26.900300+00:00

Logs attached.

Thanks

Launchpad attachments: cloud-init.tar.gz

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2019-04-04T14:57:01.569952+00:00

Thanks for the logs.

The dhcp response currently results in the following setup:

Received dhcp lease on eth0 for 130.56.248.107/255.255.240.0
Attempting setup of ephemeral network on eth0 with 130.56.248.107/20 brd 130.56.255.255
Running command ['ip', '-family', 'inet', 'addr', 'add', '130.56.248.107/20', 'broadcast', '130.56.255.255', 'dev', 'eth0']
Running command ['ip', '-family', 'inet', 'link', 'set', 'dev', 'eth0', 'up']
Running command ['ip', 'route', 'show', '0.0.0.0/0']
Running command ['ip', '-4', 'route', 'add', '130.56.240.1', 'dev', 'eth0', 'src', '130.56.248.107']
Running command ['ip', '-4', 'route', 'add', 'default', 'via', '130.56.240.1', 'dev', 'eth0']

Note, cloud-init is running dhclient; what additional route is not being applied?

If the additional route is not provided, why would the second datasource crawl in init-net stage succeed?

Wouldn't increasing the timeouts for the initial crawl (or fixing the missing static route) in init-local suffice?

ubuntu-server-builder commented 1 year ago

Launchpad user Andy Botting(andybotting) wrote on 2019-04-04T20:21:05+00:00

It never works in the first stage because the static route to 169.254.169.254 isn't set up. This is delivered by dhcp, which isn't specifically handled by cloud-init. I verified this by looking at the output of dhclient.

It works later because the interface is brought up properly by the system between the first and second stages. We use systemd networkd here with a configuration to simply setup ipv4 by dhcp on all eth* interfaces, which does correctly apply the route.

I can provide more debugging around the interfaces/routes if you like.

** Changed in: cloud-init Status: New => Incomplete

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2019-04-04T20:47:31+00:00

On Thu, Apr 4, 2019 at 3:36 PM Andy Botting andy@andybotting.com wrote:

> It never works in the first stage because the static route to 169.254.169.254 isn't set up. This is delivered by dhcp, which isn't specifically handled by cloud-init. I verified this by looking at the output of dhclient.

Can you provide the full DHCP response/lease file? We do run dhclient with -sf /bin/true to avoid modifying the root filesystem before we've parsed the network configuration.

However, if there's an optional route present in the response, then maybe cloud-init should extract that and apply the route (which is what /sbin/dhclient-script is doing).

> It works later because the interface is brought up properly by the system between the first and second stages. We use systemd networkd here with a configuration to simply setup ipv4 by dhcp on all eth* interfaces, which does correctly apply the route.

Is this baked in networkd configuration a workaround or do you have a custom image?

> I can provide more debugging around the interfaces/routes if you like.

Thanks; additional info on the contents of the DHCP server response, and ip route show output would be helpful here.

ubuntu-server-builder commented 1 year ago

Launchpad user Andy Botting(andybotting) wrote on 2019-04-04T21:50:40.803036+00:00

Hi Ryan,

> Can you provide the full DHCP response/lease file? We do run dhclient with -sf /bin/true to avoid modifying the root filesystem before we've parsed the network configuration.

Absolutely, here it is.

From systemd:

cat /run/systemd/netif/leases/2

# This is private data. Do not parse.
ADDRESS=130.56.248.145
NETMASK=255.255.240.0
ROUTER=130.56.240.1
SERVER_ADDRESS=130.56.248.255
NEXT_SERVER=130.56.248.255
BROADCAST=130.56.255.255
MTU=9000
T1=907200
T2=1587600
LIFETIME=1814400
DNS=150.203.1.10 8.8.8.8
DOMAINNAME=openstacklocal
HOSTNAME=host-130-56-248-145
ROUTES=169.254.169.254/32,130.56.248.255 0.0.0.0/0,130.56.240.1
CLIENTID=ffb55e67ff00020000ab11982e99e5ba937fb2

Simulating cloud-init with ./dhclient -1 -v -lf dhcp.leases -pf dhclient.pid eth0 -sf /bin/true:

lease {
  interface "eth0";
  fixed-address 130.56.248.145;
  option subnet-mask 255.255.240.0;
  option routers 130.56.240.1;
  option dhcp-lease-time 1814400;
  option dhcp-message-type 5;
  option domain-name-servers 150.203.1.10,8.8.8.8;
  option dhcp-server-identifier 130.56.248.255;
  option interface-mtu 9000;
  option dhcp-renewal-time 907200;
  option rfc3442-classless-static-routes 32,169,254,169,254,130,56,248,255,0,130,56,240,1;
  option broadcast-address 130.56.255.255;
  option dhcp-rebinding-time 1587600;
  option host-name "host-130-56-248-145";
  option domain-name "openstacklocal";
  renew 6 2019/04/13 01:05:11;
  rebind 2 2019/04/23 06:38:57;
  expire 4 2019/04/25 21:38:57;
}

> However, if there's an optional route present in the response, then maybe cloud-init should extract that and apply the route (which is what /sbin/dhclient-script is doing).

Yeah, I thought about that too. For us, we don't need that functionality, but I guess others might. I did look at potentially doing that, but I hadn't worked out the format of the rfc3442-classless-static-routes yet.
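For anyone else puzzling over the rfc3442-classless-static-routes value: RFC 3442 encodes each route as one prefix-length octet, then only the significant octets of the destination (none for a /0, four for a /32), then four octets of router address. The following is a minimal Python sketch of that decoding, assuming the comma-separated form dhclient writes to its lease file. It is not cloud-init's implementation, and the function name parse_classless_static_routes is hypothetical.

```python
from typing import List, Tuple


def parse_classless_static_routes(option: str) -> List[Tuple[str, str]]:
    """Decode a dhclient-style rfc3442-classless-static-routes value.

    Returns (destination CIDR, router) pairs in the order they appear.
    """
    octets = [int(tok) for tok in option.strip().rstrip(";").split(",") if tok.strip()]
    routes = []
    i = 0
    while i < len(octets):
        width = octets[i]
        if not 0 <= width <= 32:
            raise ValueError("invalid prefix length: %d" % width)
        # Only the significant destination octets are transmitted.
        significant = (width + 7) // 8
        chunk = octets[i + 1 : i + 1 + significant + 4]
        if len(chunk) < significant + 4:
            raise ValueError("truncated route entry at offset %d" % i)
        # Pad the destination back out to four octets.
        destination = chunk[:significant] + [0] * (4 - significant)
        router = chunk[significant:]
        routes.append(
            (
                ".".join(map(str, destination)) + "/%d" % width,
                ".".join(map(str, router)),
            )
        )
        i += 1 + significant + 4
    return routes
```

Applied to the value in the lease above, this yields 169.254.169.254/32 via 130.56.248.255 and 0.0.0.0/0 via 130.56.240.1, which agrees with the ROUTES= line in the systemd-networkd lease.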

> Is this baked in networkd configuration a workaround or do you have a custom image?

We do build our own images, with optimisations for running on our cloud. We try and make it as light-touch as possible, but in terms of networking, we have been setting the newer distros to use systemd-networkd instead. It allows us to do dhcp on all eth* interfaces, so if users attach multiple networks, they get configured right away by the OS.

ubuntu-server-builder commented 1 year ago

Launchpad user Launchpad Janitor(janitor) wrote on 2019-06-04T04:17:31.629674+00:00

[Expired for cloud-init because there has been no activity for 60 days.]

ubuntu-server-builder commented 1 year ago

Launchpad user Andy Botting(andybotting) wrote on 2019-06-04T11:19:17.763833+00:00

I've reopened this, and marked it as confirmed. Please let me know if there's any more information I can provide.

I'm happy to look into fixing this if I know which direction we should take: should we implement the routes, or allow a way of ignoring metadata in the first stage?

Thanks

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2019-06-04T14:14:46.671497+00:00

Thanks for re-opening; after your response to my request for information, you can move the bug state back to New. Sorry for the trouble and thanks for following up.

I do think that our EphemeralDHCP handler will need to detect and handle the static routes options; this should ensure that we can crawl metadata service early and get the full config.

Next, your request to run dhcp on all interfaces can be done via network-config from your metadata service, and cloud-init can read that openstack network-config and render that config. In Ubuntu Bionic and newer, cloud-init will render netplan config which then handles configuring systemd-networkd, and on Xenial, cloud-init renders /etc/network/interfaces.

What does your network-config from metadata service look like (standard openstack network_data.json) ?

ubuntu-server-builder commented 1 year ago

Launchpad user Andy Botting(andybotting) wrote on 2019-06-04T23:42:05.424833+00:00

Here's an example of network_data.json from our flat networking setup. This provides a public routable IP address to the instance.

Launchpad attachments: flat network

ubuntu-server-builder commented 1 year ago

Launchpad user Andy Botting(andybotting) wrote on 2019-06-04T23:43:32.877329+00:00

This is an example of network_data.json for a private network.

Launchpad attachments: private network

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2019-06-05T00:09:29.977851+00:00

Thanks.

The "flat" network looks to be just "dhcp" on the interface with the specified MAC. Note that since this specifies DHCP, it's not clear whether the DHCP response would include DNS settings that match what's also defined in the json.

And the "private" is also straightforward.

How does either config interact with your "systemd-networkd dhcp on everything" changes?

We do have some in-progress work on handling hotplug interfaces in OpenStack[1] and updating the system config via updated network_data.json. However, in the DHCP on everything case, this does present multiple default routes which by default can take out networking on the instance without some extra work on setting up routing policies. Is that something you've baked into the image with networkd?

I'm interested in seeing those changes if you've got that working as we'd like to have cloud-init emit configs that work with dhcp on multiple interfaces.

And lastly, if we fix the ephemeral dhcp to add the static route to the metadata server, then I don't think you'll need to use the apply_network_config ds_config, IIUC. Could you confirm?

ubuntu-server-builder commented 1 year ago

Launchpad user Andy Botting(andybotting) wrote on 2019-06-05T01:38:42.250197+00:00

Hi Ryan.

So in both cases, DHCP delivers all the information the instances need, so we technically don't require any of that information from network_data.json. Then systemd-networkd just works.

I realise I forgot to mention that in our cloud-init config, to make this work we do set:

network:
  config: disabled

You are correct that two interfaces attached to an instance can break the routing. The usual case for us is that some instances will be attached to a 'data' network which has no default route, so this is fine.

In cases where a user would attach two interfaces from either the flat or a NAT'ted private network, then yes the routing doesn't work. We call this 'Advanced Networking' on our cloud, and it's mostly users who understand what they're doing so it's not been a problem.

I have considered possibly running eth0 with a lower metric than the other interfaces so they can at least get in via that interface if they get in trouble, but it's not been a problem for us.

If cloud-init did handle adding our static metadata route in the ephemeral dhcp, that would certainly fix our issue.

Thanks!

ubuntu-server-builder commented 1 year ago

Launchpad user Ben Raymond(benray12) wrote on 2019-06-06T18:36:30.129283+00:00

Andy, thanks for filing this. I believe I am running into it as well.

Have you been able to identify any workarounds at boot time?

thanks! Ben

ubuntu-server-builder commented 1 year ago

Launchpad user Andy Botting(andybotting) wrote on 2019-06-06T21:51:34.201835+00:00

Hi Ben,

Unfortunately not. We have been just waiting out the timeout.

Ryan,

After reading the docs again, I see this:

Local stage none: network configuration can be disabled entirely with config like the following in /etc/cloud/cloud.cfg: ‘network: {config: disabled}’

Do you think the correct behavior in this stage would be to disable the ephemeral IPv4?

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2019-06-06T22:10:32.820189+00:00

Andy,

I discussed this particular bug with the team on Wednesday. The 'network: {config: disabled}' is designed to tell cloud-init to not create network-configuration; which is typically fallback (dhcp on an interface) or read the metadata from a platform for a richer config. It is not, "don't read cloud metadata over the network".

The EphemeralDHCP setup allows us to fetch more than just network-config, in fact it reads all of the Datasource metadata, including instance-id and one critical part related to setting hostname; which needs to be set (in some cases) prior to bringing networking up (even if cloud-init isn't generating the config).

The 'apply_network_config: false' config was also meant to only disable the rendering of the network-config to remain backwards compatible with how cloud-init (On OpenStack) behaved in Xenial.

So while either of these configs seems to imply that cloud-init could skip running the EphemeralDHCP setup during the local stage, each actually means not to render the configuration, not to avoid reading metadata altogether.

Our plan is to have ephemeralDHCP apply static routes in the response correctly; this will prevent the timeout (IIUC), read all of the metadata from the service and then your local changes (network: config: disabled and apply_network_config: False) will ensure that cloud-init won't generate network config as requested.

Parsing the static routes isn't a huge lift so I hope we'll have a branch up quickly.

ubuntu-server-builder commented 1 year ago

Launchpad user Andy Botting(andybotting) wrote on 2019-06-06T22:51:46.482092+00:00

Thanks Ryan, great explanation. I really appreciate you looking into this for me.

Let me know when you have something up and I can give it a test in our environment.

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2019-06-07T16:07:53.728267+00:00

Hi Andy,

I've put up a branch to handle classless static routes in DHCP responses. I've also published a test package here:

https://launchpad.net/~raharper/+archive/ubuntu/cloud-init-dev/+packages

I have it for bionic, but I can add Xenial or other Ubuntu releases if you can give that a test to see if it works to resolve the timeout.

If you could capture:

% ip a
% ip addr show
% ip route show

I'd like to confirm I'm adding the static routes correctly.

Typically, I'd take a running instance and install the newer cloud-init, then:

cloud-init clean --logs --reboot

which wipes instance data to make the filesystem look like it's booting as a new instance.

Then once it boots, cloud-init collect-logs and attach the tarball.

ubuntu-server-builder commented 1 year ago

Launchpad user Andy Botting(andybotting) wrote on 2019-06-13T02:07:07.736279+00:00

Hi Ryan,

Apologies for just getting around to this now - completely forgot! Testing looks great - with no more timeout.

Pre-fix

Cloud-init v. 18.5-45-g3554ffe8-0ubuntu1~18.04.1 running 'init-local' at Thu, 13 Jun 2019 01:47:13 +0000. Up 3.00 seconds.
2019-06-13 01:48:44,083 - util.py[WARNING]: No active metadata service found
Cloud-init v. 18.5-45-g3554ffe8-0ubuntu1~18.04.1 running 'init' at Thu, 13 Jun 2019 01:48:46 +0000. Up 95.42 seconds.

Post-fix

Cloud-init v. 19.1-9-gd8ea5dca-1~bddeb~18.04.1 running 'init-local' at Thu, 13 Jun 2019 01:55:24 +0000. Up 2.27 seconds.
Cloud-init v. 19.1-9-gd8ea5dca-1~bddeb~18.04.1 running 'init' at Thu, 13 Jun 2019 01:55:29 +0000. Up 7.74 seconds.
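To put a number on the improvement, the gap between the 'init-local' and 'init' banners can be read straight off the "Up N seconds" values. A small Python sketch that does this from the banner text, assuming the banner format shown in these two excerpts; the helper name stage_uptimes is hypothetical:

```python
import re
from typing import Dict

# Matches the stage name and the uptime in a cloud-init stage banner.
BANNER_RE = re.compile(
    r"running '(?P<stage>[\w-]+)' at .*?Up (?P<up>\d+(?:\.\d+)?) seconds"
)


def stage_uptimes(log_text: str) -> Dict[str, float]:
    """Map each stage name to the uptime (in seconds) printed in its banner."""
    return {m["stage"]: float(m["up"]) for m in BANNER_RE.finditer(log_text)}


def local_to_init_gap(log_text: str) -> float:
    """Seconds elapsed between the init-local banner and the init banner."""
    ups = stage_uptimes(log_text)
    return round(ups["init"] - ups["init-local"], 2)
```

With the banners above, the gap is 95.42 - 3.00 = 92.42 seconds before the fix and 7.74 - 2.27 = 5.47 seconds after it.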

The relevant log file suggests it works!

2019-06-13 01:55:24,096 - dhcp.py[DEBUG]: Performing a dhcp discovery on eth0
2019-06-13 01:55:24,096 - util.py[DEBUG]: Copying /sbin/dhclient to /var/tmp/cloud-init/cloud-init-dhcp-ff8b_366/dhclient
2019-06-13 01:55:24,099 - util.py[DEBUG]: Running command ['ip', 'link', 'set', 'dev', 'eth0', 'up'] with allowed return codes [0] (shell=False, capture=True)
2019-06-13 01:55:24,107 - util.py[DEBUG]: Running command ['/var/tmp/cloud-init/cloud-init-dhcp-ff8b_366/dhclient', '-1', '-v', '-lf', '/var/tmp/cloud-init/cloud-init-dhcp-ff8b_366/dhcp.leases', '-pf', '/var/tmp/cloud-init/cloud-init-dhcp-ff8b_366/dhclient.pid', 'eth0', '-sf', '/bin/true'] with allowed return codes [0] (shell=False, capture=True)
2019-06-13 01:55:24,180 - util.py[DEBUG]: All files appeared after 0 seconds: ['/var/tmp/cloud-init/cloud-init-dhcp-ff8b_366/dhclient.pid', '/var/tmp/cloud-init/cloud-init-dhcp-ff8b_366/dhcp.leases']
2019-06-13 01:55:24,180 - util.py[DEBUG]: Reading from /var/tmp/cloud-init/cloud-init-dhcp-ff8b_366/dhclient.pid (quiet=False)
2019-06-13 01:55:24,180 - util.py[DEBUG]: Read 4 bytes from /var/tmp/cloud-init/cloud-init-dhcp-ff8b_366/dhclient.pid
2019-06-13 01:55:24,180 - util.py[DEBUG]: Reading from /proc/448/stat (quiet=True)
2019-06-13 01:55:24,180 - util.py[DEBUG]: Read 297 bytes from /proc/448/stat
2019-06-13 01:55:24,180 - dhcp.py[DEBUG]: killing dhclient with pid=448
2019-06-13 01:55:24,181 - util.py[DEBUG]: Reading from /var/tmp/cloud-init/cloud-init-dhcp-ff8b_366/dhcp.leases (quiet=False)
2019-06-13 01:55:24,181 - util.py[DEBUG]: Read 704 bytes from /var/tmp/cloud-init/cloud-init-dhcp-ff8b_366/dhcp.leases
2019-06-13 01:55:24,182 - dhcp.py[DEBUG]: Received dhcp lease on eth0 for 130.56.249.206/255.255.240.0
2019-06-13 01:55:24,182 - init.py[DEBUG]: Attempting setup of ephemeral network on eth0 with 130.56.249.206/20 brd 130.56.255.255
2019-06-13 01:55:24,182 - util.py[DEBUG]: Running command ['ip', '-family', 'inet', 'addr', 'add', '130.56.249.206/20', 'broadcast', '130.56.255.255', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2019-06-13 01:55:24,185 - util.py[DEBUG]: Running command ['ip', '-family', 'inet', 'link', 'set', 'dev', 'eth0', 'up'] with allowed return codes [0] (shell=False, capture=True)
2019-06-13 01:55:24,187 - util.py[DEBUG]: Running command ['ip', '-4', 'route', 'add', '169.254.169.254', 'via', '130.56.248.255', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2019-06-13 01:55:24,190 - util.py[DEBUG]: Running command ['ip', '-4', 'route', 'add', '0.0.0.0/0', 'via', '130.56.240.1', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2019-06-13 01:55:24,193 - util.py[DEBUG]: Resolving URL: http://169.254.169.254 took 0.001 seconds
2019-06-13 01:55:24,193 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'url': 'http://169.254.169.254/openstack', 'allow_redirects': True, 'method': 'GET', 'timeout': 30.0, 'headers': {'User-Agent': 'Cloud-Init/19.1-9-gd8ea5dca-1~bddeb~18.04.1'}} configuration
2019-06-13 01:55:24,563 - url_helper.py[DEBUG]: Read from http://169.254.169.254/openstack (200, 83b) after 1 attempts
2019-06-13 01:55:24,563 - DataSourceOpenStack.py[DEBUG]: Using metadata source: 'http://169.254.169.254'
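The two 'ip -4 route add' commands in this log are the DHCP-supplied static routes being applied. As a rough illustration of how such commands could be built from (destination, router) pairs, here is a Python sketch. It only mirrors the argument lists visible in the log and is not the code from the branch; route_add_commands is a hypothetical name. Dropping the via clause for a router of 0.0.0.0 follows RFC 3442, which treats that value as "destination is on the local subnet".

```python
from typing import Iterable, List, Tuple


def route_add_commands(
    routes: Iterable[Tuple[str, str]], interface: str
) -> List[List[str]]:
    """Build 'ip -4 route add' argument lists for DHCP static routes."""
    commands = []
    for destination, router in routes:
        # The log shows host routes written without the /32 suffix.
        if destination.endswith("/32"):
            destination = destination[: -len("/32")]
        command = ["ip", "-4", "route", "add", destination]
        # RFC 3442: a router of 0.0.0.0 means the destination is on-link.
        if router != "0.0.0.0":
            command += ["via", router]
        command += ["dev", interface]
        commands.append(command)
    return commands
```

The lists are in the form subprocess.run accepts without a shell, which matches the shell=False shown in the log entries.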

I don't think the IP address output you asked for will help, as networkd will have already set up the networking correctly, but I can still provide them, or the full logs if you'd like.

Thanks!

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2019-06-13T14:32:56.984206+00:00

\o/

I would like to see the ip output for additional confirmation if you don't mind.

And I think we still need to bump the timeouts; you suggested that the default url timeouts from DatasourceEC2 are more realistic, correct?

I may file a separate bug for that since this bug covered the initial networking setup. Do you have logs for those timeouts that we could attach to the new bug?

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2019-06-13T20:32:10.528353+00:00

I've updated the branch with some unittests and fixes (for other routes besides a /32).

Same ppa:raharper/cloud-init-dev

cloud-init_19.1-12-gb5a47081-1~bddeb~18.04.1

Thanks for filing bug and testing!

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2019-06-24T14:49:00.241687+00:00

@Andy any chance you could give the updated cloud-init package a test?

ubuntu-server-builder commented 1 year ago

Launchpad user Andy Botting(andybotting) wrote on 2019-06-24T22:10:53.733327+00:00

Thanks for the reminder! Oops.

I've just tested your new build cloud-init_19.1-12-gb5a47081-1~bddeb~18.04.1 which seemed to work perfectly.

Also, here's the networking details you asked for:

ubuntu@test-cloudinit:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:6f:f3:84 brd ff:ff:ff:ff:ff:ff
    inet 130.56.249.206/20 brd 130.56.255.255 scope global dynamic eth0
       valid_lft 1814279sec preferred_lft 1814279sec
    inet6 fe80::f816:3eff:fe6f:f384/64 scope link
       valid_lft forever preferred_lft forever

ubuntu@test-cloudinit:~$ ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:6f:f3:84 brd ff:ff:ff:ff:ff:ff
    inet 130.56.249.206/20 brd 130.56.255.255 scope global dynamic eth0
       valid_lft 1814274sec preferred_lft 1814274sec
    inet6 fe80::f816:3eff:fe6f:f384/64 scope link
       valid_lft forever preferred_lft forever

ubuntu@test-cloudinit:~$ ip route show
default via 130.56.240.1 dev eth0 proto dhcp metric 1024
130.56.240.0/20 dev eth0 proto kernel scope link src 130.56.249.206
169.254.169.254 via 130.56.248.255 dev eth0 proto dhcp metric 1024

Attaching debug logs too.

Thanks!
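The route table in this comment shows why the host route matters: the kernel picks the most specific matching route, so with the 169.254.169.254 entry present, metadata traffic goes to 130.56.248.255 rather than to the default gateway. A minimal Python sketch of that longest-prefix lookup over 'ip route show' text follows. It assumes the simple one-route-per-line IPv4 output shown here and ignores metrics; select_route is a hypothetical helper, not part of cloud-init or iproute2.

```python
import ipaddress
from typing import Optional, Tuple


def select_route(route_table: str, address: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (matched network, next hop) for address, or None if no route matches.

    The next hop is None for on-link routes that have no 'via' field.
    """
    target = ipaddress.ip_address(address)
    best = None
    for line in route_table.splitlines():
        fields = line.split()
        if not fields:
            continue
        destination = "0.0.0.0/0" if fields[0] == "default" else fields[0]
        try:
            network = ipaddress.ip_network(destination, strict=False)
        except ValueError:
            continue  # skip lines that do not start with a destination
        if target not in network:
            continue
        via = fields[fields.index("via") + 1] if "via" in fields[:-1] else None
        # Keep the most specific (longest prefix) match.
        if best is None or network.prefixlen > best[0].prefixlen:
            best = (network, via)
    if best is None:
        return None
    return (str(best[0]), best[1])
```

Run against the table above, the lookup for 169.254.169.254 selects the /32 via 130.56.248.255; with that line removed it falls back to the default route via 130.56.240.1, which is the pre-fix situation described earlier in this bug.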

ubuntu-server-builder commented 1 year ago

Launchpad user Andy Botting(andybotting) wrote on 2019-06-24T22:11:21.324842+00:00

Launchpad attachments: collect-logs output

ubuntu-server-builder commented 1 year ago

Launchpad user Andy Botting(andybotting) wrote on 2019-06-24T23:07:04.580037+00:00

> And I think we still need to bump the timeouts; you suggested that the default url timeouts from DatasourceEC2 are more realistic, correct?

So in our images we currently have these values set:

datasource_list:

datasource:
  OpenStack:
    max_wait: 90
    timeout: 30
    retries: 3

This initially came about because some users were reporting that their SSH host keys were changing after a reboot.

What happened was their instance would initially boot and pick up the OpenStack data source (with default cloud-init config) and get provisioned OK. Some time later they'd reboot and the metadata server wouldn't respond as quickly and so cloud-init would fall back to the EC2 data source.

This would result in their 'instance id' switching from an OpenStack UUID to an EC2 i-xxxxxxx format one and cloud-init would think it's a different instance and reprovision.

The timeouts aren't normally a problem, but they can stretch out when we're having message queue issues. Our metrics show over the last 30 days our max was 13 secs, so we should probably revisit the values we have set and drop them.

Launchpad attachments: Screenshot from 2019-06-25 08-24-44.png

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2019-06-25T14:51:10.889914+00:00

Thanks for verifying!

For the timeout, I'll start a new bug and we can discuss changes there.

W.r.t the datasource change; that really shouldn't happen; at least on images using cloud-init's ds-identify.

However, since you're hard-coding the datasource_list, this is going to disable the detection.

OpenStack identifies itself via platform metadata (DMI tables), so cloud-init will detect this value and set the datasource to OpenStack (or config-drive via attached devices' filesystem labels).

EC2 also identifies itself, so you'd never see an OpenStack cloud instance be confused for EC2, resulting in different boots.

It may be worth revisiting your images to see if you can rely on cloud-init's ds-identify (called through the systemd-generator we provide).
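As a rough illustration of the DMI-based identification described above, the product name exposed under /sys/class/dmi/id can be inspected directly on a guest. The Python sketch below is not ds-identify (which is a shell script and checks more signals than this). The product-name strings are an assumption based on what Nova guests commonly report, and both function names are hypothetical.

```python
from pathlib import Path

# Assumption: values commonly reported by Nova guests; ds-identify's own
# checks are broader and may differ between releases.
OPENSTACK_PRODUCT_NAMES = {"OpenStack Nova", "OpenStack Compute"}


def read_dmi_product_name(path: str = "/sys/class/dmi/id/product_name") -> str:
    """Return the DMI product name, or an empty string if it is unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return ""


def looks_like_openstack(product_name: str) -> bool:
    """True if the DMI product name matches a known OpenStack value."""
    return product_name.strip() in OPENSTACK_PRODUCT_NAMES
```

On a guest, looks_like_openstack(read_dmi_product_name()) gives a quick hint as to whether automatic detection has something to work with before the hard-coded datasource_list is removed from an image.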

ubuntu-server-builder commented 1 year ago

Launchpad user Andy Botting(andybotting) wrote on 2019-07-11T01:13:21.920918+00:00

Hi Ryan,

Just built a Debian 10 image and am seeing this issue again. Just checking in to see if there's a release yet with the fix?

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2019-07-11T15:35:49+00:00

The branch has not yet landed. Once it does then the next cloud-init SRU will release this back through Xenial. We'll also cut a 19.2 cloud-init release which would include the fix.

ubuntu-server-builder commented 1 year ago

Launchpad user Andy Botting(andybotting) wrote on 2019-07-12T00:08:50+00:00

> The branch has not yet landed. Once it does then the next cloud-init SRU will release this back through Xenial. We'll also cut a 19.2 cloud-init release which would include the fix.

Thanks Ryan.

ubuntu-server-builder commented 1 year ago

Launchpad user Server Team CI bot(server-team-bot) wrote on 2019-07-16T22:40:18.242784+00:00

This bug is fixed with commit 07b17236 to cloud-init on branch master. To view that commit see the following URL: https://git.launchpad.net/cloud-init/commit/?id=07b17236

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2019-07-17T17:12:03.398924+00:00

This bug is believed to be fixed in cloud-init in version 19.2. If this is still a problem for you, please make a comment and set the state back to New

Thank you.