Launchpad user Robert Schweikert(rjschwei) wrote on 2020-01-17T18:52:25.821757+00:00
Launchpad user Ryan Harper(raharper) wrote on 2020-01-17T19:06:38.016273+00:00
Can you capture cloud-init collect-logs? In Oracle, I suspect this is related to iscsi root, where the initramfs already has networking up. In Ubuntu we collect the existing configuration from the initramfs as a network-config source and we don't bring up Ephemeral DHCP to crawl IMDS. I wonder if that's missing on the SUSE path (knowing whether networking is already up due to initramfs/iscsiroot)?
Launchpad user Robert Schweikert(rjschwei) wrote on 2020-01-17T19:20:48.022010+00:00
Yes, this is in OCI.
I am not in a position to run cloud-init collect-logs as I am not able to get into a system with cloud-init 19.4 just yet.
Launchpad user Robert Schweikert(rjschwei) wrote on 2020-01-17T20:15:55.087352+00:00
Whatever the code in net/cmdline does is certainly very distribution specific as we all decided collectively/separately to do things differently. It is not really a surprise that the detection that there is already a network from booting off iscsi is not working.
Thinking there should be a distribution independent way to figure out if we already have a network connection or not.
Launchpad user Ryan Harper(raharper) wrote on 2020-01-17T20:39:38.708949+00:00
Unfortunately each distro tends to have its own initramfs networking config format. As such, cloudinit/net/cmdline.py implements klibc parsing (which Ubuntu/Debian support), but dracut does something different, and I'm not sure what SUSE does here. Adding a parser for the initramfs format in use would handle this.
https://github.com/canonical/cloud-init/blob/master/cloudinit/net/cmdline.py#L42
> Thinking there should be a distribution independent way to figure out if we already have a network connection or not.
There is, but we need more than "is networking up"; rather we need to translate the existing configuration and merge that with whatever else may come from IMDS; in Oracle the iscsiroot has a permanent dhcp config on a specific interface, however IMDS can provide network config for additional interfaces, so we must merge them. The OCI datasource already does this but distros need to provide an initramfs network config parser to extract the network config generated in the initramfs to something cloud-init can understand.
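A minimal sketch of the kind of merge described here, assuming both sources have already been translated into lists of per-interface dictionaries. The function name `merge_net_configs` and the dictionary layout are hypothetical and are not cloud-init's actual data model. In this sketch the initramfs entry wins for an interface it already configured, and IMDS contributes the remaining interfaces.

```python
def merge_net_configs(initramfs_cfg, imds_cfg):
    """Merge two lists of interface configs, keyed by MAC address.

    Hypothetical layout: every entry is a dict with a 'mac_address' key.
    Initramfs entries win on conflict, on the assumption that the iSCSI
    root depends on that interface staying as it was configured.
    """
    merged = {}
    for entry in imds_cfg:
        merged[entry["mac_address"].lower()] = dict(entry)
    for entry in initramfs_cfg:
        # Overwrite any IMDS entry for the same MAC address.
        merged[entry["mac_address"].lower()] = dict(entry)
    return sorted(merged.values(), key=lambda e: e["mac_address"].lower())


# Illustrative data only; the names and addresses are made up.
initramfs = [
    {"name": "ens3", "mac_address": "02:00:17:00:00:01", "type": "dhcp"},
]
imds = [
    {"name": "ens3", "mac_address": "02:00:17:00:00:01", "type": "static"},
    {"name": "ens5", "mac_address": "02:00:17:00:00:02", "type": "static"},
]
result = merge_net_configs(initramfs, imds)
```

Keying on the MAC address rather than the interface name is a design choice of this sketch, since names can differ between the initramfs and the booted system.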
Launchpad user Robert Schweikert(rjschwei) wrote on 2020-01-17T21:03:11.523115+00:00
Sorry for being dense, but by the time we get to the point where we decide whether or not to bring up an ephemeral network we have long left the initrd, and since we are booting over iscsi the network is up and configured. Any configuration information we might need can be extracted from the network via "ip" commands. Those are distro independent, thus a generic "translator" "live_config_to_net_cfg" would work everywhere. What am I missing?
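A rough sketch of what such a translator could look like, assuming the input is the output of `ip -json addr show` from iproute2. The function name `live_config_to_net_cfg` comes from the comment above; the output layout is hypothetical, and the JSON field names (`ifname`, `address`, `addr_info`, `local`, `prefixlen`, `scope`) are what iproute2 is assumed to emit. It only captures what the kernel reports: interface names, MAC addresses, and addresses.

```python
import json


def live_config_to_net_cfg(ip_addr_json):
    """Translate `ip -json addr show` output into a simple config list.

    Hypothetical sketch: only global-scope addresses are kept, and
    nothing is recorded about how an address was obtained.
    """
    config = []
    for link in json.loads(ip_addr_json):
        if link.get("ifname") == "lo":
            continue  # the loopback device is never part of the config
        addresses = [
            "{}/{}".format(info["local"], info["prefixlen"])
            for info in link.get("addr_info", [])
            if info.get("scope") == "global"
        ]
        if addresses:
            config.append({
                "name": link["ifname"],
                "mac_address": link.get("address"),
                "addresses": addresses,
            })
    return config


# Hand-written sample in the assumed shape of `ip -json addr show` output.
sample = json.dumps([
    {"ifname": "lo",
     "addr_info": [{"family": "inet", "local": "127.0.0.1",
                    "prefixlen": 8, "scope": "host"}]},
    {"ifname": "ens3", "address": "02:00:17:00:00:01",
     "addr_info": [{"family": "inet", "local": "10.0.0.2",
                    "prefixlen": 24, "scope": "global", "dynamic": True}]},
])
net_cfg = live_config_to_net_cfg(sample)
```

On a live system the input would come from running `ip -json addr show`; the sample here is static so the sketch stays self-contained.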
Launchpad user Ryan Harper(raharper) wrote on 2020-01-17T21:43:48+00:00
On Fri, Jan 17, 2020 at 15:15 Robert Schweikert 1860164@bugs.launchpad.net wrote:
> Sorry for being dense, but by the time we get to the point where we decide whether or not to bring up an ephemeral network we have long left the initrd, and since we are booting over iscsi the network is up and configured. Any configuration information we might need can be extracted from the network via "ip" commands. Those are distro independent, thus a generic "translator" "live_config_to_net_cfg" would work everywhere. What am I missing?
The initrd supports more than just dhcp or static ip config and ip commands won’t tell you which was used. There may be dns or other options, so it’s best to parse the initramfs format which parses the kernel command line anyhow to bring up networking in the initramfs.
Title: cloud-init generates a traceback if a default route already exists during ephemeral network setup
Launchpad user Robert Schweikert(rjschwei) wrote on 2020-01-17T22:27:38.745785+00:00
Removing myself as assignee as I really have no idea right off the bat what we are after here and I will most likely not have the time to dig into all the gory details.
Here is the doc for what is supported w.r.t. configuration in dracut for these types of situations:
http://man7.org/linux/man-pages/man7/dracut.cmdline.7.html
and in detail the way the network would be configured:
ip={dhcp|on|any|dhcp6|auto6|either6}
    dhcp|on|any
        get ip from dhcp server from all interfaces. If root=dhcp,
        loop sequentially through all interfaces (eth0, eth1, ...)
        and use the first with a valid DHCP root-path.
    auto6
        IPv6 autoconfiguration
    dhcp6
        IPv6 DHCP
    either6
        if auto6 fails, then dhcp6

ip=<interface>:{dhcp|on|any|dhcp6|auto6}[:[<mtu>][:<macaddr>]]
    This parameter can be specified multiple times.
    dhcp|on|any|dhcp6
        get ip from dhcp server on a specific interface
    auto6
        do IPv6 autoconfiguration
    <macaddr>
        optionally set <macaddr> on the <interface>. This cannot be
        used in conjunction with the ifname argument for the same
        <interface>.

ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<client_hostname>:<interface>:{none|off|dhcp|on|any|dhcp6|auto6|ibft}[:[<mtu>][:<macaddr>]]
    explicit network configuration. If you want do define a IPv6
    address, put it in brackets (e.g. [2001:DB8::1]). This parameter
    can be specified multiple times. <peer> is optional and is the
    address of the remote endpoint for pointopoint interfaces and it
    may be followed by a slash and a decimal number, encoding the
    network prefix length.
    <macaddr>
        optionally set <macaddr> on the <interface>. This cannot be
        used in conjunction with the ifname argument for the same
        <interface>.

ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<client_hostname>:<interface>:{none|off|dhcp|on|any|dhcp6|auto6|ibft}[:[<dns1>][:<dns2>]]
    explicit network configuration. If you want do define a IPv6
    address, put it in brackets (e.g. [2001:DB8::1]). This parameter
    can be specified multiple times. <peer> is optional and is the
    address of the remote endpoint for pointopoint interfaces and it
    may be followed by a slash and a decimal number, encoding the
    network prefix length.

ifname=
    Warning
        Do not use the default kernel naming scheme for the interface
        name, as it can conflict with the kernel names. So, don’t use
        "eth[0-9]+" for the interface name. Better name it "bootnet"
        or "bluesocket".

rd.route=<net>/<netmask>:<gateway>[:<interface>]
    Add a static route with route options, which are separated by a
    colon. IPv6 addresses have to be put in brackets.
    Example.
        rd.route=192.168.200.0/24:192.168.100.222:ens10
        rd.route=192.168.200.0/24:192.168.100.222
        rd.route=192.168.200.0/24::ens10
        rd.route=[2001:DB8:3::/8]:[2001:DB8:2::1]:ens10

bootdev=<interface>
    specify network interface to use routing and netroot information
    from. Required if multiple ip= lines are used.

nameserver=<IP> [nameserver=<IP> ...]
    specify nameserver(s) to use
Then there are vlan, bond, bridge, and team kernel command line arguments one could use.
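To illustrate what a parser for the `ip=` forms quoted above might involve, here is a minimal sketch. The helper names (`split_fields`, `parse_dracut_ip`, `parse_cmdline`) and the returned dictionary layout are hypothetical and are not part of cloud-init. Trailing `<mtu>`, `<macaddr>`, and `<dns>` fields are left unparsed in `extra`, because a MAC address itself contains colons and would need extra handling.

```python
METHODS = {"none", "off", "dhcp", "on", "any",
           "dhcp6", "auto6", "either6", "ibft"}


def split_fields(value):
    """Split on ':' while keeping bracketed IPv6 addresses intact."""
    fields, current, depth = [], "", 0
    for char in value:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if char == ":" and depth == 0:
            fields.append(current)
            current = ""
        else:
            current += char
    fields.append(current)
    return fields


def parse_dracut_ip(arg):
    """Classify one ip= value into one of the three documented forms."""
    fields = split_fields(arg)
    if len(fields) == 1:
        # ip={dhcp|on|any|dhcp6|auto6|either6}: applies to all interfaces
        return {"interface": None, "method": fields[0]}
    if fields[1] in METHODS:
        # ip=<interface>:<method>[:...]
        return {"interface": fields[0], "method": fields[1],
                "extra": fields[2:]}
    if len(fields) >= 7:
        # ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<hostname>:<interface>:<method>[:...]
        keys = ("address", "peer", "gateway", "netmask",
                "hostname", "interface", "method")
        parsed = dict(zip(keys, fields[:7]))
        parsed["extra"] = fields[7:]
        return parsed
    raise ValueError("unrecognized ip= value: %r" % arg)


def parse_cmdline(cmdline):
    """Collect every ip= argument from a kernel command line string."""
    return [parse_dracut_ip(token[3:])
            for token in cmdline.split() if token.startswith("ip=")]


# Illustrative command line; the values are made up.
cmdline = ("BOOT_IMAGE=/vmlinuz ip=ens3:dhcp "
           "ip=192.168.0.10::192.168.0.1:255.255.255.0:host1:ens4:none")
parsed = parse_cmdline(cmdline)
```

A real parser would also have to cover `rd.route=`, `bootdev=`, `nameserver=`, and the vlan, bond, bridge, and team arguments, which this sketch ignores.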
Launchpad user Launchpad Janitor(janitor) wrote on 2020-03-18T04:17:27.665439+00:00
[Expired for cloud-init because there has been no activity for 60 days.]
> Thinking there should be a distribution independent way to figure out if we already have a network connection or not.
> There is, but we need more than "is networking up"; rather we need to translate the existing configuration and merge that with whatever else may come from IMDS; in Oracle the iscsiroot has a permanent dhcp config on a specific interface, however IMDS can provide network config for additional interfaces, so we must merge them. The OCI datasource already does this but distros need to provide an initramfs network config parser to extract the network config generated in the initramfs to something cloud-init can understand.
I don't think that this really makes sense. Sure, the initramfs may have some dhcp config that it got from the IMDS, but why would it be necessary to merge that into the datasource-provided configuration? The issue is just a failure in ephemeral network setup; the failing code doesn't deal with network configuration. Why would you want this?
> Any configuration information we might need can be extracted from the network via "ip" commands.
I agree with @rjschwei here, this is far more cross platform, and frankly solves the problem at hand. I'm not sure how merging IMDS networking configuration with an initramfs dhcp thing solves anything related to this issue.
@holmanb, The connectivity url was added since this issue was active. I'm pretty sure it sidesteps this issue.
> I'm pretty sure it sidesteps this issue.
@TheRealFalcon I think that you are right on the happy path, but I don't think that the url check is a robust solution to this problem. It makes assumptions which might not be true. If the datasource isn't yet available, causing the connectivity check to fail, then this same issue will persist.
There are other (hypothetical) ways in which depending on the connectivity url might be broken or cause undesirable behavior. Imagine a cloud where the image pxe boots on one network but after the initial dhcp/tftp a different network is used for IMDS (i.e. subsequent dhcp responses provide a different route with a longer prefix match to override the previous route). In this case, the pre-existing route would cause the connectivity check to fail after a 5 second timeout, but then proceed and otherwise behave correctly. This example might sound contrived, but a cloud provider should probably not want the instance to be able to access the PXE server from which it booted for multiple security-related reasons.
Additionally, I have a hard time believing that a round trip to an http server would be faster than locally checking the network configuration, so there may be a performance win with removing the connectivity url check altogether once this codepath is more robust.
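As a sketch of what "locally checking the network configuration" could mean, the function below scans the text output of `ip route show` for existing default routes. The name `default_route_devices` is hypothetical and this is not cloud-init's code; it assumes the usual iproute2 line format of `default via <gateway> dev <interface> ...`.

```python
def default_route_devices(ip_route_output):
    """Return the interfaces that already carry a default route.

    Hypothetical helper: expects the plain-text output of `ip route show`.
    """
    devices = []
    for line in ip_route_output.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "default":
            continue
        # Only accept "dev" when an interface name follows it.
        if "dev" in tokens[:-1]:
            devices.append(tokens[tokens.index("dev") + 1])
    return devices


# Hand-written sample in the assumed `ip route show` format.
sample_routes = (
    "default via 10.0.0.1 dev ens3 proto dhcp metric 100\n"
    "10.0.0.0/24 dev ens3 proto kernel scope link src 10.0.0.2\n"
)
existing = default_route_devices(sample_routes)
```

With a check like this, the ephemeral setup could skip adding a default route on an interface that already has one, without waiting on an HTTP round trip.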
This bug was originally filed in Launchpad as LP: #1860164
Launchpad details
Launchpad user Robert Schweikert(rjschwei) wrote on 2020-01-17T18:37:30.886100+00:00
If a route already exists when the ephemeral network is set up, cloud-init will generate the following traceback:
2020-01-16 21:14:22,584 - util.py[DEBUG]: Getting data from <class 'cloudinit.sources.DataSourceOracle.DataSourceOracle'> failed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cloudinit/sources/__init__.py", line 760, in find_source
    if s.update_metadata([EventType.BOOT_NEW_INSTANCE]):
  File "/usr/lib/python2.7/site-packages/cloudinit/sources/__init__.py", line 649, in update_metadata
    result = self.get_data()
  File "/usr/lib/python2.7/site-packages/cloudinit/sources/__init__.py", line 273, in get_data
    return_value = self._get_data()
  File "/usr/lib/python2.7/site-packages/cloudinit/sources/DataSourceOracle.py", line 195, in _get_data
    with dhcp.EphemeralDHCPv4(net.find_fallback_nic()):
  File "/usr/lib/python2.7/site-packages/cloudinit/net/dhcp.py", line 57, in __enter__
    return self.obtain_lease()
  File "/usr/lib/python2.7/site-packages/cloudinit/net/dhcp.py", line 109, in obtain_lease
    ephipv4.__enter__()
  File "/usr/lib/python2.7/site-packages/cloudinit/net/__init__.py", line 920, in __enter__
    self._bringup_static_routes()
  File "/usr/lib/python2.7/site-packages/cloudinit/net/__init__.py", line 974, in _bringup_static_routes
    ['dev', self.interface], capture=True)
  File "/usr/lib/python2.7/site-packages/cloudinit/util.py", line 2083, in subp
    cmd=args)
ProcessExecutionError: Unexpected error while running command.
This is a regression from 19.1 on SUSE, where existing routes were simply skipped.