canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

GCE datasource changes networking behavior #5820

Closed chen23 closed 1 month ago

chen23 commented 1 month ago

Bug report

The GCE datasource changes the network interface state seen by the datasources that follow it in the list. Ideally the system should fall back to the same networking state regardless of which datasources are listed.

This is similar to #4680: following the change in #4163, a datasource that follows GCE gets a different network environment than before. The behavior is observed on multi-NIC hosts, in particular a host that has DHCP available on the first interface and no DHCP on the second interface.

This is a problem when the same host must boot both in environments that have an available datasource and in environments that do not.

Steps to reproduce the problem

Use a VM with DHCP enabled on the primary interface and no DHCP on the secondary interface. Do not provide any cloud-init datasources.

The simplest way to reproduce is to boot two hosts with the following datasource lists:

This works as expected:

[ NoCloud, None ]

This does not work as expected:

[ GCE, NoCloud, None ]

In the first (non-GCE) example, the system has the following network state after boot:

root@ubuntu:~# cat /etc/cloud/cloud.cfg.d/90_dpkg.cfg
# to update this file, run dpkg-reconfigure cloud-init
datasource_list: [ NoCloud, None ]
root@ubuntu:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether bc:24:11:08:98:e0 brd ff:ff:ff:ff:ff:ff
    altname enp0s18
    inet 192.168.1.193/24 metric 100 brd 192.168.1.255 scope global dynamic ens18
       valid_lft 84651sec preferred_lft 84651sec
    inet6 fd53:e24d:b056:ead5:be24:11ff:fe08:98e0/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 1762sec preferred_lft 1762sec
    inet6 fe80::be24:11ff:fe08:98e0/64 scope link
       valid_lft forever preferred_lft forever
3: ens19: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether bc:24:11:6e:1a:bf brd ff:ff:ff:ff:ff:ff
    altname enp0s19

In the second (GCE) example, the system has the following network state after boot:

root@ubuntu:~# cat /etc/cloud/cloud.cfg.d/90_dpkg.cfg
# to update this file, run dpkg-reconfigure cloud-init
datasource_list: [ GCE, NoCloud, None ]
root@ubuntu:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: ens18: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether bc:24:11:08:98:e0 brd ff:ff:ff:ff:ff:ff
    altname enp0s18
3: ens19: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether bc:24:11:6e:1a:bf brd ff:ff:ff:ff:ff:ff
    altname enp0s19
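
To compare the two boots without reading `ip a` output by eye, the interface state can be summarized from the JSON form of the same command (`ip -j addr show`). The following is a minimal sketch, not part of cloud-init: the function names are hypothetical, the field names (`ifname`, `operstate`, `addr_info`, `family`, `local`, `prefixlen`) are assumed to match iproute2's JSON output, and the sample data is hand-written to mirror the `[ GCE, NoCloud, None ]` boot shown above.

```python
import json
import subprocess


def summarize_interfaces(ip_json):
    """Map interface name -> (operstate, IPv4 addresses) from `ip -j addr` data."""
    summary = {}
    for link in ip_json:
        ipv4 = [
            f"{addr['local']}/{addr['prefixlen']}"
            for addr in link.get("addr_info", [])
            if addr.get("family") == "inet"
        ]
        summary[link["ifname"]] = (link.get("operstate", "UNKNOWN"), ipv4)
    return summary


def current_interfaces():
    """Run `ip -j addr show` on the local host and summarize the result."""
    out = subprocess.run(
        ["ip", "-j", "addr", "show"], check=True, capture_output=True, text=True
    ).stdout
    return summarize_interfaces(json.loads(out))


# Hand-written sample (an assumption) mirroring the GCE boot above:
# both NICs are DOWN and neither has an IPv4 address.
sample = json.loads("""
[
  {"ifname": "lo", "operstate": "UNKNOWN",
   "addr_info": [{"family": "inet", "local": "127.0.0.1", "prefixlen": 8}]},
  {"ifname": "ens18", "operstate": "DOWN", "addr_info": []},
  {"ifname": "ens19", "operstate": "DOWN", "addr_info": []}
]
""")

if __name__ == "__main__":
    for name, (state, addrs) in summarize_interfaces(sample).items():
        print(f"{name}: state={state} ipv4={addrs or 'none'}")
```

Running `current_interfaces()` on each host and diffing the two dictionaries should show `ens18` as the only interface whose state and addresses differ between the two datasource lists.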

Environment details

cloud-init logs

There are no cloud-init logs, because the problem appears to occur during the search phase, while ds-identify is looking for a datasource:

root@ubuntu:~# cat /var/run/cloud-init/ds-identify.log
[up 2.81s] ds-identify
policy loaded: mode=search report=false found=all maybe=all notfound=disabled
/etc/cloud/cloud.cfg.d/90_dpkg.cfg set datasource_list: [ GCE, NoCloud, None ]
DMI_PRODUCT_NAME=Standard PC (i440FX + PIIX, 1996)
DMI_SYS_VENDOR=QEMU
DMI_PRODUCT_SERIAL=
DMI_PRODUCT_UUID=280bbe45-44b7-434b-8013-1f4fe7983749
PID_1_PRODUCT_NAME=unavailable
DMI_CHASSIS_ASSET_TAG=
DMI_BOARD_NAME=unavailable
FS_LABELS=UEFI,UEFI,cloudimg-rootfs,BOOT
ISO9660_DEVS=
KERNEL_CMDLINE=BOOT_IMAGE=/vmlinuz-6.8.0-44-generic root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0
VIRT=kvm
UNAME_KERNEL_NAME=Linux
UNAME_KERNEL_VERSION=#44-Ubuntu SMP PREEMPT_DYNAMIC Tue Aug 13 13:35:26 UTC 2024
UNAME_MACHINE=x86_64
DSNAME=
DSLIST=GCE NoCloud None
MODE=search
ON_FOUND=all
ON_MAYBE=all
ON_NOTFOUND=disabled
pid=341 ppid=323
is_container=false
No ds found [mode=search, notfound=disabled]. Disabled cloud-init [1]
[up 2.82s] returning 1
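
The relevant facts in this log are the datasource list that ds-identify searched and the final "No ds found" line. The sketch below pulls both out of the log text. It is illustrative only: `parse_ds_identify_log` is a hypothetical helper, and it relies solely on the line shapes visible in the log above (upper-case `KEY=value` lines and a line starting with `No ds found`), not on any documented log format.

```python
import re

# Upper-case keys as they appear in the log above, e.g. DSLIST, PID_1_PRODUCT_NAME.
_KEY = re.compile(r"^[A-Z][A-Z0-9_]*$")


def parse_ds_identify_log(text):
    """Return (settings, no_ds_found) extracted from ds-identify.log text."""
    settings = {}
    no_ds_found = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("No ds found"):
            no_ds_found = True
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if sep and _KEY.match(key):
            settings[key] = value
    return settings, no_ds_found


# Excerpt copied from the log in this report.
excerpt = """\
[up 2.81s] ds-identify
policy loaded: mode=search report=false found=all maybe=all notfound=disabled
DMI_SYS_VENDOR=QEMU
KERNEL_CMDLINE=BOOT_IMAGE=/vmlinuz-6.8.0-44-generic root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0
VIRT=kvm
DSNAME=
DSLIST=GCE NoCloud None
MODE=search
ON_NOTFOUND=disabled
pid=341 ppid=323
No ds found [mode=search, notfound=disabled]. Disabled cloud-init [1]
[up 2.82s] returning 1
"""

settings, no_ds_found = parse_ds_identify_log(excerpt)
print(settings["DSLIST"].split(), no_ds_found)
```

On a real host the same function could be fed the contents of `/var/run/cloud-init/ds-identify.log`.
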
chen23 commented 1 month ago

Closing because the reproduction does not accurately capture the issue; will re-open with a better repro.