canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Stopping/Restarting lxc containers disconnects machine from network #12482

Open CalvoM opened 11 months ago

CalvoM commented 11 months ago

Required information

Issue description

Whenever I restart or stop a container, my network is interrupted and I lose connectivity for a second.

Steps to reproduce

  1. Launch a container.
  2. Restart it.
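For reference, a minimal reproduction could look like the commands below (a sketch; the image alias and the container name repro-jammy are placeholders, and the default profile is assumed to attach eth0 to lxdbr0):

# In one terminal, keep a connectivity check running:
ping 8.8.8.8
# In a second terminal:
lxc launch ubuntu:22.04 repro-jammy   # 1. launch a container
lxc restart repro-jammy               # 2. restart it and watch the ping output for a dropped reply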

Information to attach

Here is the output of ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: wlp0s20f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether d4:54:8b:0c:0d:fa brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.32/24 brd 192.168.100.255 scope global dynamic noprefixroute wlp0s20f3
       valid_lft 84782sec preferred_lft 84782sec
    inet6 fe80::6472:db0:ffe0:72f/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:2f:31:f8 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
4: mpqemubr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:bb:f2:e5 brd ff:ff:ff:ff:ff:ff
    inet 10.232.36.1/24 brd 10.232.36.255 scope global mpqemubr0
       valid_lft forever preferred_lft forever
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:ad:e2:be:29 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
6: br-665d3d12eebc: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:21:59:f7:e9 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global br-665d3d12eebc
       valid_lft forever preferred_lft forever
8: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:94:d6:a4 brd ff:ff:ff:ff:ff:ff
    inet 10.212.104.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fe94:d6a4/64 scope link 
       valid_lft forever preferred_lft forever
46: veth38cde9b1@if45: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master lxdbr0 state UP group default qlen 1000
    link/ether b2:9b:6d:84:2e:e1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::b09b:6dff:fe84:2ee1/64 scope link 
       valid_lft forever preferred_lft forever
47: nordlynx: <POINTOPOINT,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none 
    inet 10.5.0.2/32 scope global nordlynx
       valid_lft forever preferred_lft forever

Output of ip r

default via 192.168.100.1 dev wlp0s20f3 proto dhcp src 192.168.100.32 metric 20600 
10.212.104.0/24 dev lxdbr0 proto kernel scope link src 10.212.104.1 
10.232.36.0/24 dev mpqemubr0 proto kernel scope link src 10.232.36.1 linkdown 
169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
172.18.0.0/16 dev br-665d3d12eebc proto kernel scope link src 172.18.0.1 linkdown 
192.168.100.0/24 dev wlp0s20f3 proto kernel scope link src 192.168.100.32 metric 600 
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown 

Output of lxc network show lxdbr0

config:
  ipv4.address: 10.212.104.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:9aa:135d:6a77::1/64
  ipv6.nat: "true"
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/SampleBionic
- /1.0/instances/SampleFocal
- /1.0/instances/SampleJammy
- /1.0/instances/SampleLunar
- /1.0/instances/SampleMantic
- /1.0/instances/SnapdTestJammy
- /1.0/instances/SnapdTestJammy2
- /1.0/profiles/default
- /1.0/profiles/pycloudlib-vm-default
managed: true
status: Created
locations:
- none

Output of lxc config show SampleJammy --expanded

architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 22.04 LTS amd64 (daily) (20230720)
  image.label: daily
  image.os: ubuntu
  image.release: jammy
  image.serial: "20230720"
  image.type: squashfs
  image.version: "22.04"
  volatile.base_image: d0be7c60007d109f2079a955fb42ac00ec71b5c369e08a41210a7c15bebe78a8
  volatile.cloud-init.instance-id: d84bf4b7-cbf7-4ba3-8693-16737b1fecb5
  volatile.eth0.host_name: veth38cde9b1
  volatile.eth0.hwaddr: 00:16:3e:b6:8e:db
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: bca95f70-f534-4c3f-8d44-52356ae93c97
  volatile.uuid.generation: bca95f70-f534-4c3f-8d44-52356ae93c97
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
mr-cal commented 11 months ago

I've been experiencing the same behavior since upgrading to Kubuntu 23.10, with "Wired connection 2" being deactivated.

I've verified that I don't experience an actual loss of network connectivity, just an annoying notification that I have silenced.

CalvoM commented 9 months ago

@tomponline Is there any progress on this? I am still experiencing this.

tomponline commented 9 months ago

Hi @CalvoM, we have not yet looked at this case. I've asked @gabrielmougard to see if he can reproduce it and diagnose the issue. Thanks

gabrielmougard commented 9 months ago

I have tried monitoring the lxdbr0 status (every 0.1s) to check for any bridge flapping while restarting one of my containers (u1), and here is what I got:

[screenshot: lxdbr0 link state sampled every 0.1s around a restart of container u1, showing a brief flap]

In my case, this is very brief (I actually never noticed that before). I guess that depending on your host NIC, this delay can vary. I'll investigate this.
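For anyone who wants to repeat that monitoring, a loop along these lines should work (a sketch; the exact command used for the capture above is not shown in the thread):

while true; do
    echo "$(date +%T.%N) $(ip -br link show lxdbr0)"   # timestamp plus the bridge's brief link state
    sleep 0.1                                          # sample roughly every 0.1s
done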

gabrielmougard commented 9 months ago

@mr-cal @CalvoM @tomponline I found a hack: to ensure that the bridge stays up regardless of whether any containers are running, I created a dummy interface on the host and attached it to lxdbr0:

ip link add dummy0 type dummy      # create a dummy interface on the host
ip link set dummy0 master lxdbr0   # attach it to the lxdbr0 bridge as a port
ip link set dummy0 up              # bring it up so the bridge always has a port with carrier

...which should keep the bridge from going down when the last container is stopped. Here is my result:

[screenshot: the same lxdbr0 monitoring with the dummy interface attached, no flap visible]

My flapping seems to have disappeared. Can you try that on your side to see if this is related? If yes, I'll work on an LXD fix to set up this behaviour internally.
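For what it's worth, a quick way to check the effect (a sketch; repro-jammy is a placeholder container name) is:

ip -br link show lxdbr0   # with the dummy port attached, the bridge should report UP even with all containers stopped
lxc restart repro-jammy   # the monitoring loop above should no longer show the bridge flapping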

MidnightRocket commented 8 months ago

I think I might be having the same, or at least a very similar, issue on my setup. I am running a Dietpi OS setup. (For those unfamiliar, Dietpi is a minimal fork of Debian with a focus on optimisations for single-board computers such as the Raspberry Pi.)

I initially posted this issue on Dietpi's issue tracker https://github.com/MichaIng/DietPi/issues/6884, but after some investigation, the issue seems to be with lxd itself rather than with Dietpi.


On my system, the host's network connection is lost "permanently", until it is manually brought back up with sudo ifup eth0 or until a reboot. However, my setup is a headless server setup, which might be making the difference?


An interesting thing is that after the network interface has been manually brought up with sudo ifup eth0, or even sudo ifup --allow=hotplug eth0, it stays stable until the next reboot, or until sudo systemctl daemon-reload is run. This means that subsequent container stops and restarts do not bring down the network interface until the next reboot or systemd daemon-reload.



As a workaround, I have found that modifying the /etc/network/interfaces file so that the primary network interface is set to auto eth0 instead of allow-hotplug eth0 (the default on Dietpi as well as on Debian Bookworm) solves the issue.
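For illustration, the change amounts to something like the following in /etc/network/interfaces (a sketch assuming a DHCP-configured eth0; the interface name and stanza may differ on your system):

# before (Dietpi / Debian Bookworm default):
#allow-hotplug eth0
# after:
auto eth0
iface eth0 inet dhcp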

The issue is not present in a fresh headless install of Debian Bookworm, even though Debian and Dietpi seem to share the exact same network setup. That is to say, on Debian the issue does not appear even though the network interface is configured with allow-hotplug eth0.

gabrielmougard commented 8 months ago

@CalvoM can you reproduce this workaround?