MichaIng / DietPi

Lightweight justice for your single-board computer!
https://dietpi.com/
GNU General Public License v2.0

LXD brings network interfaces down on container stop #6884

Closed MidnightRocket closed 8 months ago

MidnightRocket commented 8 months ago

Creating a bug report/issue

When running LXD on a DietPi installation, the host's network interface is brought down every time a container is stopped or restarted.

Steps to reproduce

  1. First install lxd and dependencies via apt.
    sudo apt update
    sudo apt install lxd dnsmasq iptables
  2. Then set up LXD with sudo lxd init. I do not know whether different LXD network configurations affect this, but I configured the following:
    1. Default LXD interface: lxdbr0
    2. Subnet/CIDR: 10.200.1.1/24
    3. NATed
    4. No access from LAN
    5. No IPv6
  3. Create a container
    sudo lxc launch images:debian/12 test_container
  4. Stop the newly created container
    sudo lxc stop test_container

Expected behaviour

The container should be stopped without bringing the host's network interfaces down/offline.

Actual behaviour

Somehow when the container is stopped the host's network interface is brought down/offline.
I do not see why this is happening. But I suppose it is related to the fact that the host is connected to the container through the default lxd network interface lxdbr0. And when the container is stopped, that might be crashing the rest of the host's network interfaces. But this is purely speculation.

Extra details

I have also reproduced this in a virtualised Parallels VM on an Intel Mac, using the official image from dietpi.com.

The problem is not present on a fresh Debian Bookworm installation, using the exact same configurations.

The problem can be mitigated by manually bringing the interface up after it has crashed using:

sudo ifup eth0

Then the network interface seems to be stable until the next reboot: it will not go down on subsequent container stops or restarts until the host has rebooted, at which point the issue comes back.

MichaIng commented 8 months ago

I have no experience with LXD, so I cannot really help beyond doing my own research.

When you say "fresh Debian Bookworm", which image do you mean exactly? And does it use anything other than ifupdown and /etc/network/interfaces to set up network interfaces? AFAIK, ifupdown is standard for all official server and minimal netinst installs of Debian, but desktop images might use NetworkManager, which might behave differently.

And when you say "brought down/offline", can you check what exactly is de-configured? I guess LXD does not run ifdown eth0, so probably it is a particular thing, route, DNS, IP address, or something:

ip a
ip r
cat /etc/resolv.conf

Although, if ifup eth0 indeed solves it and does not throw an error that the interface is up already, then I am not aware of any other way than ifdown.

... okay, thinking about it: lxdbr0 is a bridge interface, not a host<>guest one. If that container is reachable from LAN, then indeed it must replace eth0 with the bridge. Can you also run the above commands while the container is up to verify this? Then it would make sense that the network is at least temporarily down when the bridge is de-configured. Not sure why it does not bring eth0 back up afterwards. Probably auto eth0 instead of allow-hotplug eth0 in /etc/network/interfaces makes a difference.

MidnightRocket commented 8 months ago

By fresh Debian installation I meant fresh Debian server installation.

While container is up

Output of ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:1c:42:f4:d0:6d brd ff:ff:ff:ff:ff:ff
    inet 10.211.55.49/24 brd 10.211.55.255 scope global dynamic eth0
       valid_lft 1762sec preferred_lft 1762sec
    inet6 fdb2:2c26:f4e4:0:21c:42ff:fef4:d06d/64 scope global dynamic mngtmpaddr
       valid_lft 2591964sec preferred_lft 604764sec
    inet6 fe80::21c:42ff:fef4:d06d/64 scope link
       valid_lft forever preferred_lft forever
3: tailscale0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1280 qdisc fq_codel state UNKNOWN group default qlen 500
    link/none
    inet6 fe80::b36d:b30a:fdf0:f031/64 scope link stable-privacy
       valid_lft forever preferred_lft forever
4: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:66:a5:cc brd ff:ff:ff:ff:ff:ff
    inet 10.200.1.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever
6: veth59d50642@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master lxdbr0 state UP group default qlen 1000
    link/ether f6:23:fb:35:bc:91 brd ff:ff:ff:ff:ff:ff link-netnsid 0

Output of ip r

default via 10.211.55.1 dev eth0
10.200.1.0/24 dev lxdbr0 proto kernel scope link src 10.200.1.1
10.211.55.0/24 dev eth0 proto kernel scope link src 10.211.55.49

Output of cat /etc/resolv.conf

domain localdomain
search localdomain
nameserver 10.211.55.1

After container stop

Output of ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 00:1c:42:f4:d0:6d brd ff:ff:ff:ff:ff:ff
3: tailscale0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1280 qdisc fq_codel state UNKNOWN group default qlen 500
    link/none
    inet6 fe80::bc6d:daca:3917:6434/64 scope link stable-privacy
       valid_lft forever preferred_lft forever
4: lxdbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:16:3e:66:a5:cc brd ff:ff:ff:ff:ff:ff
    inet 10.200.1.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever

Output of ip r

10.200.1.0/24 dev lxdbr0 proto kernel scope link src 10.200.1.1 linkdown

Output of cat /etc/resolv.conf

domain localdomain
search localdomain
nameserver 10.211.55.1
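For anyone repeating this comparison, the two things that differ between the outputs above (the state of eth0 and the presence of a default route) can be checked with a small sketch like the following. The helper names iface_state and has_default_route are made up for illustration; both only parse text on stdin in the format shown above and do not touch the network configuration.

```shell
# Hypothetical helpers (not part of DietPi or LXD): they only parse text
# in the format of the `ip a` and `ip r` outputs shown above.

# Print the "state" value (e.g. UP or DOWN) of one interface,
# reading `ip a` or `ip link` output from stdin.
iface_state() {
    awk -v ifname="$1" '
        # Header lines look like: "2: eth0: <FLAGS> mtu 1500 ... state UP ..."
        $2 == (ifname ":") {
            for (i = 3; i < NF; i++) {
                if ($i == "state") { print $(i + 1); exit }
            }
        }
    '
}

# Succeed (exit status 0) if the `ip r` output on stdin contains a default route.
has_default_route() {
    grep -q '^default '
}
```

Run before and after stopping the container, for example as `ip a | iface_state eth0` and `ip r | has_default_route && echo "default route present"`.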

I can see that on my Debian installation the primary interface is configured with allow-hotplug eth0. So I am not sure that is the problem. I will try it to be sure.

MidnightRocket commented 8 months ago

Okay, changing allow-hotplug eth0 to auto eth0 in the /etc/network/interfaces file fixes the issue. Not sure how to prevent dietpi-config from overwriting it, although the network is rarely reconfigured.

MichaIng commented 8 months ago

> I can see that on my Debian installation that the primary interface is configured with allow-hotplug eth0.

Then it is not, since it is exactly the same on DietPi.

Confusing: the container stop indeed seems to completely deconfigure eth0 and bring it down, while leaving the now non-functional and unused bridge interface in place. This does not make any sense, of course. EDIT: Probably it still works, just connected with the container-internal interface only 🤔.

> No access from LAN

Why are you using a bridge at all if you do not want access from LAN? Isn't it possible to use a VLAN interface or similar, independent of any other network interface? I mean, a bridge traditionally connects two interfaces, in this case the container-internal one and the host LAN interface.

... while I am writing:

> Okay changing allow-hotplug eth0 to auto eth0 in the /etc/network/interfaces file fixes the issue.

Okay, that is great. Then it seems to run ifup -a, which indeed brings up only the interfaces marked auto. Not great: since it knows (or should know) which interface(s) it was bridging, it should also know exactly which interfaces to bring back up afterwards. So on a plain Debian server, is the main/LAN interface set to auto?
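To see which interfaces a plain ifup -a would cover, the auto and allow-* lines in /etc/network/interfaces can be listed with a small helper. This is only a sketch: the function name list_iface_classes is made up here, and it does not follow source or source-directory includes such as files in /etc/network/interfaces.d.

```shell
# Hypothetical helper: print "<class> <interface>" for every "auto" or
# "allow-*" line of an ifupdown configuration file.
# Defaults to /etc/network/interfaces when no file is given.
# Limitation: "source" and "source-directory" includes are not followed.
list_iface_classes() {
    awk '
        # Commented-out lines do not match, because their first field starts with "#".
        $1 == "auto" || $1 ~ /^allow-/ {
            # One line may name several interfaces; print one pair per interface.
            for (i = 2; i <= NF; i++) print $1, $i
        }
    ' "${1:-/etc/network/interfaces}"
}
```

Interfaces listed with the class auto are the ones ifup -a acts on; an interface listed only as allow-hotplug is not among them.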

MidnightRocket commented 8 months ago

> Why are you using a bridge at all if you do not want access from LAN?

This is how lxd init configures it, and I do not know why 🤷🏽

> Hence on plain Debian server, the main/LAN interface is auto?

The main interface is configured with allow-hotplug. The only interface configured with auto is the loopback interface lo

MichaIng commented 8 months ago

Okay. The lo interface is implicitly configured and is not required in the configuration file at all. However, strange that on vanilla Debian, the interface remains up, while on DietPi it does not, despite both using the exact same network stack and config 🤔.

Does the container engine throw any errors or warnings when you stop the container?

MidnightRocket commented 8 months ago

Nope no error is thrown, at any stage: launch, start, stop.

MidnightRocket commented 8 months ago

It is also weird that manually bringing up the eth0 interface with sudo ifup eth0 after a container stop temporarily solves the issue until the next reboot.

MichaIng commented 8 months ago

You mean on reboot, the interface is not brought back up? Does it touch the configs in /etc/network/interfaces or one in /etc/network/interfaces.d?

MidnightRocket commented 8 months ago

LXD touches neither /etc/network/interfaces nor /etc/network/interfaces.d.

What I meant was that when I stop an LXD container and manually bring eth0 back up with sudo ifup eth0, subsequent container stops do not bring down the eth0 interface until the host is rebooted.

However when rebooting the lxdbr0 interface is connected to the host. I believe that the lxd daemon is doing this.

Also when lxd has brought down eth0 interface, the interface is automatically brought up by a reboot.

MichaIng commented 8 months ago

> What I meant was that when stopping a lxd container, and manually bringing up the eth0 with sudo ifup eth0, then subsequent container stops, does not bring down the eth0 interface, until the host is rebooted.

That is strange indeed, since on boot exactly the same should happen, because of the ifup@eth0.service that is created when allow-hotplug eth0 is defined 🤔. Does it make a difference when you bring up the interface after container stop exactly like the service does?

sudo ifup --allow=hotplug eth0

And does it make a difference when you run sudo ifquery --state eth0 before starting or stopping the container?

> Also when lxd has brought down eth0 interface, the interface is automatically brought up by a reboot.

That is of course expected, because of the above service.

MidnightRocket commented 8 months ago

> Does it make a difference when you bring up the interface after container stop exactly like the service does?
>
> sudo ifup --allow=hotplug eth0

This exhibits the same behaviour as without the --allow=hotplug flag. That is, the eth0 interface is stable on subsequent container stops until the host is rebooted again.

> And does it make a difference when you run sudo ifquery --state eth0 before starting or stopping the container?

No this does not make any difference.

MichaIng commented 8 months ago

The last idea I have: does it make a difference when you disable the LXD service, i.e. do not start the container and container engine at boot, but start them manually after boot, once the LAN interface is sure to be fully configured? Probably the engine stores the network state when it starts and restores it when it stops, and the Ethernet interface is not yet fully up when the engine starts.

MidnightRocket commented 8 months ago

Nope this does not make any difference either 😕🧐

MidnightRocket commented 8 months ago

Okay, after an LXD container is stopped and the network interface is brought back up using sudo ifup eth0, reconfiguring the network interface via sudo dietpi-config -> Network Options: Adapters -> Ethernet -> Apply makes the same problem return: stopping a container after this will yet again bring down the eth0 interface.

MidnightRocket commented 8 months ago

Actually, bringing the eth0 interface up again and then running sudo systemctl daemon-reload also results in subsequent container stops bringing down the eth0 interface.

MichaIng commented 8 months ago

Interesting. systemctl daemon-reload rereads the ifup@eth0.service, but the service is not re-executed, so the interface state should not be touched. LXD definitely does some strange magic on shutdown, doing or trying to do more than it should, which leads to such surprises. If it was only tested on plain Debian, such bugs might slip through, but I still cannot see any relevant difference between Debian and DietPi in this regard.

I think this is something to ask at the Debian bug tracker, or probably directly here: https://github.com/lxc/incus/issues. Someone with insight into the LXD shutdown logic should have an idea under which conditions the main interface might be brought down and left down as well.

MidnightRocket commented 8 months ago

I have found this https://github.com/canonical/lxd/issues/12482, which could be related.

Thanks for your help and kindness in investigating this issue😃.

For anyone seeing this, a workaround is to edit /etc/network/interfaces and change the primary network interface (e.g. eth0 or wlan0) from allow-hotplug to auto, for example from allow-hotplug eth0 to auto eth0.

Note: Reconfiguring network settings via dietpi-config overwrites this.
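As a rough sketch, the change from allow-hotplug to auto can also be scripted. The function name hotplug_to_auto is made up for illustration, and the sketch assumes the interface sits alone on its allow-hotplug line. It keeps a backup copy with a .bak suffix next to the edited file.

```shell
# Hypothetical helper: rewrite "allow-hotplug IFACE" to "auto IFACE" in an
# ifupdown configuration file, keeping the original as FILE.bak.
# Assumptions: IFACE is the only interface named on its allow-hotplug line,
# and its name contains no regex metacharacters (eth0 and wlan0 are fine).
hotplug_to_auto() {
    file="$1"
    iface="$2"
    sed -i.bak -E \
        "s/^[[:space:]]*allow-hotplug[[:space:]]+${iface}[[:space:]]*\$/auto ${iface}/" \
        "$file"
}
```

On a real system this would be run as root, for example as `hotplug_to_auto /etc/network/interfaces eth0`.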

Closing this issue for now, as it seems to be an issue with LXD and not DietPi itself.