canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Default 1500 MTU on managed bridge networks isn't always appropriate #12809

Open rajannpatel opened 9 months ago

rajannpatel commented 9 months ago

Issue description

When running lxd init --auto, a lxdbr0 interface is created, and all LXD containers get network interfaces on lxdbr0. LXD hardcodes the MTU on lxdbr0 at 1500, which is a sensible default but causes problems or inefficiencies when the MTU on the host machine’s default network adapter is different. On Oracle Cloud the network adapter is configured for jumbo frames and has an MTU of 9000. On Google Cloud the MTU on the default adapter is 1460.

rajan_patel@landscapebeta:~$ lxd init --auto
rajan_patel@landscapebeta:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP group default qlen 1000
    link/ether 42:01:0a:80:00:20 brd ff:ff:ff:ff:ff:ff
    inet 10.128.0.32/32 metric 100 scope global dynamic ens4
       valid_lft 86295sec preferred_lft 86295sec
    inet6 fe80::4001:aff:fe80:20/64 scope link 
       valid_lft forever preferred_lft forever
3: lxdbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:16:3e:50:44:e6 brd ff:ff:ff:ff:ff:ff
    inet 10.145.247.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever
    inet6 fd42:f052:4168:f2c2::1/64 scope global 
       valid_lft forever preferred_lft forever

As a result, LXD containers are unable to access the Internet. This breaks apt update, snap install, and effectively creates an airgapped container. Worse, the issue is undocumented, so finding the solution takes a lot of digging and a working understanding of networking.

Anybody using LXD on Google Cloud is adversely impacted by this. They have a page dedicated to explaining the issues of mismatched MTUs here: https://cloud.google.com/vpc/docs/mtu

Steps to reproduce

  1. Step one: lxd init --auto
  2. Step two: ip a
  3. Step three: observe that the lxdbr0 interface has an MTU of 1500, regardless of the MTU configured on the host's default network adapter.

Information to attach

lxd init --auto should create the lxdbr0 interface with an MTU that matches the default network adapter on the host machine. This is manually achieved today via 2 commands:

# identify the default network adapter on the machine; the next command will check the MTU configured on this adapter
read -r INTERFACE < <(ip route | awk '$1=="default"{print $5; exit}')

# if your network uses jumbo frames (MTU 9000), or an MTU smaller than 1500 (as found on Google Cloud VMs), use a matching MTU on lxdbr0 (which is created by lxd init --auto)
lxc network set lxdbr0 bridge.mtu="$(ip link show "$INTERFACE" | awk '/mtu/ {print $5}')"
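
The awk expression above assumes the MTU is always the fifth field of the ip link show output. As a hedged sketch only, the helper below (mtu_from_link_line is a hypothetical name, not part of LXD or iproute2) looks for the mtu keyword instead and prints the value that follows it, so the extraction does not depend on field position:

```shell
# Hypothetical helper (not part of LXD or iproute2): read `ip link show`
# output on stdin and print the value that follows the "mtu" keyword.
mtu_from_link_line() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "mtu") { print $(i + 1); exit } }'
}

# Possible usage, assuming INTERFACE was set as in the commands above:
#   host_mtu=$(ip link show "$INTERFACE" | mtu_from_link_line)
#   lxc network set lxdbr0 bridge.mtu="$host_mtu"
```

Storing the value in a variable first also makes it easy to check that it is non-empty before changing the bridge configuration.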

When using LXD as a stepping stone for trying other Canonical software, we have to explain these MTU pitfalls. It makes LXD look unrefined and unnecessarily complex when adding these steps to a “quickstart” or “getting started” how-to: https://gist.github.com/rajannpatel/cdc43b30a863824b139fb7a18f2e99a5

tomponline commented 9 months ago

"This breaks apt update, snap install, and effectively creates an airgapped container."

This is interesting because most home internet connections use PPPoE tunneling that also reduces the MTU to the internet to less than 1500, and yet the internal network still commonly uses 1500 MTU on all devices and the internet connections are still working.

Normally this is because the ISP's router or network performs TCP MSS clamping:

https://www.cloudflare.com/en-gb/learning/network-layer/what-is-mss/
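
For reference, the arithmetic behind MSS clamping can be sketched as follows. This assumes a 20-byte TCP header with no options and a 20-byte IPv4 header (40 bytes for IPv6); mss_for_mtu is a hypothetical helper name used only for illustration:

```shell
# Hypothetical helper: print the largest TCP MSS that fits in a given MTU.
# Assumes a 20-byte TCP header without options. The second argument is the
# IP header size and defaults to 20 (IPv4); pass 40 for IPv6.
mss_for_mtu() {
    mtu=$1
    ip_header=${2:-20}
    echo $(( mtu - ip_header - 20 ))
}

# Possible usage:
#   mss_for_mtu 1500      # MSS advertised by a container on a 1500 MTU bridge
#   mss_for_mtu 1460      # MSS that fits the 1460 MTU host adapter
```

Under these assumptions, a container on a 1500 MTU bridge advertises an MSS of 1460, which yields 1500-byte packets. Those do not fit a 1460 MTU uplink unless something along the path clamps the MSS down to 1420.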

So presumably in these environments no such clamping is being applied.

Additionally, it may be because these provider networks (or your particular local firewall setup) are blocking Path MTU Discovery:

https://en.wikipedia.org/wiki/Path_MTU_Discovery
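
One way to probe this by hand is to send pings with the "don't fragment" bit set, sized to exactly fill a given MTU. The sketch below assumes IPv4 (20-byte IP header plus 8-byte ICMP header) and the iputils ping options -M do and -s; icmp_payload_for_mtu is a hypothetical helper, and the container name c1 and the target host are placeholders:

```shell
# Hypothetical helper: print the ICMP echo payload size that produces an
# IPv4 packet of exactly the given MTU (20-byte IP + 8-byte ICMP headers).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Possible usage from inside a container (c1 and example.com are placeholders):
#   lxc exec c1 -- ping -M do -s "$(icmp_payload_for_mtu 1460)" -c 3 example.com
#   lxc exec c1 -- ping -M do -s "$(icmp_payload_for_mtu 1500)" -c 3 example.com
```

If the smaller probe gets replies while the larger one silently times out rather than reporting that fragmentation is needed, that would be consistent with the ICMP messages Path MTU Discovery relies on being dropped somewhere along the path.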

In principle, your proposal sounds like a good idea. I will consider whether making this change would introduce any downsides or regressions.

tomponline commented 9 months ago

@rajannpatel interestingly the GCP doc you linked to says:

"TCP SYN and SYN-ACK packets: Google Cloud performs MSS clamping if necessary, changing the MSS to ensure packets fits within the MTU."

So this makes me wonder why you are experiencing these issues?

Can you advise further on what the specific problem is? Is it UDP traffic (DNS perhaps) that is causing the problem? MSS clamping only affects TCP.

tomponline commented 9 months ago

Is it possible to provide reproducer steps using just LXD without Landscape, i.e. lxc launch ... followed by lxc exec with commands that demonstrate the problem?