candlerb closed this issue 7 years ago.
This is pretty odd. It's very unlikely to be a LXD issue in itself but I'll try to reproduce it and poke around to see what's going on exactly. You certainly shouldn't be getting an RA provided route when you have a static entry and disabled RAs in your container.
So I appear unable to reproduce this here with a basic Ubuntu 16.04 container.
/etc/network/interfaces:
root@test:~# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# The loopback network interface
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 10.204.119.10
netmask 255.255.255.0
gateway 10.204.119.1
dns-nameservers 10.204.119.1
iface eth0 inet6 static
address 2001:470:b368:4242::32/64
gateway 2001:470:b368:4242::1
Routes:
root@test:~# ip -6 route show
2001:470:b368:4242::/64 dev eth0 proto kernel metric 256 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default via 2001:470:b368:4242::1 dev eth0 metric 1024 pref medium
Looks like the above config did cause the usual sysctls to be set to sane values:
root@test:~# sysctl net.ipv6.conf.eth0.accept_ra
net.ipv6.conf.eth0.accept_ra = 0
root@test:~# sysctl net.ipv6.conf.eth0.autoconf
net.ipv6.conf.eth0.autoconf = 0
For comparison, a container without the /etc/network/interfaces modification on the same host gets this:
root@test1:~# ip -6 route show
2001:470:b368:4242::/64 dev eth0 proto kernel metric 256 expires 3592sec pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default via fe80::ccfd:30ff:fe75:1546 dev eth0 proto ra metric 1024 expires 1792sec hoplimit 64 pref medium
root@test1:~# sysctl net.ipv6.conf.eth0.accept_ra
net.ipv6.conf.eth0.accept_ra = 1
root@test1:~# sysctl net.ipv6.conf.eth0.autoconf
net.ipv6.conf.eth0.autoconf = 1
Marking the issue as Incomplete as I'm unable to reproduce it. I'll keep the container running for a few hours and see if further RAs cause it to reconfigure its default gateway.
OK, I'll try to make a standalone test case with a local bridge and radvd. I suspect there is some sort of race involved.
I can replicate locally as follows. This is on a low-power machine (actually an Ubuntu 16.04 VM, running inside a NUC DN2820 which is also Ubuntu 16.04).
If the following steps don't work for you, I'll try to replicate on a t2.nano in EC2.
# On the host
brctl addbr brtest
ifconfig brtest up
ip -6 addr add 2001:db8:0:1::1/64 dev brtest
lxc profile create brtest
lxc profile edit brtest
---
config: {}
description: ""
devices:
eth0:
name: eth0
nictype: bridged
parent: brtest
type: nic
name: brtest
---
apt-get install radvd
vi /etc/radvd.conf
---
interface brtest
{
AdvSendAdvert on;
MaxRtrAdvInterval 30;
prefix 2001:db8:0:1::/64
{
AdvOnLink on;
AdvAutonomous on;
};
};
---
systemctl start radvd
lxc launch ubuntu:16.04 -p brtest test1
lxc exec test1 bash
# Now we're inside the container; use "ifconfig" to check
# a SLAAC address is obtained, e.g. 2001:db8:0:1:216:3eff:fe15:f2ca/64
vi /etc/network/interfaces.d/50-cloud-init.cfg
---
auto eth0
iface eth0 inet manual
iface eth0 inet6 static
address 2001:db8:0:1::1000/64
gateway 2001:db8:0:1::1
accept_ra 0
autoconf 0
privext 0
dad-attempts 0
---
exit
# Now we are back on the host
lxc stop test1
lxc start test1
lxc exec test1 bash
# Back inside the container:
# ip -6 route
2001:db8:0:1::/64 dev eth0 proto kernel metric 256 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default via fe80::fc22:cff:fe72:7b93 dev eth0 proto ra metric 1024 expires 53sec hoplimit 64 pref medium
# ip -6 addr show dev eth0
17: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
inet6 2001:db8:0:1::1000/64 scope global
valid_lft forever preferred_lft forever
inet6 2001:db8:0:1:216:3eff:fe15:f2ca/64 scope global mngtmpaddr dynamic
valid_lft 86094sec preferred_lft 14094sec
inet6 fe80::216:3eff:fe15:f2ca/64 scope link
valid_lft forever preferred_lft forever
Once the "expires" counter goes negative (e.g. expires -29sec), the default gw vanishes entirely. This very much appears to be a race condition. I can reproduce it in EC2 on a t2.nano instance using the following script, but inside the container I need either to uncomment the "post-up sleep 2" line, or change "inet static" to "inet dhcp". My underpowered home server doesn't need this.
#!/bin/sh -ex
# Reproducer for lxc/lxd #3582
# Run this script as root on a clean 16.04 VM
#### networking setup
apt-get -y install bridge-utils
cat <<EOS >/etc/network/interfaces.d/brtest.cfg
auto brtest
iface brtest inet static
address 192.0.2.1/24
bridge_ports none
bridge_stp off
bridge_fd 0
bridge_maxwait 0
iface brtest inet6 static
address 2001:db8:0:1::1/64
autoconf 0
privext 0
accept_ra 0
dad-attempts 0
EOS
ifup brtest
cat >/etc/sysctl.d/99-sysctl.conf <<EOS
# radvd won't start unless this is set
net.ipv6.conf.all.forwarding=1
EOS
sysctl -p /etc/sysctl.d/99-sysctl.conf
apt-get -y install radvd
cat <<EOS >/etc/radvd.conf
interface brtest
{
AdvSendAdvert on;
MaxRtrAdvInterval 30;
prefix 2001:db8:0:1::/64
{
AdvOnLink on;
AdvAutonomous on;
};
};
EOS
systemctl stop radvd
systemctl start radvd
#### lxd setup
lxd init --auto --storage-backend=dir
lxc profile create brtest
lxc profile device add brtest eth0 nic name=eth0 nictype=bridged parent=brtest
### container setup
lxc launch ubuntu:16.04 -p brtest testv6
lxc file push - testv6/etc/cloud/cloud.cfg.d/99-disable-network-config.cfg <<EOS
network: {config: disabled}
EOS
lxc file push - testv6/etc/network/interfaces.d/50-cloud-init.cfg <<EOS
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 192.0.2.100/24
gateway 192.0.2.1
#post-up sleep 2
#iface eth0 inet dhcp
iface eth0 inet6 static
address 2001:db8:0:1::100/64
gateway 2001:db8:0:1::1
accept_ra 0
autoconf 0
privext 0
dad-attempts 0
EOS
lxc restart testv6
### Show the problems inside the container
sleep 5
echo "--- Container IPv6 routes ---"
lxc exec testv6 -- /sbin/ip -6 route
echo "--- Container IPv6 addresses ---"
lxc exec testv6 -- ip -6 addr show dev eth0
# 1. The static default gateway to 2001:db8:0:1::1 is missing
# 2. There is a "proto ra" default gateway to fe80::..., despite "accept_ra 0"
# 3. This RA default gateway expires after a few minutes, and since there
# is no static gateway, the container loses its IPv6 connectivity
# (this is the main issue)
# 4. eth0 has picked up a SLAAC address in addition to its static address,
# despite "autoconf 0"
# IF THIS DOES NOT WORK:
# 1. Check that radvd is running. Often it seems not to start for some reason.
# 2. Inside the container, run "tcpdump -i eth0 -nn icmp6" and check that RAs
# are being received. (May not be shown until you hit ^C)
# 3. Try uncommenting "post-up sleep 2", or changing inet static to dhcp
# (both of these slow down the container startup)
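Observation 4 above (the unexpected SLAAC address) can be checked mechanically. This is my own sketch (the helper name `count_slaac` is not part of the reproducer); it keys off the `dynamic` flag the kernel puts on autoconfigured addresses in `ip -6 addr` output:

```shell
#!/bin/sh
# Count SLAAC ("dynamic") global addresses in `ip -6 addr` output.
# With "autoconf 0" honoured, this should print 0 for eth0.
count_slaac() {
    printf '%s\n' "$1" | grep -c 'scope global.*dynamic' || true
}

# Sample output taken from the failing container above:
sample='    inet6 2001:db8:0:1::1000/64 scope global
    inet6 2001:db8:0:1:216:3eff:fe15:f2ca/64 scope global mngtmpaddr dynamic
    inet6 fe80::216:3eff:fe15:f2ca/64 scope link'
echo "SLAAC addresses: $(count_slaac "$sample")"   # prints "SLAAC addresses: 1"
```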
@stgraber: If you want access to the EC2 instance, please post or E-mail me your SSH public key.
OK, so it sounds like it's a race between the kernel and ifupdown. If the kernel receives the RA before ifupdown runs, then you end up with the default gateway configured by the RA rather than the static configuration you entered...
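That race is easy to test for from a script. Here is a hypothetical check (the function name is mine) that distinguishes an RA-learned default route from a static one by parsing `ip -6 route` output:

```shell
#!/bin/sh
# Return 0 if the supplied `ip -6 route` output contains a default route
# learned from a router advertisement ("proto ra") instead of a static one.
is_ra_default() {
    printf '%s\n' "$1" | grep -q '^default via .* proto ra '
}

# Route line from the failing container:
bad='default via fe80::fc22:cff:fe72:7b93 dev eth0 proto ra metric 1024 expires 53sec hoplimit 64 pref medium'
# Route line from the correctly configured container:
good='default via 2001:470:b368:4242::1 dev eth0 metric 1024 pref medium'

is_ra_default "$bad"  && echo "bad: RA-provided default route"
is_ra_default "$good" || echo "good: static default route"
```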
I wonder if putting:
pre-up ip -6 route flush dev eth0
In the inet6 section would fix the race.
I wonder if putting:
pre-up ip -6 route flush dev eth0
In the inet6 section would fix the race.
Interesting idea. I have tested it on my slow box, and what I find is that:
1. The container loses the /64 LAN route as well, so now it has only the RA default route and nothing else:
root@apt-cacher:~# ip -6 route
default via fe80::66d1:54ff:fe5b:e2dd dev eth0 proto ra metric 1024 expires 594sec pref medium
2. The container gets only a SLAAC address, and loses its static IP address:
root@apt-cacher:~# ip -6 addr show dev eth0
25: eth0@if26: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
inet6 XXXX:XXX:XXXX:XXXX:216:3eff:feaa:752d/64 scope global mngtmpaddr dynamic
valid_lft 2591986sec preferred_lft 604786sec
inet6 fe80::216:3eff:feaa:752d/64 scope link
valid_lft forever preferred_lft forever
What I should probably do is test this with a "full fat" VM with the post-up sleep 2. If it replicates that way, it would suggest the race is inherent to the Debian/Ubuntu startup scripts, and not related to running in lxd.
Well, LXD doesn't configure any of that kernel stuff, so we're really debating whether it's a kernel bug or an ifupdown bug at this point :)
I have added "post-up sleep 2" under the inet section of a full-fat VM (actually the VM in which the lxd containers are running) and rebooted it. I find the same problem:
Arguably several things are at play here, among them default.accept_ra=1 and default.autoconf=1.
I'm not sure where best to raise this though. Ubuntu Launchpad? Debian?
https://launchpad.net/ubuntu/+source/ifupdown/+filebug for Ubuntu. Though the problem is likely to be the same in Debian so maybe going upstream makes sense here.
As a workaround, you could put a pre-up in the ipv4 section which sets the sysctl keys directly. This should have them all set properly prior to the device coming up.
As a workaround, you could put a pre-up in the ipv4 section which sets the sysctl keys directly
Great idea... but I just tried it and it doesn't work.
A thought. Is it possible that with lxd at least, the eth0 interface is already 'up' before ifupdown gets a look-in?
If I put in the inet section
pre-up /var/tmp/fixit.sh
and this script contains
#!/bin/sh -xe
exec >/tmp/fixit.log 2>&1
ifconfig eth0
sysctl net.ipv6.conf.eth0.accept_ra
then the result is:
+ ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:16:3e:aa:75:2d
inet6 addr: XXXX:XXX:XXXX:XXXX:216:3eff:feaa:752d/64 Scope:Global
inet6 addr: fe80::216:3eff:feaa:752d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:12 errors:0 dropped:0 overruns:0 frame:0
TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1098 (1.0 KB) TX bytes:586 (586.0 B)
+ sysctl net.ipv6.conf.eth0.accept_ra
net.ipv6.conf.eth0.accept_ra = 1
Notice how the interface is already UP by the time pre-up
is called. (And even has its SLAAC address already).
Ah, yeah, that'd make sense. I believe those veth pair are already up by the time the container starts up.
So maybe you need to bring it down and back up too to avoid the race? :)
OK, that works: if I run as a pre-up script
ifconfig eth0 down
sysctl net.ipv6.conf.eth0.accept_ra=0
sysctl net.ipv6.conf.eth0.autoconf=0
then everything is fine. What a palaver. (Aside: whose idea was it to put IPv6 address autoconfiguration into the kernel anyway?? :-)
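For reference, here is that pre-up script written out in full as I understand it (my sketch; the interface name and the /var/tmp/fixit.sh path are assumptions carried over from the earlier test):

```shell
#!/bin/sh -e
# /var/tmp/fixit.sh - run via "pre-up /var/tmp/fixit.sh" in the inet section.
# Take the interface down so the kernel drops any RA-learned state, then
# disable RA processing and autoconf before ifupdown brings it back up.
ifconfig eth0 down
sysctl net.ipv6.conf.eth0.accept_ra=0
sysctl net.ipv6.conf.eth0.autoconf=0
```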
BTW, I did also try setting net.ipv6.conf.default.autoconf=0 and net.ipv6.conf.default.accept_ra=0 on the host, hoping that when the veth/eth pair was created, they would inherit these settings. But they don't. In fact, if I look inside the container, I see:
root@apt-cacher:~# sysctl net.ipv6.conf.default.accept_ra
net.ipv6.conf.default.accept_ra = 1
root@apt-cacher:~# sysctl net.ipv6.conf.default.autoconf
net.ipv6.conf.default.autoconf = 1
But outside, on the host:
root@apt-cacher:~# exit
root@lxd1:~# sysctl net.ipv6.conf.default.accept_ra
net.ipv6.conf.default.accept_ra = 0
root@lxd1:~# sysctl net.ipv6.conf.default.autoconf
net.ipv6.conf.default.autoconf = 0
So even the concept of "default" settings, to be inherited by newly created interfaces, is virtualised inside the container, and it defaults to the autoconf nonsense turned on. We need a tunable default for the default! :-)
Back to lxd. I see this:
func (c *containerLXC) createNetworkDevice...
...
// Handle bridged and p2p
if shared.StringInSlice(m["nictype"], []string{"bridged", "p2p"}) {
n2 := deviceNextVeth()
_, err := shared.RunCommand("ip", "link", "add", n1, "type", "veth", "peer", "name", n2)
...
_, err = shared.RunCommand("ip", "link", "set", n1, "up")
...
dev = n2
...
// Bring the interface up
_, err := shared.RunCommand("ip", "link", "set", "dev", dev, "up")
So it appears that lxd is indeed bringing up the container-side eth, as well as the veth side which attaches to the host bridge. I guess there must be a reason why it doesn't leave it for the OS inside the container to bring up the interface?
Yeah, that code is for device hotplug which is a bit different than what you get on a fresh container start where liblxc does effectively the same for you.
The reason why liblxc does it is because it also allows pre-configuring a number of device settings which cannot be set if the device isn't up. Some device types like macvlan also need to be brought up to have their MAC address properly registered in the NIC, leaving it to the container's init system to bring it up would hide a number of critical errors.
There also are a number of distributions that were expecting devices to be brought up because their initrd was usually doing that. I think most of those no longer do this, but it certainly used to be a thing...
FYI, I caught a container "in the act". After adding this workaround to the inet section in the container config:
pre-up ip link set dev eth0 down
pre-up sysctl net.ipv6.conf.eth0.autoconf=0
pre-up sysctl net.ipv6.conf.eth0.accept_ra=0
Then restarting the container, and doing "lxc list" repeatedly:
root@lxd1:~# lxc list
+------------+---------+---------------------+-------------------------------+------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------------+---------+---------------------+-------------------------------+------------+-----------+
| apt-cacher | RUNNING | | | PERSISTENT | 0 |
+------------+---------+---------------------+-------------------------------+------------+-----------+
root@lxd1:~# lxc list
+------------+---------+---------------------+----------------------------------------------+------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------------+---------+---------------------+----------------------------------------------+------------+-----------+
| apt-cacher | RUNNING | | XXXX:XXX:XXXX:XXXX:216:3eff:feaa:752d (eth0) | PERSISTENT | 0 |
+------------+---------+---------------------+----------------------------------------------+------------+-----------+
root@lxd1:~# lxc list
+------------+---------+---------------------+-------------------------------+------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------------+---------+---------------------+-------------------------------+------------+-----------+
| apt-cacher | RUNNING | 10.12.255.31 (eth0) | | PERSISTENT | 0 |
+------------+---------+---------------------+-------------------------------+------------+-----------+
root@lxd1:~# lxc list
+------------+---------+---------------------+-------------------------------+------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------------+---------+---------------------+-------------------------------+------------+-----------+
| apt-cacher | RUNNING | 10.12.255.31 (eth0) | XXXX:XXX:XXXX:XXXX::31 (eth0) | PERSISTENT | 0 |
+------------+---------+---------------------+-------------------------------+------------+-----------+
Just went through this again and I'm really not seeing anything that LXD itself is doing wrong.
I'd recommend filing a bug against ifupdown. It may then be further moved to the kernel, but it feels to me as if ifupdown should really be able to do what you ask it to do in this case.
I'm closing the LXD issue as there doesn't appear to be anything for us to do here. Please let us know of any bug report you file, so that we can subscribe to it.
FYI, I worked around this using:
pre-up ip -6 route del default dev $IFACE || true
Since the RA-provided default gateway is already there, adding the correct (static) one fails, which also prevents post-up commands (and similar) from triggering.
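To show where that line lives, here is the whole inet6 stanza with the workaround folded in (a sketch only, reusing the addresses from the reproducer above; untested beyond the setups described in this thread):

```shell
# /etc/network/interfaces.d/50-cloud-init.cfg (inet6 stanza only)
iface eth0 inet6 static
    address 2001:db8:0:1::100/64
    gateway 2001:db8:0:1::1
    accept_ra 0
    autoconf 0
    privext 0
    dad-attempts 0
    # Drop any RA-learned default route first, so that adding the static
    # gateway cannot fail (a failed "up" would also skip post-up commands):
    pre-up ip -6 route del default dev $IFACE || true
```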
Required information
Issue description
A container is configured with a static IPv6 address and gateway:
However, after a period of time, the container drops its IPv6 gateway.
The container is bridged to the outside network where RAs and SLAAC are enabled; and as you can see, I have tried to disable their use in the container by setting accept_ra 0 and autoconf 0. However, it seems that RAs are the source of the problem, because, with an RA lifetime of 30 minutes, when the container starts I see:
(Note that the manually-configured gateway XXXX:XXX:XXXX:XXXX::1 is not present; the only gateway is the link-local address from proto ra.)
After a short time, the "expires ..." part of the LAN route disappears, but not for the gateway:
Then after 30 minutes I see:
... and the container loses its IPv6 connectivity to the outside world.
If I change the router to send RAs with a 10-minute lifetime, then restart the container, I see:
As a related problem, I find that the container picks up a SLAAC IPv6 address in addition to its manually-assigned one, despite having set autoconf 0.
My guess as to what's happening is:
Note that this problem doesn't occur on the host, which is apparently configured in the same way (details below). But actually, the host sees both its static default route and the one from RAs:
Steps to reproduce
The host needs to be connected to a network with IPv6 and SLAAC.
On the host (ubuntu 16.04) I have:
The container was launched with -p br255, where this profile is:
In the container (ubuntu:16.04) I have:
There is no firewalling on the host, apart from the ACCEPT rules added by lxd itself:
The host does have IPv6 forwarding enabled, and in /etc/default/lxd-bridge I have a separate routed subnet for the lxdbr0 bridge:
Although the affected container is not using lxdbr0, I think this may be related, because on the host I see accept_ra=2 on the bridge interface. Since I have not set this in sysctl.conf, I suspect it was set by lxd.
Other things tried
I tried setting on the host:
and then stopping and starting the container. This makes no difference: the container still gets a SLAAC address in addition to its static one, and the default gateway is still ticking away to expiry.
I tried commenting out all the extra settings inside the container:
Again no difference; and indeed eth0.accept_ra was zero, although all.accept_ra and default.accept_ra were both set to one.
So finally, I tried explicitly enabling acceptance of RAs:
This did change the sysctl setting inside the container (net.ipv6.conf.eth0.accept_ra = 1), and now the default gateway is refreshed via RAs.
So this is a usable workaround, because the container doesn't lose connectivity. However, I wanted to set a static gateway and not be reliant on RAs.