canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Container loses static IPv6 default route (interference from SLAAC) #3582

Closed: candlerb closed this issue 7 years ago

candlerb commented 7 years ago

Required information

config:
  core.https_address: '[::]:8443'
  core.trust_password: true
api_extensions:
- id_map
api_status: stable
api_version: "1.0"
auth: trusted
public: false
environment:
  addresses:
  - 10.12.255.21:8443
  - '[XXXX:XXX:XXXX:XXXX::21]:8443'
  - 10.12.21.1:8443
  - '[YYYY:YYY:YYYY:21::1]:8443'
  architectures:
  - x86_64
  - i686
  certificate: <SNIP>
  certificate_fingerprint: <SNIP>
  driver: lxc
  driver_version: 2.0.8
  kernel: Linux
  kernel_architecture: x86_64
  kernel_version: 4.4.0-83-generic
  server: lxd
  server_pid: 1402
  server_version: 2.0.10
  storage: btrfs
  storage_version: "4.4"

Issue description

A container is configured with a static IPv6 address and gateway:

iface eth0 inet6 static
address XXXX:XXX:XXXX:XXXX::32/64
gateway XXXX:XXX:XXXX:XXXX::1
privext 0
accept_ra 0
autoconf 0
dad-attempts 0

However after a period of time, the container drops its IPv6 gateway.

The container is bridged to the outside network, where RAs and SLAAC are enabled; as you can see, I have tried to disable their use in the container by setting accept_ra 0 and autoconf 0. However, it seems that RAs are the source of the problem, because:

With an RA lifetime of 30 minutes, when the container starts I see:

# ip -f inet6 route
XXXX:XXX:XXXX:XXXX::/64 dev eth0  proto kernel  metric 256  expires 2591998sec pref medium
fe80::/64 dev eth0  proto kernel  metric 256  pref medium
default via fe80::66d1:54ff:fe5b:e2dd dev eth0  proto ra  metric 1024  expires 1798sec pref medium

(Note that the manually-configured gateway XXXX:XXX:XXXX:XXXX::1 is not present; the only gateway is the link-local address from proto ra)

After a short time, the "expires ...." part of the LAN route disappears, but not for the gateway:

# ip -f inet6 route
XXXX:XXX:XXXX:XXXX::/64 dev eth0  proto kernel  metric 256  pref medium
fe80::/64 dev eth0  proto kernel  metric 256  pref medium
default via fe80::66d1:54ff:fe5b:e2dd dev eth0  proto ra  metric 1024  expires 1732sec pref medium

Then after 30 minutes I see:

# ip -f inet6 route
XXXX:XXX:XXXX:XXXX::/64 dev eth0  proto kernel  metric 256  pref medium
fe80::/64 dev eth0  proto kernel  metric 256  pref medium

... and the container loses its IPv6 connectivity to the outside world.

If I change the router to send RAs with 10 minute lifetime, then restart the container, I see:

# ip -6 route list
XXXX:XXX:XXXX:XXXX::/64 dev eth0  proto kernel  metric 256  pref medium
fe80::/64 dev eth0  proto kernel  metric 256  pref medium
default via fe80::66d1:54ff:fe5b:e2dd dev eth0  proto ra  metric 1024  expires 597sec pref medium

As a related problem, I find that the container picks up a SLAAC IPv6 address in addition to its manually-assigned one, despite having set autoconf 0.

root@lxd1:~# lxc list ns-auth
+---------+---------+---------------------+----------------------------------------------+------------+-----------+
|  NAME   |  STATE  |        IPV4         |                     IPV6                     |    TYPE    | SNAPSHOTS |
+---------+---------+---------------------+----------------------------------------------+------------+-----------+
| ns-auth | RUNNING | 10.12.255.32 (eth0) | XXXX:XXX:XXXX:XXXX::32 (eth0)                | PERSISTENT | 0         |
|         |         |                     | XXXX:XXX:XXXX:XXXX:216:3eff:fe27:fea9 (eth0) |            |           |
+---------+---------+---------------------+----------------------------------------------+------------+-----------+

My guess is that the RA-learned route is overriding the static configuration.

Note that this problem doesn't occur on the host, which is apparently configured in the same way (details below). In fact, the host sees both its static default route and the one from RAs:

root@lxd1:~# ip -6 route | tail -2
default via XXXX:XXX:XXXX:XXXX::1 dev br255  metric 1024  pref medium
default via fe80::66d1:54ff:fe5b:e2dd dev br255  proto ra  metric 1024  expires 540sec pref medium

Steps to reproduce

The host needs to be connected to a network with IPv6 and SLAAC.

On the host (ubuntu 16.04) I have:

auto eth0
iface eth0 inet manual

auto br255
iface br255 inet static
bridge_ports eth0
bridge_stp off
bridge_fd 0
bridge_maxwait 0
address 10.12.255.21/24
gateway 10.12.255.1
dns-nameservers 10.12.255.1
dns-search home.example.net

auto br255
iface br255 inet6 static
address XXXX:XXX:XXXX:XXXX::21/64
gateway XXXX:XXX:XXXX:XXXX::1
privext 0
accept_ra 0
autoconf 0
dad-attempts 0

The container was launched with -p br255 where this profile is:

config: {}
description: ""
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br255
    type: nic
name: br255

In the container (ubuntu:16.04) I have:

auto eth0
iface eth0 inet static
address 10.12.255.32/24
gateway 10.12.255.1
dns-nameservers 10.12.255.1
dns-search home.example.net

iface eth0 inet6 static
address XXXX:XXX:XXXX:XXXX::32/64
gateway XXXX:XXX:XXXX:XXXX::1
privext 0
accept_ra 0
autoconf 0
dad-attempts 0

There is no firewalling on the host, apart from the ACCEPT rules added by lxd itself:

# ufw status
Status: inactive

# ip6tables -L -n -v
Chain INPUT (policy ACCEPT 130K packets, 211M bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     tcp      lxdbr0 *       ::/0                 ::/0                 tcp dpt:53 /* managed by lxd-bridge */
    0     0 ACCEPT     udp      lxdbr0 *       ::/0                 ::/0                 udp dpt:53 /* managed by lxd-bridge */
    0     0 ACCEPT     udp      lxdbr0 *       ::/0                 ::/0                 udp dpt:67 /* managed by lxd-bridge */

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all      *      lxdbr0  ::/0                 ::/0                 /* managed by lxd-bridge */
    0     0 ACCEPT     all      lxdbr0 *       ::/0                 ::/0                 /* managed by lxd-bridge */

Chain OUTPUT (policy ACCEPT 70001 packets, 5275K bytes)
 pkts bytes target     prot opt in     out     source               destination

The host does have IPv6 forwarding enabled, and in /etc/default/lxd-bridge I have a separate routed subnet for the lxdbr0 bridge:

# IPv6
## IPv6 address (e.g. 2001:470:b368:4242::1)
LXD_IPV6_ADDR="YYYY:YYY:YYYY:21::1"

## IPv6 CIDR mask (e.g. 64)
LXD_IPV6_MASK="64"

## IPv6 network (e.g. 2001:470:b368:4242::/64)
LXD_IPV6_NETWORK="YYYY:YYY:YYYY:21::/64"

## NAT IPv6 traffic
LXD_IPV6_NAT="false"

# Run a minimal HTTP PROXY server
LXD_IPV6_PROXY="false"

Although the affected container is not using lxdbr0, I think this may be related, because on the host I see accept_ra=2 on the bridge interface. Since I have not set this in sysctl.conf I suspect it was set by lxd.

root@lxd1:~# sysctl net.ipv6.conf.br255
net.ipv6.conf.br255.accept_dad = 1
net.ipv6.conf.br255.accept_ra = 2
net.ipv6.conf.br255.accept_ra_defrtr = 1
net.ipv6.conf.br255.accept_ra_from_local = 0
net.ipv6.conf.br255.accept_ra_min_hop_limit = 1
net.ipv6.conf.br255.accept_ra_mtu = 1
net.ipv6.conf.br255.accept_ra_pinfo = 1
net.ipv6.conf.br255.accept_ra_rt_info_max_plen = 0
net.ipv6.conf.br255.accept_ra_rtr_pref = 1
net.ipv6.conf.br255.accept_redirects = 1
net.ipv6.conf.br255.accept_source_route = 0
net.ipv6.conf.br255.autoconf = 0
net.ipv6.conf.br255.dad_transmits = 1
net.ipv6.conf.br255.disable_ipv6 = 0
net.ipv6.conf.br255.force_mld_version = 0
net.ipv6.conf.br255.force_tllao = 0
net.ipv6.conf.br255.forwarding = 1
net.ipv6.conf.br255.hop_limit = 64
net.ipv6.conf.br255.ignore_routes_with_linkdown = 0
net.ipv6.conf.br255.max_addresses = 16
net.ipv6.conf.br255.max_desync_factor = 600
net.ipv6.conf.br255.mc_forwarding = 0
net.ipv6.conf.br255.mldv1_unsolicited_report_interval = 10000
net.ipv6.conf.br255.mldv2_unsolicited_report_interval = 1000
net.ipv6.conf.br255.mtu = 1500
net.ipv6.conf.br255.ndisc_notify = 0
net.ipv6.conf.br255.proxy_ndp = 0
net.ipv6.conf.br255.regen_max_retry = 3
net.ipv6.conf.br255.router_probe_interval = 60
net.ipv6.conf.br255.router_solicitation_delay = 1
net.ipv6.conf.br255.router_solicitation_interval = 4
net.ipv6.conf.br255.router_solicitations = 3
sysctl: reading key "net.ipv6.conf.br255.stable_secret"
net.ipv6.conf.br255.suppress_frag_ndisc = 1
net.ipv6.conf.br255.temp_prefered_lft = 86400
net.ipv6.conf.br255.temp_valid_lft = 604800
net.ipv6.conf.br255.use_oif_addrs_only = 0
net.ipv6.conf.br255.use_tempaddr = 0
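
For reference, accept_ra=2 means "accept Router Advertisements even if forwarding is enabled" (per the kernel's ip-sysctl documentation), which is why this forwarding host still honours RAs on the bridge. A sketch of pinning the bridge back down via a sysctl drop-in (the file name is hypothetical, and lxd-bridge may raise accept_ra again when it restarts):

```
# /etc/sysctl.d/60-br255-no-ra.conf (hypothetical file name)
# Ignore RAs and disable SLAAC on the bridge, even with forwarding enabled
net.ipv6.conf.br255.accept_ra = 0
net.ipv6.conf.br255.autoconf = 0
```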

Other things tried

I tried setting on the host:

root@lxd1:~# sysctl net.ipv6.conf.default.autoconf=0
root@lxd1:~# sysctl net.ipv6.conf.default.accept_ra=0

and then stopping and starting the container. This makes no difference: the container still gets a SLAAC address in addition to its static one, and the default gateway is still ticking away to expiry.

I tried commenting out all the extra settings inside the container:

#privext 0
#accept_ra 0
#autoconf 0
#dad-attempts 0

Again no difference, and indeed eth0.accept_ra was zero, although all.accept_ra and default.accept_ra were both set to one.

root@ns-auth:~# sysctl -a | grep accept_ra
sysctl: permission denied on key 'fs.protected_hardlinks'
sysctl: permission denied on key 'fs.protected_symlinks'
sysctl: permission denied on key 'kernel.cad_pid'
sysctl: permission denied on key 'kernel.unprivileged_userns_apparmor_policy'
sysctl: permission denied on key 'kernel.usermodehelper.bset'
sysctl: permission denied on key 'kernel.usermodehelper.inheritable'
sysctl: reading key "net.ipv6.conf.all.stable_secret"
net.ipv6.conf.all.accept_ra = 1
net.ipv6.conf.all.accept_ra_defrtr = 1
net.ipv6.conf.all.accept_ra_from_local = 0
net.ipv6.conf.all.accept_ra_min_hop_limit = 1
net.ipv6.conf.all.accept_ra_mtu = 1
net.ipv6.conf.all.accept_ra_pinfo = 1
net.ipv6.conf.all.accept_ra_rt_info_max_plen = 0
net.ipv6.conf.all.accept_ra_rtr_pref = 1
sysctl: reading key "net.ipv6.conf.default.stable_secret"
net.ipv6.conf.default.accept_ra = 1
net.ipv6.conf.default.accept_ra_defrtr = 1
net.ipv6.conf.default.accept_ra_from_local = 0
net.ipv6.conf.default.accept_ra_min_hop_limit = 1
net.ipv6.conf.default.accept_ra_mtu = 1
net.ipv6.conf.default.accept_ra_pinfo = 1
net.ipv6.conf.default.accept_ra_rt_info_max_plen = 0
net.ipv6.conf.default.accept_ra_rtr_pref = 1
sysctl: reading key "net.ipv6.conf.eth0.stable_secret"
net.ipv6.conf.eth0.accept_ra = 0
net.ipv6.conf.eth0.accept_ra_defrtr = 1
net.ipv6.conf.eth0.accept_ra_from_local = 0
net.ipv6.conf.eth0.accept_ra_min_hop_limit = 1
net.ipv6.conf.eth0.accept_ra_mtu = 1
net.ipv6.conf.eth0.accept_ra_pinfo = 1
net.ipv6.conf.eth0.accept_ra_rt_info_max_plen = 0
net.ipv6.conf.eth0.accept_ra_rtr_pref = 1
sysctl: reading key "net.ipv6.conf.lo.stable_secret"
net.ipv6.conf.lo.accept_ra = 1
net.ipv6.conf.lo.accept_ra_defrtr = 1
net.ipv6.conf.lo.accept_ra_from_local = 0
net.ipv6.conf.lo.accept_ra_min_hop_limit = 1
net.ipv6.conf.lo.accept_ra_mtu = 1
net.ipv6.conf.lo.accept_ra_pinfo = 1
net.ipv6.conf.lo.accept_ra_rt_info_max_plen = 0
net.ipv6.conf.lo.accept_ra_rtr_pref = 1

So finally, I tried explicitly enabling acceptance of RAs:

privext 0
accept_ra 1
autoconf 0
dad-attempts 0

This did change the sysctl setting inside the container (net.ipv6.conf.eth0.accept_ra = 1), and now the default gateway is refreshed via RAs:

root@ns-auth:~# ip -6 route
XXXX:XXX:XXXX:XXXX::/64 dev eth0  proto kernel  metric 256  expires 2591891sec pref medium
fe80::/64 dev eth0  proto kernel  metric 256  pref medium
default via fe80::66d1:54ff:fe5b:e2dd dev eth0  proto ra  metric 1024  expires 491sec pref medium
root@ns-auth:~# ip -6 route
XXXX:XXX:XXXX:XXXX::/64 dev eth0  proto kernel  metric 256  expires 2591998sec pref medium
fe80::/64 dev eth0  proto kernel  metric 256  pref medium
default via fe80::66d1:54ff:fe5b:e2dd dev eth0  proto ra  metric 1024  expires 598sec pref medium

So this is a usable workaround, because the container doesn't lose connectivity. However I wanted to set a static gateway and not be reliant on RAs.

stgraber commented 7 years ago

This is pretty odd. It's very unlikely to be a LXD issue in itself, but I'll try to reproduce it and poke around to see what's going on exactly. You certainly shouldn't be getting an RA-provided route when you have a static entry and disabled RAs in your container.

stgraber commented 7 years ago

So I appear unable to reproduce this here with a basic Ubuntu 16.04 container.

/etc/network/interfaces:

root@test:~# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address 10.204.119.10
    netmask 255.255.255.0
    gateway 10.204.119.1
    dns-nameservers 10.204.119.1

iface eth0 inet6 static
    address 2001:470:b368:4242::32/64
    gateway 2001:470:b368:4242::1

Routes:

root@test:~# ip -6 route show
2001:470:b368:4242::/64 dev eth0  proto kernel  metric 256  pref medium
fe80::/64 dev eth0  proto kernel  metric 256  pref medium
default via 2001:470:b368:4242::1 dev eth0  metric 1024  pref medium

Looks like the above config did cause the usual sysctls to be set to sane values:

root@test:~# sysctl net.ipv6.conf.eth0.accept_ra
net.ipv6.conf.eth0.accept_ra = 0
root@test:~# sysctl net.ipv6.conf.eth0.autoconf
net.ipv6.conf.eth0.autoconf = 0

stgraber commented 7 years ago

For comparison, a container without the /etc/network/interfaces modification on the same host gets this:

root@test1:~# ip -6 route show
2001:470:b368:4242::/64 dev eth0  proto kernel  metric 256  expires 3592sec pref medium
fe80::/64 dev eth0  proto kernel  metric 256  pref medium
default via fe80::ccfd:30ff:fe75:1546 dev eth0  proto ra  metric 1024  expires 1792sec hoplimit 64 pref medium
root@test1:~# sysctl net.ipv6.conf.eth0.accept_ra
net.ipv6.conf.eth0.accept_ra = 1
root@test1:~# sysctl net.ipv6.conf.eth0.autoconf
net.ipv6.conf.eth0.autoconf = 1

stgraber commented 7 years ago

Marking the issue as Incomplete as I'm unable to reproduce the issue. I'll keep the container running for a few hours and see if maybe further RA cause it to reconfigure its default gateway.

candlerb commented 7 years ago

OK, I'll try to make a standalone test case with a local bridge and radvd. I suspect there is some sort of race involved.

candlerb commented 7 years ago

I can replicate locally as follows. This is on a low-power machine (actually an Ubuntu 16.04 VM, running inside a NUC DN2820 which also runs Ubuntu 16.04).

If the following steps don't work for you, I'll try to replicate in a t2.nano in EC2.

# On the host
brctl addbr brtest
ifconfig brtest up
ip -6 addr add 2001:db8:0:1::1/64 dev brtest
lxc profile create brtest
lxc profile edit brtest

---
config: {}
description: ""
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: brtest
    type: nic
name: brtest
---

apt-get install radvd
vi /etc/radvd.conf

---
interface brtest
{
        AdvSendAdvert on;
        MaxRtrAdvInterval 30;
        prefix 2001:db8:0:1::/64
        {
                AdvOnLink on;
                AdvAutonomous on;
        };
};
---

systemctl start radvd
lxc launch ubuntu:16.04 -p brtest test1
lxc exec test1 bash

# Now we're inside the container; use "ifconfig" to check that
# a SLAAC address is obtained, e.g. 2001:db8:0:1:216:3eff:fe15:f2ca/64

vi /etc/network/interfaces.d/50-cloud-init.cfg

---
auto eth0
iface eth0 inet manual

iface eth0 inet6 static
address 2001:db8:0:1::1000/64
gateway 2001:db8:0:1::1
accept_ra 0
autoconf 0
privext 0
dad-attempts 0
---

exit

# Now we are back on the host
lxc stop test1
lxc start test1
lxc exec test1 bash

# Back inside the container:
# ip -6 route
2001:db8:0:1::/64 dev eth0  proto kernel  metric 256  pref medium
fe80::/64 dev eth0  proto kernel  metric 256  pref medium
default via fe80::fc22:cff:fe72:7b93 dev eth0  proto ra  metric 1024  expires 53sec hoplimit 64 pref medium

# ip -6 addr show dev eth0
17: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 2001:db8:0:1::1000/64 scope global
       valid_lft forever preferred_lft forever
    inet6 2001:db8:0:1:216:3eff:fe15:f2ca/64 scope global mngtmpaddr dynamic
       valid_lft 86094sec preferred_lft 14094sec
    inet6 fe80::216:3eff:fe15:f2ca/64 scope link
       valid_lft forever preferred_lft forever

candlerb commented 7 years ago

This very much appears to be a race condition. I can reproduce in EC2 on a t2.nano instance using the following script, but inside the container I need either to uncomment the post-up sleep 2 line, or change inet static to inet dhcp. My underpowered home server doesn't need this.

#!/bin/sh -ex
# Reproducer for lxc/lxd #3582
# Run this script as root on a clean 16.04 VM

#### networking setup
apt-get -y install bridge-utils
cat <<EOS >/etc/network/interfaces.d/brtest.cfg
auto brtest
iface brtest inet static
  address        192.0.2.1/24
  bridge_ports   none
  bridge_stp     off
  bridge_fd      0
  bridge_maxwait 0

iface brtest inet6 static
  address        2001:db8:0:1::1/64
  autoconf       0
  privext        0
  accept_ra      0
  dad-attempts   0
EOS
ifup brtest

cat >/etc/sysctl.d/99-sysctl.conf <<EOS
# radvd won't start unless this is set
net.ipv6.conf.all.forwarding=1
EOS
sysctl -p /etc/sysctl.d/99-sysctl.conf

apt-get -y install radvd
cat <<EOS >/etc/radvd.conf
interface brtest
{
        AdvSendAdvert on;
        MaxRtrAdvInterval 30;
        prefix 2001:db8:0:1::/64
        {
                AdvOnLink on;
                AdvAutonomous on;
        };
};
EOS
systemctl stop radvd
systemctl start radvd

#### lxd setup
lxd init --auto --storage-backend=dir
lxc profile create brtest
lxc profile device add brtest eth0 nic name=eth0 nictype=bridged parent=brtest

### container setup
lxc launch ubuntu:16.04 -p brtest testv6
lxc file push - testv6/etc/cloud/cloud.cfg.d/99-disable-network-config.cfg <<EOS
network: {config: disabled}
EOS
lxc file push - testv6/etc/network/interfaces.d/50-cloud-init.cfg <<EOS
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
  address       192.0.2.100/24
  gateway       192.0.2.1
  #post-up sleep 2

#iface eth0 inet dhcp

iface eth0 inet6 static
  address       2001:db8:0:1::100/64
  gateway       2001:db8:0:1::1
  accept_ra     0
  autoconf      0
  privext       0
  dad-attempts  0
EOS
lxc restart testv6

### Show the problems inside the container
sleep 5
echo "--- Container IPv6 routes ---"
lxc exec testv6 -- /sbin/ip -6 route
echo "--- Container IPv6 addresses ---"
lxc exec testv6 -- ip -6 addr show dev eth0

# 1. The static default gateway to 2001:db8:0:1::1 is missing
# 2. There is a "proto ra" default gateway to fe80::..., despite "accept_ra 0"
# 3. This RA default gateway expires after a few minutes, and since there
#    is no static gateway, the container loses its IPv6 connectivity
#    (this is the main issue)
# 4. eth0 has picked up a SLAAC address in addition to its static address,
#    despite "autoconf 0"

# IF THIS DOES NOT WORK:
# 1. Check that radvd is running.  Often it seems not to start for some reason.
# 2. Inside the container, run "tcpdump -i eth0 -nn icmp6" and check that RAs
#    are being received. (May not be shown until you hit ^C)
# 3. Try uncommenting "post-up sleep 2", or changing inet static to dhcp
#    (both of these slow down the container startup)

@stgraber: If you want access to the EC2 instance, please post or E-mail me your SSH public key.

stgraber commented 7 years ago

Ok, so sounds like it's a race between the kernel and ifupdown. If the kernel gets the RA before ifupdown is run, then you end up with the default gateway configured by the RA rather than the static configuration you entered...

I wonder if putting:

pre-up ip -6 route flush dev eth0

In the inet6 section would fix the race.

candlerb commented 7 years ago

I wonder if putting:

pre-up ip -6 route flush dev eth0

In the inet6 section would fix the race.

Interesting idea. I have tested it on my slow box, and what I find is that:

  1. The container loses the /64 LAN route as well, so now it has only the RA default route and nothing else

    root@apt-cacher:~# ip -6 route
    default via fe80::66d1:54ff:fe5b:e2dd dev eth0  proto ra  metric 1024  expires 594sec pref medium
  2. The container gets only a SLAAC address, and loses its static IP address

    root@apt-cacher:~# ip -6 addr show dev eth0
    25: eth0@if26: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
        inet6 XXXX:XXX:XXXX:XXXX:216:3eff:feaa:752d/64 scope global mngtmpaddr dynamic
           valid_lft 2591986sec preferred_lft 604786sec
        inet6 fe80::216:3eff:feaa:752d/64 scope link
           valid_lft forever preferred_lft forever

What I should probably do is test this with a "full fat" VM with sleep 2. If it replicates that way, then it would suggest the race is inherent to debian/ubuntu startup scripts, and not related to running in lxd.

stgraber commented 7 years ago

Well, LXD doesn't configure any of that kernel stuff, so we're really debating whether it's a kernel bug or an ifupdown bug at this point :)

candlerb commented 7 years ago

I have added post-up sleep 2 under the inet section of a full-fat VM (actually the VM in which the lxd containers are running) and rebooted it. I find the same problem.

Arguably there are several things at play here.

I'm not sure where best to raise this though. Ubuntu launchpad? Debian?

stgraber commented 7 years ago

https://launchpad.net/ubuntu/+source/ifupdown/+filebug for Ubuntu. Though the problem is likely to be the same in Debian so maybe going upstream makes sense here.

As a workaround, you could put a pre-up in the ipv4 section which sets the sysctl keys directly. This should have them all set properly prior to the device coming up.
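
In /etc/network/interfaces terms, that suggestion would look something like this (a sketch only; $IFACE is the interface variable ifupdown exports to pre-up commands, and the addresses are the thread's examples):

```
auto eth0
iface eth0 inet static
  address 10.12.255.32/24
  gateway 10.12.255.1
  # Set the IPv6 sysctls from the inet stanza, before the inet6 stanza runs
  pre-up sysctl -w net.ipv6.conf.$IFACE.accept_ra=0
  pre-up sysctl -w net.ipv6.conf.$IFACE.autoconf=0
```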

candlerb commented 7 years ago

As a workaround, you could put a pre-up in the ipv4 section which sets the sysctl keys directly

Great idea... but I just tried it and it doesn't work.

A thought. Is it possible that with lxd at least, the eth0 interface is already 'up' before ifupdown gets a look-in?

If I put in the inet section

pre-up /var/tmp/fixit.sh

and this script contains

#!/bin/sh -xe
exec >/tmp/fixit.log 2>&1
ifconfig eth0
sysctl net.ipv6.conf.eth0.accept_ra

then the result is:

+ ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:16:3e:aa:75:2d
          inet6 addr: XXXX:XXX:XXXX:XXXX:216:3eff:feaa:752d/64 Scope:Global
          inet6 addr: fe80::216:3eff:feaa:752d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:12 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1098 (1.0 KB)  TX bytes:586 (586.0 B)

+ sysctl net.ipv6.conf.eth0.accept_ra
net.ipv6.conf.eth0.accept_ra = 1

Notice how the interface is already UP by the time pre-up is called. (And even has its SLAAC address already).

stgraber commented 7 years ago

Ah, yeah, that'd make sense. I believe those veth pairs are already up by the time the container starts up.

So maybe you need to bring it down and back up too to avoid the race? :)

candlerb commented 7 years ago

OK, that works: if I run as a pre-up script

ifconfig eth0 down
sysctl net.ipv6.conf.eth0.accept_ra=0
sysctl net.ipv6.conf.eth0.autoconf=0

then everything is fine. What a palaver. (Aside: whose idea was it to put IPv6 address autoconfiguration into the kernel anyway?? :-)

BTW, I did also try setting net.ipv6.conf.default.autoconf=0 and net.ipv6.conf.default.accept_ra=0 on the host, hoping that when the veth/eth pair was created, they would inherit these settings. But they don't. In fact, if I look inside the container, I see:

root@apt-cacher:~# sysctl net.ipv6.conf.default.accept_ra
net.ipv6.conf.default.accept_ra = 1
root@apt-cacher:~# sysctl net.ipv6.conf.default.autoconf
net.ipv6.conf.default.autoconf = 1

But outside, on the host:

root@apt-cacher:~# exit
root@lxd1:~# sysctl net.ipv6.conf.default.accept_ra
net.ipv6.conf.default.accept_ra = 0
root@lxd1:~# sysctl net.ipv6.conf.default.autoconf
net.ipv6.conf.default.autoconf = 0

So even the concept of default settings, to be inherited by newly created interfaces, is virtualised inside the container - and defaults to the autoconf nonsense turned on. We need a tunable default for the default! :-)

Back to lxd. I see this:

func (c *containerLXC) createNetworkDevice...
...
        // Handle bridged and p2p
        if shared.StringInSlice(m["nictype"], []string{"bridged", "p2p"}) {
                n2 := deviceNextVeth()

                _, err := shared.RunCommand("ip", "link", "add", n1, "type", "veth", "peer", "name", n2)
...
                _, err = shared.RunCommand("ip", "link", "set", n1, "up")
...
                dev = n2
...
        // Bring the interface up
        _, err := shared.RunCommand("ip", "link", "set", "dev", dev, "up")

So it appears that lxd is indeed bringing up the container-side eth, as well as the veth side which attaches to the host bridge. I guess there must be a reason why it doesn't leave it for the OS inside the container to bring up the interface?

stgraber commented 7 years ago

Yeah, that code is for device hotplug which is a bit different than what you get on a fresh container start where liblxc does effectively the same for you.

The reason why liblxc does it is because it also allows pre-configuring a number of device settings which cannot be set if the device isn't up. Some device types like macvlan also need to be brought up to have their MAC address properly registered in the NIC, leaving it to the container's init system to bring it up would hide a number of critical errors.

There also are a number of distributions that were expecting devices to be brought up because their initrd was usually doing that. I think most of those no longer do this, but it certainly used to be a thing...

candlerb commented 7 years ago

FYI, I caught a container "in the act". After adding this workaround to the inet section in the container config:

pre-up ip link set dev eth0 down
pre-up sysctl net.ipv6.conf.eth0.autoconf=0
pre-up sysctl net.ipv6.conf.eth0.accept_ra=0

Then restarting the container, and doing "lxc list" repeatedly:

root@lxd1:~# lxc list
+------------+---------+---------------------+-------------------------------+------------+-----------+
|    NAME    |  STATE  |        IPV4         |             IPV6              |    TYPE    | SNAPSHOTS |
+------------+---------+---------------------+-------------------------------+------------+-----------+
| apt-cacher | RUNNING |                     |                               | PERSISTENT | 0         |
+------------+---------+---------------------+-------------------------------+------------+-----------+
root@lxd1:~# lxc list
+------------+---------+---------------------+----------------------------------------------+------------+-----------+
|    NAME    |  STATE  |        IPV4         |                     IPV6                     |    TYPE    | SNAPSHOTS |
+------------+---------+---------------------+----------------------------------------------+------------+-----------+
| apt-cacher | RUNNING |                     | XXXX:XXX:XXXX:XXXX:216:3eff:feaa:752d (eth0) | PERSISTENT | 0         |
+------------+---------+---------------------+----------------------------------------------+------------+-----------+
root@lxd1:~# lxc list
+------------+---------+---------------------+-------------------------------+------------+-----------+
|    NAME    |  STATE  |        IPV4         |             IPV6              |    TYPE    | SNAPSHOTS |
+------------+---------+---------------------+-------------------------------+------------+-----------+
| apt-cacher | RUNNING | 10.12.255.31 (eth0) |                               | PERSISTENT | 0         |
+------------+---------+---------------------+-------------------------------+------------+-----------+
root@lxd1:~# lxc list
+------------+---------+---------------------+-------------------------------+------------+-----------+
|    NAME    |  STATE  |        IPV4         |             IPV6              |    TYPE    | SNAPSHOTS |
+------------+---------+---------------------+-------------------------------+------------+-----------+
| apt-cacher | RUNNING | 10.12.255.31 (eth0) | XXXX:XXX:XXXX:XXXX::31 (eth0) | PERSISTENT | 0         |
+------------+---------+---------------------+-------------------------------+------------+-----------+

stgraber commented 7 years ago

Just went through this again and I'm really not seeing anything that LXD itself is doing wrong.

I'd recommend filing a bug against ifupdown. It may then be further moved to the kernel, but it feels to me as if ifupdown should really be able to do what you ask it to do in this case.

I'm closing the LXD issue as there doesn't appear to be anything for us to do here. Please let us know of any bug report you file, so that we can subscribe to it.

aj-gh commented 2 years ago

FYI, worked around this using:

pre-up ip -6 route del default dev $IFACE || true

As the default gateway is already there, adding the correct (static) one fails, which also prevents post-up commands (and similar) from triggering.
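
Putting the pieces from this thread together, a combined inet6 stanza might look like this (a sketch using the reproducer's example addresses, not a tested or definitive fix):

```
iface eth0 inet6 static
  address       2001:db8:0:1::100/64
  gateway       2001:db8:0:1::1
  accept_ra     0
  autoconf      0
  # The veth is already up (and may have a SLAAC address) before ifupdown
  # runs, so bounce the link to make the sysctls stick ...
  pre-up ip link set dev $IFACE down
  pre-up sysctl -w net.ipv6.conf.$IFACE.accept_ra=0
  pre-up sysctl -w net.ipv6.conf.$IFACE.autoconf=0
  # ... and drop any RA-learned default route so the static one can be added
  pre-up ip -6 route del default dev $IFACE || true
```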