xvd7 closed this issue 3 years ago.
@tomponline any suggestions?
I suspect you'd want to check what's on disk in /var/snap/lxd/common/lxd/networks
as that's what dnsmasq consumes.
Is dnsmasq running?
root@server /var/snap/lxd/common/lxd/networks # ls -la
total 12
drwx--x--x 3 root root 4096 Dec 9 20:36 .
drwx--x--x 17 root root 4096 Dec 11 19:49 ..
drwx--x--x 3 root root 4096 Dec 9 20:36 lxdbr0
root@server /var/snap/lxd/common/lxd/networks/lxdbr0 # ls -la
total 20
drwx--x--x 3 root root 4096 Dec 9 20:36 .
drwx--x--x 3 root root 4096 Dec 9 20:36 ..
drwxr-xr-x 2 root root 4096 Dec 11 19:49 dnsmasq.hosts
-rw-r--r-- 1 root root 0 Dec 9 20:36 dnsmasq.leases
-rw-r--r-- 1 root root 669 Dec 11 19:49 dnsmasq.pid
-rw-r--r-- 1 root root 1 Dec 11 19:49 dnsmasq.raw
root@server /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts # ls -la
total 12
drwxr-xr-x 2 root root 4096 Dec 11 19:49 .
drwx--x--x 3 root root 4096 Dec 9 20:36 ..
-rw-r--r-- 1 root root 23 Dec 11 19:49 test
root@server /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts # cat ./test
00:16:3e:0c:d5:9d,test
After applying the bridge.mtu trick (setting and then unsetting it), the previously empty dnsmasq.leases file now contains:
root@server /var/snap/lxd/common/lxd/networks/lxdbr0 # cat ./dnsmasq.leases
1607725689 00:16:3e:0c:d5:9d 10.2.35.86 test ff:a7:63:ac:d2:00:02:00:00:ab:11:58:2f:15:ab:7b:1d:90:84
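For completeness, the bridge.mtu trick referred to above is roughly this (the MTU value here is arbitrary since the key is unset again straight away; the set/unset cycle just forces LXD to re-apply the bridge config, which is what starts dnsmasq):
root@server ~ # lxc network set lxdbr0 bridge.mtu 1400
root@server ~ # lxc network unset lxdbr0 bridge.mtu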
After rebooting the server:
root@server ~ # ps -ef|grep dnsmasq
root 3939 3883 0 21:34 pts/0 00:00:00 grep --color=auto dnsmasq
After applying the bridge.mtu trick:
root@server ~ # ps -ef|grep dnsmasq
lxd 4296 1891 20 21:34 ? 00:00:00 dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --pid-file= --no-ping --interface=lxdbr0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.2.35.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range 10.2.35.2,10.2.35.254,1h -s lxd -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw -u lxd -g lxd
root 4298 3883 0 21:34 pts/0 00:00:00 grep --color=auto dnsmasq
So to confirm my understanding: dnsmasq is not starting automatically, but when it does, IPs are allocated OK?
That's correct. Running the bridge.mtu trick starts dnsmasq, which then assigns an IP address to the container.
What's in lxd.log and in journalctl -b 0 | grep dnsmasq?
root@server /var/snap/lxd/common/lxd/logs # cat ./lxd.log
t=2020-12-12T15:33:15+0000 lvl=info msg="LXD 4.0.4 is starting in normal mode" path=/var/snap/lxd/common/lxd
t=2020-12-12T15:33:15+0000 lvl=info msg="Kernel uid/gid map:"
t=2020-12-12T15:33:15+0000 lvl=info msg=" - u 0 0 4294967295"
t=2020-12-12T15:33:15+0000 lvl=info msg=" - g 0 0 4294967295"
t=2020-12-12T15:33:15+0000 lvl=info msg="Configured LXD uid/gid map:"
t=2020-12-12T15:33:15+0000 lvl=info msg=" - u 0 1000000 1000000000"
t=2020-12-12T15:33:15+0000 lvl=info msg=" - g 0 1000000 1000000000"
t=2020-12-12T15:33:16+0000 lvl=info msg="Kernel features:"
t=2020-12-12T15:33:16+0000 lvl=info msg=" - closing multiple file descriptors efficiently: no"
t=2020-12-12T15:33:16+0000 lvl=info msg=" - netnsid-based network retrieval: yes"
t=2020-12-12T15:33:16+0000 lvl=info msg=" - pidfds: yes"
t=2020-12-12T15:33:16+0000 lvl=info msg=" - uevent injection: yes"
t=2020-12-12T15:33:16+0000 lvl=info msg=" - seccomp listener: yes"
t=2020-12-12T15:33:16+0000 lvl=info msg=" - seccomp listener continue syscalls: yes"
t=2020-12-12T15:33:16+0000 lvl=info msg=" - seccomp listener add file descriptors: no"
t=2020-12-12T15:33:16+0000 lvl=info msg=" - attach to namespaces via pidfds: no"
t=2020-12-12T15:33:16+0000 lvl=info msg=" - safe native terminal allocation : yes"
t=2020-12-12T15:33:16+0000 lvl=info msg=" - unprivileged file capabilities: yes"
t=2020-12-12T15:33:16+0000 lvl=info msg=" - cgroup layout: hybrid"
t=2020-12-12T15:33:16+0000 lvl=warn msg=" - Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored"
t=2020-12-12T15:33:16+0000 lvl=warn msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
t=2020-12-12T15:33:16+0000 lvl=info msg=" - shiftfs support: disabled"
t=2020-12-12T15:33:16+0000 lvl=info msg="Initializing local database"
t=2020-12-12T15:33:16+0000 lvl=info msg="Starting /dev/lxd handler:"
t=2020-12-12T15:33:16+0000 lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2020-12-12T15:33:16+0000 lvl=info msg="REST API daemon:"
t=2020-12-12T15:33:16+0000 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/snap/lxd/common/lxd/unix.socket
t=2020-12-12T15:33:16+0000 lvl=info msg=" - binding TCP socket" socket=10.2.35.1:8443
t=2020-12-12T15:33:16+0000 lvl=info msg="Initializing global database"
t=2020-12-12T15:33:16+0000 lvl=info msg="Firewall loaded driver \"xtables\""
t=2020-12-12T15:33:16+0000 lvl=info msg="Initializing storage pools"
t=2020-12-12T15:33:17+0000 lvl=info msg="Initializing daemon storage mounts"
t=2020-12-12T15:33:17+0000 lvl=info msg="Initializing networks"
t=2020-12-12T15:33:18+0000 lvl=info msg="Pruning leftover image files"
t=2020-12-12T15:33:18+0000 lvl=info msg="Done pruning leftover image files"
t=2020-12-12T15:33:18+0000 lvl=info msg="Loading daemon configuration"
t=2020-12-12T15:33:18+0000 lvl=info msg="Started seccomp handler" path=/var/snap/lxd/common/lxd/seccomp.socket
t=2020-12-12T15:33:18+0000 lvl=info msg="Pruning expired images"
t=2020-12-12T15:33:18+0000 lvl=info msg="Done pruning expired images"
t=2020-12-12T15:33:18+0000 lvl=info msg="Pruning expired instance backups"
t=2020-12-12T15:33:18+0000 lvl=info msg="Done pruning expired instance backups"
t=2020-12-12T15:33:18+0000 lvl=info msg="Expiring log files"
t=2020-12-12T15:33:18+0000 lvl=info msg="Updating instance types"
t=2020-12-12T15:33:18+0000 lvl=info msg="Done expiring log files"
t=2020-12-12T15:33:18+0000 lvl=info msg="Done updating instance types"
t=2020-12-12T15:33:18+0000 lvl=info msg="Starting container" action=start created=2020-12-12T08:57:17+0000 ephemeral=false name=test project=default stateful=false used=2020-12-12T09:37:16+0000
t=2020-12-12T15:33:18+0000 lvl=info msg="Started container" action=start created=2020-12-12T08:57:17+0000 ephemeral=false name=test project=default stateful=false used=2020-12-12T09:37:16+0000
root@server ~ # journalctl -b 0 | grep dnsmasq
Dec 12 15:33:17 server audit[3092]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_dnsmasq-lxdbr0_</var/snap/lxd/common/lxd>" pid=3092 comm="apparmor_parser"
Dec 12 15:33:17 server kernel: audit: type=1400 audit(1607787197.863:26): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_dnsmasq-lxdbr0_</var/snap/lxd/common/lxd>" pid=3092 comm="apparmor_parser"
Dec 12 15:33:18 server dnsmasq[3119]: unknown interface lxdbr0
Dec 12 15:33:18 server dnsmasq[3119]: FAILED to start up
However, lxdbr0 exists:
root@server ~ # ifconfig lxdbr0
lxdbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.2.35.1 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::216:3eff:fe7d:7425 prefixlen 64 scopeid 0x20<link>
ether 00:16:3e:7d:74:25 txqueuelen 1000 (Ethernet)
RX packets 41 bytes 10280 (10.2 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Another piece of information from syslog:
root@server /var/log # cat ./syslog | grep lxdbr0
Dec 12 15:33:17 server systemd-udevd[473]: lxdbr0: Could not generate persistent MAC: No data available
Dec 12 15:33:17 server systemd-networkd[1248]: lxdbr0: Link UP
Dec 12 15:33:17 server kernel: [ 18.194394] audit: type=1400 audit(1607787197.863:26): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_dnsmasq-lxdbr0_</var/snap/lxd/common/lxd>" pid=3092 comm="apparmor_parser"
Dec 12 15:33:18 server kernel: [ 18.658325] lxdbr0: port 1(veth2771ed3f) entered blocking state
Dec 12 15:33:18 server kernel: [ 18.658328] lxdbr0: port 1(veth2771ed3f) entered disabled state
Dec 12 15:33:18 server kernel: [ 18.658475] lxdbr0: port 1(veth2771ed3f) entered blocking state
Dec 12 15:33:18 server kernel: [ 18.658477] lxdbr0: port 1(veth2771ed3f) entered forwarding state
Dec 12 15:33:18 server kernel: [ 18.658540] lxdbr0: port 1(veth2771ed3f) entered disabled state
Dec 12 15:33:18 server dnsmasq[3119]: unknown interface lxdbr0
Dec 12 15:33:18 server systemd-networkd[1248]: lxdbr0: Gained carrier
Dec 12 15:33:18 server systemd-networkd[1248]: lxdbr0: Gained IPv6LL
Dec 12 15:33:18 server kernel: [ 19.022889] lxdbr0: port 1(veth2771ed3f) entered blocking state
Dec 12 15:33:18 server kernel: [ 19.022892] lxdbr0: port 1(veth2771ed3f) entered forwarding state
Dec 12 15:33:18 server kernel: [ 19.022952] IPv6: ADDRCONF(NETDEV_CHANGE): lxdbr0: link becomes ready
That's odd, it suggests the interface vanished prior to dnsmasq starting...
Maybe something else is trying to configure it?
It has something to do with my VPN (tinc). If I turn tinc off then dnsmasq seems fine. I will keep digging.
I have two interfaces with the same IP (so that all LXD containers deployed on various physical servers can talk to each other through the VPN):
lxdbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.2.35.1 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::216:3eff:fe7d:7425 prefixlen 64 scopeid 0x20<link>
ether 00:16:3e:7d:74:25 txqueuelen 1000 (Ethernet)
RX packets 135 bytes 24510 (24.5 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 19 bytes 2800 (2.8 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vpn: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST> mtu 1500
inet 10.2.35.1 netmask 255.255.255.255 destination 10.2.35.1
inet6 fe80::e131:51d1:f236:a5fd prefixlen 64 scopeid 0x20<link>
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 500 (UNSPEC)
RX packets 3483 bytes 448721 (448.7 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3236 bytes 683271 (683.2 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
This used to work fine up until version 4.0.2. It looks like a race condition on some servers: it affects roughly 2/3 of my Ubuntu 20.04 servers, while all my Ubuntu 18.04 servers, which are on LXD 3.0, are fine.
For now I have a hack: adding ExecStartPre=/bin/sleep 15 to the tinc systemd service.
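For anyone wanting to replicate it, one way to apply such a delay without touching the packaged unit file is a systemd drop-in (the unit name tinc@myvpn.service is just an example; adjust it to whatever your tinc service is actually called):
root@server ~ # systemctl edit tinc@myvpn
This opens an override file (/etc/systemd/system/tinc@myvpn.service.d/override.conf) where you add:
[Service]
ExecStartPre=/bin/sleep 15
systemctl edit reloads systemd for you, so the delay takes effect from the next boot.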
Do you use the fan network at all?
I don't use the fan network. I found the tinc VPN solution long before the fan network was introduced and followed the "if it ain't broke, don't fix it" rule. Looks like I should look into a fan network setup...
It's fine; I only mentioned it because it can look up interfaces by IP, so having two interfaces with the same IP may confuse it. But that won't be the cause if you don't use it.
It is interesting to see this: my macvlan setup also stopped working yesterday, but I have not investigated it yet.
My initial hack (adding ExecStartPre=/bin/sleep 15 to the tinc systemd service) sometimes has a negative side effect: it prevents LXD from listening on port 8443.
root@server ~ # netstat -tulpn | grep lxd
returns no output. It can be fixed by running snap restart lxd.
On some servers an alternative hack works: adding ExecStartPre=/bin/sleep 15 to the [Service] section in /etc/systemd/system/snap.lxd.daemon.service.
The same issue appears when a wireguard interface has the same IP as the LXD bridge:
4: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
link/none
inet 10.2.66.1/32 scope global wg0
valid_lft forever preferred_lft forever
5: lxdbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 00:16:3e:63:3b:f3 brd ff:ff:ff:ff:ff:ff
inet 10.2.66.1/24 scope global lxdbr0
valid_lft forever preferred_lft forever
inet6 fe80::216:3eff:fe63:3bf3/64 scope link
valid_lft forever preferred_lft forever
This has not been an issue in LXD 3 and early releases of LXD 4. I think it started somewhere around version 4.0.2 or 4.0.3.
Does dnsmasq start OK? Can you show the output of ps aux | grep dnsmasq as well as sudo ss -ulpn? Thanks.
root@server:~# ps aux|grep dnsmasq
root 5443 0.0 0.0 8160 2580 pts/0 S+ 11:06 0:00 grep --color=auto dnsmasq
root@server:~# ss -ulpn
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
UNCONN 0 0 0.0.0.0:2049 0.0.0.0:*
UNCONN 0 0 0.0.0.0:34937 0.0.0.0:*
UNCONN 0 0 0.0.0.0:51820 0.0.0.0:*
UNCONN 0 0 0.0.0.0:53033 0.0.0.0:* users:(("rpc.mountd",pid=1771,fd=16))
UNCONN 0 0 0.0.0.0:53826 0.0.0.0:* users:(("rpc.mountd",pid=1771,fd=12))
UNCONN 0 0 0.0.0.0:47522 0.0.0.0:* users:(("rpc.mountd",pid=1771,fd=8))
UNCONN 0 0 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=1625,fd=12))
UNCONN 0 0 X.X.X.X%eno3:68 0.0.0.0:* users:(("systemd-network",pid=1619,fd=19))
UNCONN 0 0 0.0.0.0:111 0.0.0.0:* users:(("rpcbind",pid=1595,fd=5),("systemd",pid=1,fd=36))
UNCONN 0 0 [::]:2049 [::]:*
UNCONN 0 0 [::]:34932 [::]:* users:(("rpc.mountd",pid=1771,fd=10))
UNCONN 0 0 [::]:51555 [::]:* users:(("rpc.mountd",pid=1771,fd=14))
UNCONN 0 0 [::]:51820 [::]:*
UNCONN 0 0 [::]:43070 [::]:*
UNCONN 0 0 [::]:111 [::]:* users:(("rpcbind",pid=1595,fd=7),("systemd",pid=1,fd=38))
UNCONN 0 0 [::]:50174 [::]:* users:(("rpc.mountd",pid=1771,fd=18))
where X.X.X.X is the server's public IP address.
I have a temporary fix (it is not pretty, though): adding ExecStartPre=/bin/sleep 15 to the [Service] section in /etc/systemd/system/multi-user.target.wants/wg-quick@wg0.service.
Can you remove the workaround, enable debug mode, check whether dnsmasq fails to start with an error in syslog, and then review the LXD start-up logs to see if LXD reports failing to start dnsmasq:
sudo snap set lxd daemon.debug=true; sudo systemctl reload snap.lxd.daemon
sudo tail -f /var/snap/lxd/common/lxd/logs/lxd.log
root@server:~# cat /var/snap/lxd/common/lxd/logs/lxd.log
t=2021-01-30T08:24:08+0000 lvl=info msg="LXD 4.0.4 is starting in normal mode" path=/var/snap/lxd/common/lxd
t=2021-01-30T08:24:08+0000 lvl=info msg="Kernel uid/gid map:"
t=2021-01-30T08:24:08+0000 lvl=info msg=" - u 0 0 4294967295"
t=2021-01-30T08:24:08+0000 lvl=info msg=" - g 0 0 4294967295"
t=2021-01-30T08:24:08+0000 lvl=info msg="Configured LXD uid/gid map:"
t=2021-01-30T08:24:08+0000 lvl=info msg=" - u 0 1000000 1000000000"
t=2021-01-30T08:24:08+0000 lvl=info msg=" - g 0 1000000 1000000000"
t=2021-01-30T08:24:08+0000 lvl=info msg="Kernel features:"
t=2021-01-30T08:24:08+0000 lvl=info msg=" - closing multiple file descriptors efficiently: no"
t=2021-01-30T08:24:08+0000 lvl=info msg=" - netnsid-based network retrieval: yes"
t=2021-01-30T08:24:08+0000 lvl=info msg=" - pidfds: yes"
t=2021-01-30T08:24:08+0000 lvl=info msg=" - uevent injection: yes"
t=2021-01-30T08:24:08+0000 lvl=info msg=" - seccomp listener: yes"
t=2021-01-30T08:24:08+0000 lvl=info msg=" - seccomp listener continue syscalls: yes"
t=2021-01-30T08:24:08+0000 lvl=info msg=" - seccomp listener add file descriptors: no"
t=2021-01-30T08:24:08+0000 lvl=info msg=" - attach to namespaces via pidfds: no"
t=2021-01-30T08:24:08+0000 lvl=info msg=" - safe native terminal allocation : yes"
t=2021-01-30T08:24:08+0000 lvl=info msg=" - unprivileged file capabilities: yes"
t=2021-01-30T08:24:08+0000 lvl=info msg=" - cgroup layout: hybrid"
t=2021-01-30T08:24:08+0000 lvl=warn msg=" - Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored"
t=2021-01-30T08:24:08+0000 lvl=warn msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
t=2021-01-30T08:24:08+0000 lvl=info msg=" - shiftfs support: disabled"
t=2021-01-30T08:24:08+0000 lvl=info msg="Initializing local database"
t=2021-01-30T08:24:08+0000 lvl=dbug msg="Initializing database gateway"
t=2021-01-30T08:24:08+0000 lvl=dbug msg="Start database node" address= id=1 role=voter
t=2021-01-30T08:24:09+0000 lvl=info msg="Starting /dev/lxd handler:"
t=2021-01-30T08:24:09+0000 lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2021-01-30T08:24:09+0000 lvl=info msg="REST API daemon:"
t=2021-01-30T08:24:09+0000 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/snap/lxd/common/lxd/unix.socket
t=2021-01-30T08:24:09+0000 lvl=info msg="Initializing global database"
t=2021-01-30T08:24:09+0000 lvl=dbug msg="Dqlite: attempt 0: server 1: connected"
t=2021-01-30T08:24:09+0000 lvl=dbug msg="Firewall xtables detected iptables is in use"
t=2021-01-30T08:24:09+0000 lvl=info msg="Firewall loaded driver \"xtables\""
t=2021-01-30T08:24:09+0000 lvl=info msg="Initializing storage pools"
t=2021-01-30T08:24:09+0000 lvl=dbug msg="Initializing and checking storage pool \"default\""
t=2021-01-30T08:24:09+0000 lvl=dbug msg="Mount started" driver=zfs pool=default
t=2021-01-30T08:24:09+0000 lvl=dbug msg="Mount finished" driver=zfs pool=default
t=2021-01-30T08:24:10+0000 lvl=info msg="Initializing daemon storage mounts"
t=2021-01-30T08:24:10+0000 lvl=info msg="Initializing networks"
t=2021-01-30T08:24:10+0000 lvl=dbug msg=Start driver=bridge network=lxdbr0
t=2021-01-30T08:24:10+0000 lvl=dbug msg="Setting up network" driver=bridge network=lxdbr0
t=2021-01-30T08:24:10+0000 lvl=dbug msg="Stable MAC generated" driver=bridge hwAddr=00:16:3e:63:3b:f3 network=lxdbr0 seed=b1f38ff95313c1ba382c3f3c8584c3dc5c1bad42213aa4c700203871c7582c94.0.1
t=2021-01-30T08:24:10+0000 lvl=dbug msg="New task Operation: 8a206038-7090-4120-b6a5-f3db48a0669c"
t=2021-01-30T08:24:10+0000 lvl=info msg="Pruning leftover image files"
t=2021-01-30T08:24:10+0000 lvl=dbug msg="Started task operation: 8a206038-7090-4120-b6a5-f3db48a0669c"
t=2021-01-30T08:24:10+0000 lvl=info msg="Done pruning leftover image files"
t=2021-01-30T08:24:10+0000 lvl=info msg="Loading daemon configuration"
t=2021-01-30T08:24:10+0000 lvl=dbug msg="Initialized inotify with file descriptor 29"
t=2021-01-30T08:24:10+0000 lvl=dbug msg="Success for task operation: 8a206038-7090-4120-b6a5-f3db48a0669c"
t=2021-01-30T08:24:10+0000 lvl=info msg="Started seccomp handler" path=/var/snap/lxd/common/lxd/seccomp.socket
t=2021-01-30T08:24:10+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=root
t=2021-01-30T08:24:10+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=root
t=2021-01-30T08:24:11+0000 lvl=dbug msg="New task Operation: a6870700-e583-4f42-8ae6-99b906c44e85"
t=2021-01-30T08:24:11+0000 lvl=info msg="Pruning expired images"
t=2021-01-30T08:24:11+0000 lvl=dbug msg="Started task operation: a6870700-e583-4f42-8ae6-99b906c44e85"
t=2021-01-30T08:24:11+0000 lvl=info msg="Done pruning expired images"
t=2021-01-30T08:24:11+0000 lvl=dbug msg="Success for task operation: a6870700-e583-4f42-8ae6-99b906c44e85"
t=2021-01-30T08:24:11+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/internal/ready username=root
t=2021-01-30T08:24:11+0000 lvl=dbug msg="New task Operation: c3d7099d-7a97-4b63-851f-4fb28e5f9baa"
t=2021-01-30T08:24:11+0000 lvl=info msg="Pruning expired instance backups"
t=2021-01-30T08:24:11+0000 lvl=dbug msg="Started task operation: c3d7099d-7a97-4b63-851f-4fb28e5f9baa"
t=2021-01-30T08:24:11+0000 lvl=info msg="Done pruning expired instance backups"
t=2021-01-30T08:24:11+0000 lvl=dbug msg="Success for task operation: c3d7099d-7a97-4b63-851f-4fb28e5f9baa"
t=2021-01-30T08:24:11+0000 lvl=dbug msg="New task Operation: c7d66ec7-462b-4db4-a098-4cd1a0f4f9b6"
t=2021-01-30T08:24:11+0000 lvl=info msg="Updating instance types"
t=2021-01-30T08:24:11+0000 lvl=dbug msg="Started task operation: c7d66ec7-462b-4db4-a098-4cd1a0f4f9b6"
t=2021-01-30T08:24:11+0000 lvl=dbug msg="New task Operation: a627aa6c-dfab-4e5c-ae08-7d5c70cec283"
t=2021-01-30T08:24:11+0000 lvl=info msg="Done updating instance types"
t=2021-01-30T08:24:11+0000 lvl=info msg="Expiring log files"
t=2021-01-30T08:24:11+0000 lvl=dbug msg="Started task operation: a627aa6c-dfab-4e5c-ae08-7d5c70cec283"
t=2021-01-30T08:24:11+0000 lvl=info msg="Done expiring log files"
t=2021-01-30T08:24:11+0000 lvl=dbug msg="Success for task operation: a627aa6c-dfab-4e5c-ae08-7d5c70cec283"
t=2021-01-30T08:24:11+0000 lvl=dbug msg="MountInstance started" driver=zfs instance=test1 pool=default project=default
t=2021-01-30T08:24:11+0000 lvl=dbug msg="Mounted ZFS dataset" dev=tank/lxd/containers/test1 driver=zfs path=/var/snap/lxd/common/lxd/storage-pools/default/containers/test1 pool=default
t=2021-01-30T08:24:11+0000 lvl=dbug msg="MountInstance finished" driver=zfs instance=test1 pool=default project=default
t=2021-01-30T08:24:11+0000 lvl=dbug msg="Starting device" device=eth0 instance=test1 project=default type=nic
t=2021-01-30T08:24:11+0000 lvl=dbug msg="Scheduler: network: vetha78c31dd has been added: updating network priorities"
t=2021-01-30T08:24:11+0000 lvl=dbug msg="Scheduler: network: vethb2ddf64e has been added: updating network priorities"
t=2021-01-30T08:24:11+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=root
t=2021-01-30T08:24:11+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/internal/ready username=root
t=2021-01-30T08:24:11+0000 lvl=dbug msg="Starting device" device=root instance=test1 project=default type=disk
t=2021-01-30T08:24:11+0000 lvl=dbug msg="UpdateInstanceBackupFile started" driver=zfs instance=test1 pool=default project=default
t=2021-01-30T08:24:11+0000 lvl=dbug msg="UpdateInstanceBackupFile finished" driver=zfs instance=test1 pool=default project=default
t=2021-01-30T08:24:11+0000 lvl=dbug msg="MountInstance started" driver=zfs instance=test1 pool=default project=default
t=2021-01-30T08:24:11+0000 lvl=dbug msg="MountInstance finished" driver=zfs instance=test1 pool=default project=default
t=2021-01-30T08:24:11+0000 lvl=info msg="Starting container" action=start created=2020-12-23T20:38:43+0000 ephemeral=false name=test1 project=default stateful=false used=2021-01-30T08:03:52+0000
t=2021-01-30T08:24:11+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/internal/containers/1/onstart username=root
t=2021-01-30T08:24:11+0000 lvl=dbug msg="MountInstance started" driver=zfs instance=test1 pool=default project=default
t=2021-01-30T08:24:11+0000 lvl=dbug msg="MountInstance finished" driver=zfs instance=test1 pool=default project=default
t=2021-01-30T08:24:11+0000 lvl=dbug msg="Scheduler: container test1 started: re-balancing"
t=2021-01-30T08:24:12+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=root
t=2021-01-30T08:24:12+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/internal/ready username=root
t=2021-01-30T08:24:12+0000 lvl=info msg="Started container" action=start created=2020-12-23T20:38:43+0000 ephemeral=false name=test1 project=default stateful=false used=2021-01-30T08:03:52+0000
t=2021-01-30T08:24:12+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=root
t=2021-01-30T08:24:12+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/internal/ready username=root
t=2021-01-30T08:24:12+0000 lvl=dbug msg="Success for task operation: c7d66ec7-462b-4db4-a098-4cd1a0f4f9b6"
t=2021-01-30T08:24:13+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=root
t=2021-01-30T08:24:13+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/internal/ready username=root
t=2021-01-30T08:24:13+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=root
t=2021-01-30T08:24:13+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/internal/ready username=root
t=2021-01-30T08:24:14+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=root
t=2021-01-30T08:24:14+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/internal/ready username=root
t=2021-01-30T08:24:14+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=root
t=2021-01-30T08:24:14+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/internal/ready username=root
t=2021-01-30T08:24:15+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=root
t=2021-01-30T08:24:15+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/internal/ready username=root
t=2021-01-30T08:24:15+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=root
t=2021-01-30T08:24:15+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/internal/ready username=root
t=2021-01-30T08:24:16+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=root
t=2021-01-30T08:24:16+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/internal/ready username=root
t=2021-01-30T08:24:16+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=root
t=2021-01-30T08:24:16+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/internal/ready username=root
t=2021-01-30T08:24:17+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=root
t=2021-01-30T08:24:17+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/internal/ready username=root
t=2021-01-30T08:24:17+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/1.0 username=root
t=2021-01-30T08:24:17+0000 lvl=dbug msg=Handling ip=@ method=GET protocol=unix url=/internal/ready username=root
root@server:~# lxc list
+----------+---------+------+------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+----------+---------+------+------+-----------+-----------+
| test1 | RUNNING | | | CONTAINER | 0 |
+----------+---------+------+------+-----------+-----------+
Sounds to me like dnsmasq is starting and then exiting, so LXD doesn't know about it.
Can you look in syslog/the journal for dnsmasq-related messages?
Logs around the time when the problem occurred:
root@server:~# cat ./syslog.2 | grep dns
Jan 30 08:23:59 seaborgium NetworkManager[1666]: <info> [1611995037.8370] Read config: /etc/NetworkManager/NetworkManager.conf (lib: 10-dns-resolved.conf, 10-globally-managed-devices.conf, no-mac-addr-change.conf) (etc: default-wifi-powersave-on.conf)
Jan 30 08:23:58 seaborgium systemd[1]: pppd-dns.service: Succeeded.
Jan 30 08:23:59 seaborgium kernel: [ 48.227624] Key type dns_resolver registered
Jan 30 08:24:00 seaborgium NetworkManager[1666]: <info> [1611995040.5319] dns-mgr[0x560640e0a290]: init: dns=systemd-resolved rc-manager=symlink, plugin=systemd-resolved
Jan 30 08:24:07 seaborgium lxd.daemon[3299]: - shared_pidns
Jan 30 08:24:10 seaborgium kernel: [ 99.663938] audit: type=1400 audit(1611995050.704:30): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_dnsmasq-lxdbr0_</var/snap/lxd/common/lxd>" pid=3449 comm="apparmor_parser"
Jan 30 08:24:11 seaborgium dnsmasq[3460]: unknown interface lxdbr0
Jan 30 08:24:11 seaborgium dnsmasq[3460]: FAILED to start up
Is there any other keyword I should be searching for in the syslog?
@xvd7 well I think that highlights the problem:
Jan 30 08:24:11 seaborgium dnsmasq[3460]: unknown interface lxdbr0
Jan 30 08:24:11 seaborgium dnsmasq[3460]: FAILED to start up
So, as @stgraber said in https://github.com/lxc/lxd/issues/8246#issuecomment-743782852, it looks like something external to LXD on your system is tearing down the lxdbr0 interface as dnsmasq starts. I would check NetworkManager and systemd-networkd to make sure they are not trying to configure it.
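If either of those does turn out to be touching the bridge, this is a rough sketch of how to mark it unmanaged (file names are just examples; lxdbr0 is managed entirely by LXD, so neither daemon needs to configure it):
# NetworkManager: /etc/NetworkManager/conf.d/90-lxdbr0-unmanaged.conf
[keyfile]
unmanaged-devices=interface-name:lxdbr0
# systemd-networkd: /etc/systemd/network/90-lxdbr0.network
[Match]
Name=lxdbr0
[Link]
Unmanaged=yes
Then restart the respective service (or reboot) for the change to take effect.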
I could not find anything that would be tearing down lxdbr0. It seems more like a timing issue:
Jan 30 08:24:10 seaborgium lxd.daemon[3310]: t=2021-01-30T08:24:10+0000 lvl=dbug msg=Start driver=bridge network=lxdbr0
Jan 30 08:24:10 seaborgium lxd.daemon[3310]: t=2021-01-30T08:24:10+0000 lvl=dbug msg="Setting up network" driver=bridge network=lxdbr0
Jan 30 08:24:10 seaborgium NetworkManager[1666]: <info> [1611995050.4649] manager: (lxdbr0): new Bridge device (/org/freedesktop/NetworkManager/Devices/5)
Jan 30 08:24:10 seaborgium systemd-udevd[566]: lxdbr0: Could not generate persistent MAC: No data available
Jan 30 08:24:10 seaborgium lxd.daemon[3310]: t=2021-01-30T08:24:10+0000 lvl=dbug msg="Stable MAC generated" driver=bridge hwAddr=00:16:3e:63:3b:f3 network=lxdbr0 seed=b1f38ff95313c1ba382c3f3c8584c3dc5c1bad42213aa4c700203871c7582c94.0.1
Jan 30 08:24:10 seaborgium systemd-networkd[1585]: lxdbr0: Link UP
Jan 30 08:24:10 seaborgium kernel: [ 99.663938] audit: type=1400 audit(1611995050.704:30): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_dnsmasq-lxdbr0_</var/snap/lxd/common/lxd>" pid=3449 comm="apparmor_parser"
Jan 30 08:24:11 seaborgium dnsmasq[3460]: unknown interface lxdbr0
Jan 30 08:24:11 seaborgium dnsmasq[3460]: FAILED to start up
Jan 30 08:24:11 seaborgium kernel: [ 100.589586] lxdbr0: port 1(vethb2ddf64e) entered blocking state
Jan 30 08:24:11 seaborgium kernel: [ 100.589588] lxdbr0: port 1(vethb2ddf64e) entered disabled state
Jan 30 08:24:12 seaborgium systemd-networkd[1585]: lxdbr0: Gained carrier
Jan 30 08:24:12 seaborgium NetworkManager[1666]: <info> [1611995052.0412] device (lxdbr0): carrier: link connected
Jan 30 08:24:12 seaborgium kernel: [ 100.980349] lxdbr0: port 1(vethb2ddf64e) entered blocking state
Jan 30 08:24:12 seaborgium kernel: [ 100.980354] lxdbr0: port 1(vethb2ddf64e) entered forwarding state
Jan 30 08:24:12 seaborgium kernel: [ 100.980468] IPv6: ADDRCONF(NETDEV_CHANGE): lxdbr0: link becomes ready
After adding ExecStartPre=/bin/sleep 15 to the wireguard service, the log changes to:
Feb 1 16:11:39 seaborgium lxd.daemon[424204]: t=2021-02-01T16:11:39+0000 lvl=dbug msg=Start driver=bridge network=lxdbr0
Feb 1 16:11:39 seaborgium lxd.daemon[424204]: t=2021-02-01T16:11:39+0000 lvl=dbug msg="Setting up network" driver=bridge network=lxdbr0
Feb 1 16:11:39 seaborgium NetworkManager[1657]: <info> [1612195899.5600] manager: (lxdbr0): new Bridge device (/org/freedesktop/NetworkManager/Devices/19)
Feb 1 16:11:39 seaborgium systemd-udevd[423629]: lxdbr0: Could not generate persistent MAC: No data available
Feb 1 16:11:39 seaborgium lxd.daemon[424204]: t=2021-02-01T16:11:39+0000 lvl=dbug msg="Stable MAC generated" driver=bridge hwAddr=00:16:3e:63:3b:f3 network=lxdbr0 seed=b1f38ff95313c1ba382c3f3c8584c3dc5c1bad42213aa4c700203871c7582c94.0.1
Feb 1 16:11:39 seaborgium systemd-networkd[1588]: lxdbr0: Link UP
Feb 1 16:11:39 seaborgium kernel: [194907.750490] audit: type=1400 audit(1612195899.660:56): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_dnsmasq-lxdbr0_</var/snap/lxd/common/lxd>" pid=424315 comm="apparmor_parser"
Feb 1 16:11:40 seaborgium kernel: [194908.104113] lxdbr0: port 1(veth91deecf4) entered blocking state
Feb 1 16:11:40 seaborgium kernel: [194908.104116] lxdbr0: port 1(veth91deecf4) entered disabled state
Feb 1 16:11:40 seaborgium kernel: [194908.104263] lxdbr0: port 1(veth91deecf4) entered blocking state
Feb 1 16:11:40 seaborgium kernel: [194908.104266] lxdbr0: port 1(veth91deecf4) entered forwarding state
Feb 1 16:11:40 seaborgium kernel: [194908.104473] lxdbr0: port 1(veth91deecf4) entered disabled state
Feb 1 16:11:40 seaborgium systemd-networkd[1588]: lxdbr0: Gained carrier
Feb 1 16:11:40 seaborgium NetworkManager[1657]: <info> [1612195900.3144] device (lxdbr0): carrier: link connected
Feb 1 16:11:40 seaborgium kernel: [194908.397895] lxdbr0: port 1(veth91deecf4) entered blocking state
Feb 1 16:11:40 seaborgium kernel: [194908.397898] lxdbr0: port 1(veth91deecf4) entered forwarding state
Feb 1 16:11:40 seaborgium kernel: [194908.397945] IPv6: ADDRCONF(NETDEV_CHANGE): lxdbr0: link becomes ready
Feb 1 16:11:40 seaborgium dnsmasq[424318]: started, version 2.79 cachesize 150
Feb 1 16:11:40 seaborgium dnsmasq[424318]: compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua TFTP conntrack ipset auth DNSSEC loop-detect inotify
Feb 1 16:11:40 seaborgium dnsmasq-dhcp[424318]: DHCP, IP range 10.2.66.2 -- 10.2.66.254, lease time 1h
Feb 1 16:11:40 seaborgium dnsmasq[424318]: using local addresses only for domain lxd
Feb 1 16:11:40 seaborgium dnsmasq[424318]: reading /etc/resolv.conf
Feb 1 16:11:40 seaborgium dnsmasq[424318]: using local addresses only for domain lxd
Feb 1 16:11:40 seaborgium dnsmasq[424318]: using nameserver 213.186.33.99#53
Feb 1 16:11:40 seaborgium dnsmasq[424318]: read /etc/hosts - 9 addresses
Feb 1 16:11:40 seaborgium dnsmasq-dhcp[424318]: read /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts/test1
Feb 1 16:11:40 seaborgium dnsmasq-dhcp[424318]: DHCPDISCOVER(lxdbr0) 00:16:3e:d1:47:cd
Feb 1 16:11:40 seaborgium dnsmasq-dhcp[424318]: DHCPOFFER(lxdbr0) 10.2.66.2 00:16:3e:d1:47:cd
Feb 1 16:11:40 seaborgium dnsmasq-dhcp[424318]: DHCPREQUEST(lxdbr0) 10.2.66.2 00:16:3e:d1:47:cd
Feb 1 16:11:40 seaborgium dnsmasq-dhcp[424318]: DHCPACK(lxdbr0) 10.2.66.2 00:16:3e:d1:47:cd test1
Feb 1 16:11:41 seaborgium systemd-networkd[1588]: lxdbr0: Gained IPv6LL
Feb 1 16:41:41 seaborgium dnsmasq-dhcp[424318]: DHCPREQUEST(lxdbr0) 10.2.66.2 00:16:3e:d1:47:cd
Feb 1 16:41:41 seaborgium dnsmasq-dhcp[424318]: DHCPACK(lxdbr0) 10.2.66.2 00:16:3e:d1:47:cd test1
OK thanks, let me see if I can recreate here.
Out of interest, what is the reason for using the lxdbr0 IP address on the wireguard interface? Why not use a separate address and then publish the lxdbr0 subnet as a route on that endpoint?
It started historically: I was experimenting with LXC (yes, good old LXC before LXD was born). I used 10.X.Y.1 for the node and 10.X.Y.2 etc. for the containers. Then I got more servers and needed encrypted connections among all the containers, so I bolted on a peer-to-peer VPN. I did not know much about networking, so I used the same IP for the VPN interface and everything just worked. I never looked back. It was a rock-solid setup until around LXD 4.0.1.
I've not been able to recreate it using dummy interfaces, and @stgraber also tried using wireguard interfaces. Without being able to recreate it there's not much we can do, especially as this is likely a dnsmasq bug (we bundle our own dnsmasq inside the snap, which may be why it correlates with a specific LXD version).
Does it occur if the wireguard interface doesn't share the same IP as the lxdbr0 interface?
It would seem beneficial to use a separate subnet for the wireguard network, and then publish the lxdbr0 route over the wireguard link so that peers on the wireguard link can reach the containers on lxdbr0, not just the lxdbr0 host.
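As a rough sketch of what I mean (all addresses and key placeholders below are made up, not taken from your setup):
# /etc/wireguard/wg0.conf on this LXD host
[Interface]
Address = 10.200.0.1/24              # dedicated VPN subnet, separate from lxdbr0's 10.2.66.1/24
PrivateKey = <this-host-private-key>
ListenPort = 51820

[Peer]
PublicKey = <other-host-public-key>
Endpoint = other-host.example.com:51820
AllowedIPs = 10.200.0.2/32, 10.2.67.0/24   # the peer's VPN address plus that host's own lxdbr0 subnet
wg-quick installs routes for each peer's AllowedIPs automatically, so the containers behind each host can reach one another over the tunnel without the bridge and the VPN interface ever sharing an IP.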
Thank you for helping to narrow down the issue.
The problem does NOT occur if the wireguard interface has a different IP than the lxdbr0 interface.
I found a few changes in dnsmasq from July 2020, such as one labelled "Handle listening on duplicate addresses" (https://github.com/pi-hole/FTL/pull/848/commits). That would support the suspicion that the problem is related to a dnsmasq bug.
Good spotting. Yes, the l->used = 1; bit in https://github.com/pi-hole/FTL/pull/848/commits/85c788d7c3febdac604d0319acbcd3f4b62520a8 is related to the unknown interface lxdbr0 error you see in the logs.
I'll close this now, but if you report the issue to dnsmasq we can re-open once it's fixed upstream and then pull in the changes.
Hi, I'm having a problem with LXD not assigning IPv4 or IPv6 addresses to my containers as well. I'm getting this error in lxd.log: Failed enabling VLAN filtering on bridge \"lxdbr0\": open /sys/class/net/lxdbr0/bridge/vlan_filtering: permission denied.
Does anyone know how to fix this?
After upgrading to LXD 4.0.4 the issue was resolved on roughly 1/3 of my servers. Unfortunately, the other 2/3 still suffer from this problem: no IP address is assigned to the container unless I apply the bridge.mtu trick.
systemctl restart snap.lxd.daemon does not fix it. These are logs from a freshly re-installed server:
Is there any other information that would be helpful?
Originally posted by @xvd7 in https://github.com/lxc/lxd/issues/7935#issuecomment-743397331