canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

LXD instances cannot reach the WAN #10105

Closed jamesbeedy closed 2 years ago

jamesbeedy commented 2 years ago

All of a sudden, my LXD instances can't talk to the WAN. I initially thought this had something to do with modifications I made to my local routing table, but after troubleshooting I realized that is not the case.

Required information

Issue description

LXD containers no longer have outbound networking.

Just the other day, my LXD instances could reach the WAN just fine. Today, when I launch an instance, it can't communicate with anything on the internet.

I have validated this on my local dev system, coworkers' systems, and fresh new 20.04 machines.

Steps to reproduce

1) Install a fresh Ubuntu 20.04 machine
2) Refresh the lxd snap to latest/stable
3) Run sudo lxd init
4) Launch an ubuntu:20.04 instance, exec in, and try to ping google.com; this will fail

jbemfv@VDL900076:~$ juju add-machine --series focal --constraints "spaces=maas-vm cores=8 mem=8G root-disk=20G"
created machine 0
jbemfv@VDL900076:~$ wjst
jbemfv@VDL900076:~$ juju ssh 0
Welcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.4.0-105-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Tue Mar 22 21:49:53 UTC 2022

  System load:  1.44               Processes:             197
  Usage of /:   36.1% of 18.21GB   Users logged in:       0
  Memory usage: 4%                 IPv4 address for eth7: 10.104.196.120
  Swap usage:   0%

0 updates can be applied immediately.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ubuntu@keen-mayfly:~$ snap list
Name    Version   Rev    Tracking       Publisher   Notes
core20  20220304  1376   latest/stable  canonical✓  base
lxd     4.0.9     22526  4.0/stable/…   canonical✓  -
snapd   2.54.4    15177  latest/stable  canonical✓  snapd
ubuntu@keen-mayfly:~$ sudo snap refresh lxd --channel latest/stable
lxd 4.24 from Canonical✓ refreshed
ubuntu@keen-mayfly:~$ sudo snap install juju --channel latest/stable
error: This revision of snap "juju" was published using classic confinement and thus may perform
       arbitrary system changes outside of the security sandbox that snaps are usually confined to,
       which may put your system at risk.

       If you understand and want to proceed repeat the command including --classic.
ubuntu@keen-mayfly:~$ sudo snap install juju --channel latest/stable --classic
error: cannot perform the following tasks:
- Download snap "core18" (2344) from channel "stable" (Get https://canonical-lgw01.cdn.snapcraftcontent.com/download-origin/canonical-lgw01/CSO04Jhav2yK0uz97cr0ipQRyqg0qQL6_2344.snap?interactive=1&token=1647997200_eb7558078480cacd423b8ea4f66bb3fa39b76bf2: dial tcp: lookup canonical-lgw01.cdn.snapcraftcontent.com: No address associated with hostname)
ubuntu@keen-mayfly:~$ sudo snap install juju --channel latest/stable --classic
juju 2.9.27 from Canonical✓ installed
ubuntu@keen-mayfly:~$ sudo lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: 
Do you want to configure a new storage pool? (yes/no) [default=yes]: 
Name of the new storage pool [default=default]: 
Name of the storage backend to use (lvm, zfs, ceph, btrfs, dir) [default=zfs]: 
Create a new ZFS pool? (yes/no) [default=yes]: 
Would you like to use an existing empty block device (e.g. a disk or partition)? (yes/no) [default=no]: 
Size in GB of the new loop device (1GB minimum) [default=5GB]: 10G
Invalid input: strconv.ParseInt: parsing "10G": invalid syntax

Size in GB of the new loop device (1GB minimum) [default=5GB]: 10GB
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to create a new local network bridge? (yes/no) [default=yes]: 
What should the new bridge be called? [default=lxdbr0]: 
What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: none
Would you like the LXD server to be available over the network? (yes/no) [default=no]: 
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]: 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 
ubuntu@keen-mayfly:~$ lxc launch ubuntu:20.04 u1
If this is your first time running LXD on this machine, you should also run: lxd init

Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: permission denied
ubuntu@keen-mayfly:~$ sudo adduser ubuntu lxd
Adding user `ubuntu' to group `lxd' ...
Adding user ubuntu to group lxd
Done.
ubuntu@keen-mayfly:~$ newgrp lxd
ubuntu@keen-mayfly:~$ lxc launch ubuntu:20.04 u1
Creating u1
Retrieving image: rootfs: 21% (19.22MB/s)   
Starting u1                                 
ubuntu@keen-mayfly:~$ lxc shell u1
root@u1:~# ping google.com
PING google.com (172.217.21.174) 56(84) bytes of data.

dmesg output

https://paste.ubuntu.com/p/D7dbXDtPf6/

Please let me know if you need anything else.

Thanks

stgraber commented 2 years ago

Can you check that:

And then give us the output of:

stgraber commented 2 years ago

This seems to be using LXD 4.0.9, which hasn't been updated in several weeks, so if this used to work some days ago, it's very unlikely to be caused by anything we did :)

tomponline commented 2 years ago

ubuntu@keen-mayfly:~$ sudo snap refresh lxd --channel latest/stable
lxd 4.24 from Canonical✓ refreshed

Looks like the OP refreshed to LXD 4.24.

Most likely this is a firewall issue. These can occur suddenly if you have an ordering issue between your system firewall and the one LXD sets up.

Please show the output of:

stgraber commented 2 years ago

Oh right, I missed the refresh.

jamesbeedy commented 2 years ago

$ lxc config show u5 --expanded

architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 20.04 LTS amd64 (release) (20220321)
  image.label: release
  image.os: ubuntu
  image.release: focal
  image.serial: "20220321"
  image.type: squashfs
  image.version: "20.04"
  volatile.base_image: 9a04aa57d48d12a3a82eb71587eeef726924c3088a84a3acc62d84f02c11f32e
  volatile.eth0.host_name: veth5b0d419b
  volatile.eth0.hwaddr: 00:16:3e:59:53:d4
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: 56b21b5b-04ac-4269-a406-bbbc9fbb1133
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

ip a in host

ubuntu@keen-mayfly:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:12:d3:a3 brd ff:ff:ff:ff:ff:ff
    inet 10.104.196.120/23 brd 10.104.197.255 scope global eth7
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe12:d3a3/64 scope link 
       valid_lft forever preferred_lft forever
3: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:27:7e:00 brd ff:ff:ff:ff:ff:ff
    inet 10.14.99.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fe27:7e00/64 scope link 
       valid_lft forever preferred_lft forever
5: veth66c0b3a0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master lxdbr0 state UP group default qlen 1000
    link/ether aa:71:95:7a:2d:3c brd ff:ff:ff:ff:ff:ff link-netnsid 0
8: veth5b0d419b@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master lxdbr0 state UP group default qlen 1000
    link/ether ba:bc:38:22:48:16 brd ff:ff:ff:ff:ff:ff link-netnsid 1

ip a in container

$ lxc shell u5
root@u5:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
7: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:59:53:d4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.14.99.164/24 brd 10.14.99.255 scope global dynamic eth0
       valid_lft 2989sec preferred_lft 2989sec
    inet6 fe80::216:3eff:fe59:53d4/64 scope link 
       valid_lft forever preferred_lft forever
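
The two `ip a` listings can be cross-checked: a veth interface shows its own index before the name and its peer's index after `@if`. The sketch below pulls both out of a header line so the container's eth0 can be matched to its host-side veth. The function name `veth_ids` is hypothetical, and the sample lines are shortened copies of the output above.

```shell
#!/bin/sh
# Print "ifindex name peer-ifindex" for an `ip a` header line such as
# "7: eth0@if8: <BROADCAST,...>". Lines without "@if" print nothing.
veth_ids() {
    printf '%s\n' "$1" |
        sed -n 's/^\([0-9][0-9]*\): \([^@:]*\)@if\([0-9][0-9]*\):.*/\1 \2 \3/p'
}

# Container side (index 7, peer 8) and host side (index 8, peer 7).
veth_ids '7: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500'
veth_ids '8: veth5b0d419b@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500'
```

Here the indexes mirror each other, and the host-side name agrees with volatile.eth0.host_name in the expanded config, so the container's NIC does appear to be attached to lxdbr0 as expected.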
ubuntu@keen-mayfly:~$ sudo nft list ruleset
table inet lxd {
    chain pstrt.lxdbr0 {
        type nat hook postrouting priority srcnat; policy accept;
        @nh,96,24 659043 @nh,128,24 != 659043 masquerade
    }

    chain fwd.lxdbr0 {
        type filter hook forward priority filter; policy accept;
        ip version 4 oifname "lxdbr0" accept
        ip version 4 iifname "lxdbr0" accept
    }

    chain in.lxdbr0 {
        type filter hook input priority filter; policy accept;
        iifname "lxdbr0" tcp dport 53 accept
        iifname "lxdbr0" udp dport 53 accept
        iifname "lxdbr0" icmp type { destination-unreachable, time-exceeded, parameter-problem } accept
        iifname "lxdbr0" udp dport 67 accept
    }

    chain out.lxdbr0 {
        type filter hook output priority filter; policy accept;
        oifname "lxdbr0" tcp sport 53 accept
        oifname "lxdbr0" udp sport 53 accept
        oifname "lxdbr0" icmp type { destination-unreachable, time-exceeded, parameter-problem } accept
        oifname "lxdbr0" udp sport 67 accept
    }
}
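
The masquerade rule in pstrt.lxdbr0 is printed as raw payload matches (`@nh,96,24 659043`), which is hard to read. Assuming these are offsets into the IPv4 header, bit 96 is the start of the source address and bit 128 the start of the destination address, so each 24-bit value is the first three octets of an address. A small sketch to decode it (the function name `decode_nh24` is hypothetical):

```shell
#!/bin/sh
# Decode a 24-bit value from an nft raw payload match such as
# "@nh,96,24 659043" into the first three octets of an IPv4 address.
decode_nh24() {
    value=$1
    printf '%d.%d.%d\n' \
        $(( (value >> 16) & 255 )) \
        $(( (value >> 8) & 255 )) \
        $(( value & 255 ))
}

decode_nh24 659043   # prints 10.14.99
```

Read this way, the rule masquerades traffic from 10.14.99.0/24 that is not destined for 10.14.99.0/24, which matches the lxdbr0 subnet shown in `ip a`.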
$ sudo iptables-save
# Generated by iptables-save v1.8.4 on Wed Mar 23 22:58:42 2022
*raw
:PREROUTING ACCEPT [325446:1196590277]
:OUTPUT ACCEPT [189286:10633456]
COMMIT
# Completed on Wed Mar 23 22:58:42 2022
# Generated by iptables-save v1.8.4 on Wed Mar 23 22:58:42 2022
*mangle
:PREROUTING ACCEPT [325446:1196590277]
:INPUT ACCEPT [301092:1149692953]
:FORWARD ACCEPT [21336:46097554]
:OUTPUT ACCEPT [189288:10633832]
:POSTROUTING ACCEPT [210624:56731386]
COMMIT
# Completed on Wed Mar 23 22:58:42 2022
# Generated by iptables-save v1.8.4 on Wed Mar 23 22:58:42 2022
*nat
:PREROUTING ACCEPT [3389:886619]
:INPUT ACCEPT [324:85006]
:OUTPUT ACCEPT [347:26057]
:POSTROUTING ACCEPT [346:26017]
COMMIT
# Completed on Wed Mar 23 22:58:42 2022
# Generated by iptables-save v1.8.4 on Wed Mar 23 22:58:42 2022
*filter
:INPUT ACCEPT [301092:1149692953]
:FORWARD ACCEPT [21336:46097554]
:OUTPUT ACCEPT [189293:10634784]
COMMIT
# Completed on Wed Mar 23 22:58:42 2022

jamesbeedy commented 2 years ago

$ sudo iptables -L -n -v
Chain INPUT (policy ACCEPT 301K packets, 1150M bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 21336 packets, 46M bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 189K packets, 11M bytes)
 pkts bytes target     prot opt in     out     source               destination 

stgraber commented 2 years ago

Not seeing anything obviously wrong there. Can you do:

tomponline commented 2 years ago

Can we see 'lxc network show lxdbr0' too, please?

tomponline commented 2 years ago

Also, you didn't provide the requested output of 'ip r' from the host and the container.

One-sixth commented 2 years ago

@stgraber I have a similar problem. The instance randomly loses connectivity after the host reboots. Whenever this happens, I need to manually run lxd shutdown and then sudo lxd to restart the LXD service. After that, the instance can access the WAN and the host's LAN again. The problem only affects the instance's access to the WAN and the host's LAN; it does not affect the host's access to the instance through the proxy device. It occurs in containers that are started automatically after the host reboots, and the probability of occurrence is 20%. The instance uses LXC's default network settings without special changes.

The system is Ubuntu 20.04. The LXD version is 4.23.

tomponline commented 2 years ago

disconnected

Unfortunately, "disconnected" can take many forms, so without narrowing down what form of disconnection is occurring in each case, it is not possible to resolve it.

Next time it happens please can you gather the diagnostics output requested in this thread to see if we can narrow it down.

Thanks
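
For anyone hitting this later, the outputs shown or asked for in this thread could be gathered in one pass with something like the sketch below. The function names and the default report filename are hypothetical, and `lxc exec` is used in place of the interactive `lxc shell` session seen above so the container commands can run unattended.

```shell
#!/bin/sh
# List the diagnostic commands from this thread, one per line.
# $1: instance name (e.g. u5), $2: bridge name (e.g. lxdbr0).
diag_commands() {
    instance=$1
    bridge=$2
    cat <<EOF
lxc config show $instance --expanded
lxc network show $bridge
ip a
ip r
lxc exec $instance -- ip a
lxc exec $instance -- ip r
sudo nft list ruleset
sudo iptables-save
EOF
}

# Run each command and write command plus output to a report file.
# $3: report path (assumed default: lxd-diagnostics.txt).
collect_diagnostics() {
    report=${3:-lxd-diagnostics.txt}
    diag_commands "$1" "$2" | while IFS= read -r cmd; do
        printf '$ %s\n' "$cmd"
        # </dev/null keeps the command from consuming the command list
        sh -c "$cmd" 2>&1 </dev/null
        printf '\n'
    done > "$report"
}
```

Running `collect_diagnostics u5 lxdbr0` on the host while the instance is unreachable would produce a single file to attach to the issue.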