canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

lxc destroy cname generates Failed to remove device eth0 #6666

Status: Closed (davidfavor closed this issue 4 years ago)

davidfavor commented 4 years ago

net16 # lxc --version
3.18

net16 # uname -a
Linux net16.faststablehosting.com 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

net16 # lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.3 LTS
Release:    18.04
Codename:   bionic

This happens after creating a container and running it.

Stopping the container and then attempting to delete it produces this:

net16 # lxc delete net16-test-itk 
Error: Failed to remove device 'eth0': [Failed to release DHCPv4 lease for instance "net16-test-itk", IP "10.101.55.65", MAC "00:16:3e:43:46:05", write udp 10.101.55.1:36820->10.101.55.1:67: write: operation not permitted]

The container is partially removed, so I guess the fix is to somehow force removal of the residue under /var/snap/lxd/common/lxd/storage-pools/default/containers/net16-test-itk.

It would be great if someone could provide the correct way to force container removal, so that all cleanup is done correctly.

Thanks.

stgraber commented 4 years ago

Can you show:

The request to clear the lease shouldn't fail in that way, so we need to figure out if it's being done when it shouldn't or if it's dnsmasq not behaving properly.

stgraber commented 4 years ago

@tomponline

tomponline commented 4 years ago

@davidfavor also please can you send the contents of /var/lib/lxd/networks/<bridge name>/dnsmasq.leases just before and after trying to remove the container.

Does this occur every time a container is created and deleted or intermittently?
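For comparing the lease file before and after a delete, a small parser can help. The following is a minimal sketch, assuming the usual dnsmasq lease layout of five whitespace-separated fields per IPv4 entry (expiry timestamp, MAC, IP, hostname, client ID). The names `Lease`, `parse_leases`, and `find_lease` are hypothetical, and the sample expiry values and the second entry's IP are made up for illustration; only the first MAC, IP, and instance name come from the error message above.

```python
from typing import List, NamedTuple, Optional


class Lease(NamedTuple):
    expiry: int      # lease expiry as a Unix timestamp
    mac: str
    ip: str
    hostname: str
    client_id: str


def parse_leases(text: str) -> List[Lease]:
    """Parse IPv4 entries from dnsmasq.leases content and skip everything else."""
    leases = []
    for line in text.splitlines():
        fields = line.split()
        # Keep only lines with five fields whose second field looks like a MAC.
        # This skips blank lines, the "duid ..." line, and IPv6 entries.
        if len(fields) != 5 or not fields[0].isdigit() or fields[1].count(":") != 5:
            continue
        leases.append(Lease(int(fields[0]), fields[1].lower(), *fields[2:]))
    return leases


def find_lease(leases: List[Lease], mac: str) -> Optional[Lease]:
    """Return the lease for the given MAC address, or None if there is none."""
    mac = mac.lower()
    return next((lease for lease in leases if lease.mac == mac), None)


# Illustrative content only: expiry values and the second IP are invented.
SAMPLE = """\
1576000000 00:16:3e:43:46:05 10.101.55.65 net16-test-itk *
1576000100 00:16:3e:29:77:58 10.101.55.20 net16-template-bionic *
duid 00:01:00:01:25:6a:aa:bb:00:16:3e:00:00:01
"""

if __name__ == "__main__":
    print(find_lease(parse_leases(SAMPLE), "00:16:3E:43:46:05"))
```

If the entry for the instance's MAC is still present after a failed delete, the release never reached dnsmasq. Note that on snap installs the file may live under /var/snap/lxd/common/lxd/networks/ rather than /var/lib/lxd/networks/, as the dnsmasq command line later in this thread shows.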

stgraber commented 4 years ago

@davidfavor

davidfavor commented 4 years ago

Problem still persists.

Just hit it again this morning.

Intermittent.

net16 # ll var/lib/lxd/networks/*/dnsmasq.leases
/bin/ls: cannot access 'var/lib/lxd/networks/*/dnsmasq.leases': No such file or directory

tomponline commented 4 years ago

@davidfavor please can you provide the info @stgraber asked for? From what you've sent so far, it looks like dnsmasq isn't configured, which would explain the issue.

davidfavor commented 4 years ago

Here is the data requested previously. Apologies for taking so long...

net16 # lxc config show net16-template-bionic --expanded
architecture: x86_64
config:
  boot.autostart: "1"
  boot.autostart.delay: "120"
  image.architecture: amd64
  image.description: ubuntu 17.10 amd64 (release) (20180126)
  image.label: release
  image.os: ubuntu
  image.release: artful
  image.serial: "20180126"
  image.version: "17.10"
  volatile.base_image: 41eb905c444d169b9dbb2d8435692699872680441269789ca740ea7c1dbde9d7
  volatile.eno1.hwaddr: 00:16:3e:90:48:e3
  volatile.eno1.name: eth1
  volatile.eth0.hwaddr: 00:16:3e:29:77:58
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: STOPPED
devices:
  david-favor:
    path: /david-favor
    source: /david-favor
    type: disk
  eno1:
    nictype: bridged
    parent: lxdbr0
    type: nic
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

net16 # lxc network list
+--------+----------+---------+-------------+---------+
|  NAME  |   TYPE   | MANAGED | DESCRIPTION | USED BY |
+--------+----------+---------+-------------+---------+
| eno1   | physical | NO      |             | 0       |
+--------+----------+---------+-------------+---------+
| eno2   | physical | NO      |             | 0       |
+--------+----------+---------+-------------+---------+
| lxdbr0 | bridge   | YES     |             | 10      |
+--------+----------+---------+-------------+---------+

net16 # ps aux | grep dnsmasq | grep lxdbr0
lxd        1842  0.0  0.0  49984  3580 ?        Ss   08:42   0:00 dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --no-ping --interface=lxdbr0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.101.55.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range 10.101.55.2,10.101.55.254,1h --listen-address=fd42:a94f:9bb3:bfd6::1 --enable-ra --dhcp-range ::,constructor:lxdbr0,ra-stateless,ra-names -s lxd -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw -u lxd

net16 # lxc network show lxdbr0
config:
  ipv4.address: 10.101.55.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:a94f:9bb3:bfd6::1/64
  ipv6.nat: "true"
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/net16-david-favor
- /1.0/instances/net16-dmarc-work
- /1.0/instances/net16-dns-ns10
- /1.0/instances/net16-mail-forwarder
- /1.0/instances/net16-mailstore
- /1.0/instances/net16-template-bionic
managed: true
status: Created
locations:
- none

tomponline commented 4 years ago

@davidfavor thanks, do you have any firewall rules configured on the host?

Could you show the output of iptables -L -v -n (assuming you're using iptables).

davidfavor commented 4 years ago

net16 # iptables -L -v -n
Chain INPUT (policy ACCEPT 460 packets, 38454 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     tcp  --  lxdbr0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:53 /* generated for LXD network lxdbr0 */
    0     0 ACCEPT     udp  --  lxdbr0 *       0.0.0.0/0            0.0.0.0/0            udp dpt:53 /* generated for LXD network lxdbr0 */
 1705  558K ACCEPT     udp  --  lxdbr0 *       0.0.0.0/0            0.0.0.0/0            udp dpt:67 /* generated for LXD network lxdbr0 */
 374K   39M f2b-sshd   tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 22

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
  48M   19G ACCEPT     all  --  *      lxdbr0  0.0.0.0/0            0.0.0.0/0            /* generated for LXD network lxdbr0 */
  49M   36G ACCEPT     all  --  lxdbr0 *       0.0.0.0/0            0.0.0.0/0            /* generated for LXD network lxdbr0 */

Chain OUTPUT (policy ACCEPT 928 packets, 89610 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     tcp  --  *      lxdbr0  0.0.0.0/0            0.0.0.0/0            tcp spt:53 /* generated for LXD network lxdbr0 */
    0     0 ACCEPT     udp  --  *      lxdbr0  0.0.0.0/0            0.0.0.0/0            udp spt:53 /* generated for LXD network lxdbr0 */
 1705  587K ACCEPT     udp  --  *      lxdbr0  0.0.0.0/0            0.0.0.0/0            udp spt:67 /* generated for LXD network lxdbr0 */
10390  861K DROP-UDP   udp  --  *      *       0.0.0.0/0            0.0.0.0/0           
   54 11598 DROP-SSH   tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:22
    0     0 DROP-SMTP  tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:25

Chain DROP-SMTP (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 RETURN     all  --  *      *       0.0.0.0/0            144.217.145.114      /* davidfavor.com */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            68.233.248.187       /* net1.bizcooker.com */
    0     0 LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0            LOG flags 0 level 7 prefix "DROP-SMTP: "
    0     0 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            reject-with icmp-host-prohibited

Chain DROP-SSH (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 RETURN     all  --  *      *       0.0.0.0/0            192.99.135.208       /* cydec.com */
   52 11518 RETURN     all  --  *      *       0.0.0.0/0            192.99.135.209       /* newswire.net */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            167.114.29.142       /* 167.114.29.142 */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            167.114.29.139       /* 167.114.29.139 */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            167.114.29.138       /* 167.114.29.138 */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            167.114.29.137       /* 167.114.29.137 */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            51.79.18.222         /* net16.faststablehosting.com */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            51.79.16.116         /* net15.faststablehosting.com */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            51.79.19.215         /* net14.faststablehosting.com */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            54.39.133.240        /* net13.faststablehosting.com */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            51.79.59.90          /* net12.wpfastsites.com */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            51.79.113.179        /* net11.wpfastsites.com */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            144.217.145.112      /* net10.wpfastsites.com */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            167.114.159.29       /* net8.bizcooker.com */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            192.30.252.0/22      /* github.com */
    2    80 LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0            LOG flags 0 level 7 prefix "DROP-SSH: "
    2    80 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            reject-with icmp-host-prohibited

Chain DROP-UDP (1 references)
 pkts bytes target     prot opt in     out     source               destination         
 1553  204K RETURN     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp spt:53
    0     0 RETURN     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp spts:67:68 dpts:67:68
 1560  110K RETURN     udp  --  *      *       127.0.0.1            127.0.0.1           
    0     0 RETURN     udp  --  *      *       10.0.3.1             0.0.0.0/0           
 7273  545K RETURN     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 53,68,123
    4  1112 LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0            LOG flags 0 level 7 prefix "DROP-UDP: "
    4  1112 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            reject-with icmp-host-prohibited

Chain f2b-sshd (1 references)
 pkts bytes target     prot opt in     out     source               destination         
   31  1772 REJECT     all  --  *      *       49.88.112.75         0.0.0.0/0            reject-with icmp-port-unreachable
 340K   36M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

davidfavor commented 4 years ago

Arg...

Running lxc delete on the container this morning just succeeded.

Regarding the iptables data:

Could you expand a bit on what I should look for in the iptables output that might provide a workaround when this intermittent problem occurs again?

Thanks.

tomponline commented 4 years ago

@davidfavor so the original error was:

write udp 10.101.55.1:36820->10.101.55.1:67: write: operation not permitted

That error is generated by LXD trying to send a DHCP release packet to the dnsmasq process that should be listening on 10.101.55.1 to release the lease.

I was wondering if there are any iptables or ebtables rules that may be blocking those packets.
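One way to check this against the iptables output posted above is to walk the packet from the error message (UDP 10.101.55.1:36820 -> 10.101.55.1:67) through the DROP-UDP chain by hand. The sketch below is a rough model, not iptables itself: the rule list is hand-transcribed from the listing, the non-terminating LOG rule is left out, and `Packet`, `Rule`, and `verdict` are hypothetical names. It assumes the packet reaches DROP-UDP, which seems plausible because the OUTPUT rule generated for LXD only matches source port 67.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int


@dataclass(frozen=True)
class Rule:
    target: str
    matches: Callable[[Packet], bool]


# Hand-transcribed from the DROP-UDP chain in the listing (LOG rule omitted,
# since LOG does not end chain traversal).
DROP_UDP: List[Rule] = [
    Rule("RETURN", lambda p: p.sport == 53),
    Rule("RETURN", lambda p: 67 <= p.sport <= 68 and 67 <= p.dport <= 68),
    Rule("RETURN", lambda p: p.src == "127.0.0.1" and p.dst == "127.0.0.1"),
    Rule("RETURN", lambda p: p.src == "10.0.3.1"),
    Rule("RETURN", lambda p: p.dport in (53, 68, 123)),
    Rule("REJECT", lambda p: True),
]


def verdict(chain: List[Rule], packet: Packet) -> str:
    """Return the target of the first matching rule (RETURN if none match)."""
    for rule in chain:
        if rule.matches(packet):
            return rule.target
    return "RETURN"


if __name__ == "__main__":
    # The release packet from the error message: ephemeral source port, dport 67.
    release = Packet("10.101.55.1", 36820, "10.101.55.1", 67)
    # An ordinary DHCP client request, for comparison.
    client = Packet("0.0.0.0", 68, "255.255.255.255", 67)
    print(verdict(DROP_UDP, release))  # REJECT
    print(verdict(DROP_UDP, client))   # RETURN
```

In this model the release packet falls through every RETURN rule because its source port is ephemeral rather than 67 or 68, and it ends at the final REJECT. That is consistent with the nonzero counters on the LOG and REJECT rules in the listing, though it does not prove those counters belong to the release packets.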

Also, I'm not sure if it's related, but I've noticed an oddity in your container config. You have two network interfaces connected to the same bridge:

  eno1:
    nictype: bridged
    parent: lxdbr0
    type: nic
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic

Is there a reason for this? It's likely to cause unexpected routing and firewall issues.

ateska commented 2 years ago

My two cents, as I encountered this issue as well.

It was resolved by removing the following line from nftables:

add inet filter output oifname "lo" ip daddr != 127.0.0.0/8 drop

It essentially means that "something" here is trying to communicate over the lo network interface using the lxdbr0 IP address rather than a 127.0.0.0/8 address.
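The effect of that rule on the release packet can be modelled in a few lines. This is only a sketch of the rule's match logic, assuming (as is usual on Linux) that traffic the host sends to one of its own addresses, such as 10.101.55.1, leaves via lo. The function name `dropped_by_rule` is hypothetical.

```python
import ipaddress

LOOPBACK_NET = ipaddress.ip_network("127.0.0.0/8")


def dropped_by_rule(oifname: str, daddr: str) -> bool:
    """Model of: oifname "lo" ip daddr != 127.0.0.0/8 drop"""
    return oifname == "lo" and ipaddress.ip_address(daddr) not in LOOPBACK_NET


if __name__ == "__main__":
    print(dropped_by_rule("lo", "10.101.55.1"))  # True: bridge address via lo
    print(dropped_by_rule("lo", "127.0.0.1"))    # False: ordinary loopback traffic
```

Under that assumption, the rule drops any host-to-self traffic addressed to the bridge IP, which would include a release packet sent to dnsmasq on 10.101.55.1.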

tomponline commented 2 years ago

Ah yes, that would make sense, as LXD sends a DHCP release packet to the bridge using the instance NIC's MAC address so that dnsmasq removes the lease without needing to be restarted.
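For readers unfamiliar with what such a release looks like on the wire, here is a minimal sketch of a DHCPRELEASE message built according to the BOOTP/DHCP layout in RFC 2131. It is not LXD's implementation (LXD is written in Go); the function name `build_dhcp_release` and the transaction ID are hypothetical, and the block only builds the bytes without sending anything. The MAC and IP values are taken from the error message at the top of this thread.

```python
import socket
import struct

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
BOOTREQUEST = 1
DHCPRELEASE = 7


def build_dhcp_release(mac: str, client_ip: str, server_ip: str, xid: int) -> bytes:
    """Build a DHCPRELEASE message for the given client MAC and leased IP."""
    hw = bytes.fromhex(mac.replace(":", ""))
    zero_ip = b"\x00" * 4
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        BOOTREQUEST,                  # op: client-to-server message
        1,                            # htype: Ethernet
        6,                            # hlen: MAC address length
        0,                            # hops
        xid,                          # transaction ID
        0,                            # secs
        0,                            # flags
        socket.inet_aton(client_ip),  # ciaddr: the address being released
        zero_ip,                      # yiaddr
        zero_ip,                      # siaddr
        zero_ip,                      # giaddr
        hw,                           # chaddr: struct pads this to 16 bytes
        b"",                          # sname (64 zero bytes)
        b"",                          # file (128 zero bytes)
    )
    options = (
        bytes([53, 1, DHCPRELEASE])                      # message type
        + bytes([54, 4]) + socket.inet_aton(server_ip)   # server identifier
        + bytes([255])                                   # end of options
    )
    return header + DHCP_MAGIC_COOKIE + options


if __name__ == "__main__":
    packet = build_dhcp_release(
        "00:16:3e:43:46:05", "10.101.55.65", "10.101.55.1", xid=0x12345678
    )
    print(len(packet))  # 250
```

The key point is that the client hardware address field carries the instance's MAC even though the packet is sent by the host, which is why the host's own OUTPUT firewall rules apply to it.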