Closed: grawlinson closed this issue 4 years ago.
@grawlinson can you confirm you have the ebtables command installed?
You'll want to confirm some of the basics:
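For example, something along these lines on the host (just a sketch of the kind of checks meant here, not an exact list):

```sh
# Is the ebtables binary installed and on the PATH LXD uses?
which ebtables
pacman -Qi ebtables   # on Arch Linux

# Can the exact command LXD runs (per the journal errors) be run by hand?
sudo ebtables --concurrent -L --Lmac2 --Lx

# What firewall state does the host currently have?
sudo iptables -L -n -v
```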
Also, how exactly is that LXD installed? The journal output suggests that you upgraded to a new LXD and then downgraded it; LXD detected this and refused to start.
I didn't need ebtables installed with LXD 3.18; I only installed it with 3.19 when it showed up in the logs.
I don't have dnsmasq running because my containers either have static IPs or obtain addresses via the router's DHCP. It has worked this way since LXD 3.14, and I only ran into this particular issue when upgrading to 3.19.
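For context, the static addressing inside the containers is plain systemd-networkd configuration, roughly like this (hypothetical interface name and addresses, not my real values):

```sh
# Inside a container: a minimal static .network file for networkd
cat <<'EOF' | sudo tee /etc/systemd/network/10-eth0.network
[Match]
Name=eth0

[Network]
Address=192.0.2.10/24
Gateway=192.0.2.1
DNS=192.0.2.1
EOF

sudo systemctl restart systemd-networkd
```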
Output of `iptables -L -n -v`:
$ sudo iptables -L -n -v
Chain INPUT (policy ACCEPT 8189 packets, 697K bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 19 packets, 1434 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 7592 packets, 3180K bytes)
pkts bytes target prot opt in out source destination
LXD is installed using [this](https://github.com/Foxboron/lxd-repo/) repository. I upgraded from 3.18 to 3.19, went back to 3.18, and am now running 3.19 again.
I may have found a cause: restarting a container results in the following showing up in the logs. I'll do some more poking around.
Jan 21 08:19:51 r620 systemd-networkd[1172]: veth52060197: rtnl: received neighbor message with invalid family, ignoring.
Jan 21 08:19:51 r620 systemd-networkd[1172]: veth52060197: rtnl: received neighbor message with invalid family, ignoring.
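In case it helps anyone reproduce this, the messages appear on the host as soon as the container's veth pair is torn down and recreated; roughly (using ns1 as the example container name):

```sh
# Terminal 1: follow the host's networkd journal
journalctl -fu systemd-networkd

# Terminal 2: restart the container; the "received neighbor message
# with invalid family" lines show up as the veth device is recreated
lxc restart ns1
```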
Oh, hold on, did you also upgrade your containers? And are those running ArchLinux too?
If so, and you're also using networkd in those containers, then you just got hit by the systemd point-release regression in networkd.
If that's the case, then your way out of this issue is to apply the same fix we have in our current images: https://github.com/lxc/lxc-ci/blob/master/images/archlinux.yaml#L470
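A quick way to check whether a container is on the affected networkd is to look at the systemd version and recent networkd messages inside it, e.g. (using ns1 as an example name; adjust for your containers):

```sh
# systemd version running inside the container
lxc exec ns1 -- systemctl --version | head -n1

# recent networkd log from inside the container
lxc exec ns1 -- journalctl -u systemd-networkd -b --no-pager | tail -n 20
```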
Yup, that was the problem. Do you have a reference/article regarding the regression that I could peruse?
I really appreciate the help, @stgraber & @tomponline. I think that's two beverages I owe you now. :)
This change is what broke things. A proper fix will need a kernel change; until then, we need this commit reverted by distros or the workaround put in place.
Required information
Jan 19 17:24:59 r620 systemd[1]: Stopping REST API, command line tool and OpenStack integration plugin for LXC....
Jan 19 17:24:59 r620 systemd[1]: lxd.service: Succeeded.
Jan 19 17:24:59 r620 systemd[1]: Stopped REST API, command line tool and OpenStack integration plugin for LXC..
Jan 19 17:24:59 r620 systemd[1]: Started REST API, command line tool and OpenStack integration plugin for LXC..
Jan 19 17:24:59 r620 lxd[3208709]: t=2020-01-19T17:24:59+1300 lvl=warn msg=" - Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored"
Jan 19 17:26:56 r620 lxd[3208709]: t=2020-01-19T17:26:56+1300 lvl=eror msg="Failed to remove nic filters: Failed to remove network filters for eth0: Failed to run: ebtables --concurrent -L --Lmac2 --Lx: "
Jan 19 17:26:56 r620 lxd[3208709]: t=2020-01-19T17:26:56+1300 lvl=eror msg="Failed to remove nic filters: Failed to remove network filters for eth0: Failed to run: ebtables --concurrent -L --Lmac2 --Lx: "
Jan 19 17:26:56 r620 lxd[3208709]: t=2020-01-19T17:26:56+1300 lvl=eror msg="Failed to remove nic filters: Failed to remove network filters for eth0: Failed to run: ebtables --concurrent -L --Lmac2 --Lx: "
Jan 19 17:27:06 r620 lxd[3208709]: t=2020-01-19T17:27:06+1300 lvl=eror msg="Failed to remove nic filters: Failed to remove network filters for eth0: Failed to run: ebtables --concurrent -L --Lmac2 --Lx: "
Jan 19 17:30:11 r620 lxd[3208709]: t=2020-01-19T17:30:11+1300 lvl=eror msg="Failed to remove nic filters: Failed to remove network filters for eth0: Failed to run: ebtables --concurrent -L --Lmac2 --Lx: "
Jan 19 17:34:12 r620 systemd[1]: Stopping LXD - main daemon...
Jan 19 17:34:12 r620 lxd[3208709]: t=2020-01-19T17:34:12+1300 lvl=warn msg="Detected poll(POLLNVAL) event."
Jan 19 17:34:14 r620 lxd[3208709]: t=2020-01-19T17:34:14+1300 lvl=eror msg="Failed to remove nic filters: Failed to remove network filters for eth0: Failed to run: ebtables --concurrent -L --Lmac2 --Lx: "
Jan 19 17:34:14 r620 lxd[3208709]: t=2020-01-19T17:34:14+1300 lvl=eror msg="Failed to remove nic filters: Failed to remove network filters for eth0: Failed to run: ebtables --concurrent -L --Lmac2 --Lx: "
Jan 19 17:34:14 r620 lxd[3208709]: t=2020-01-19T17:34:14+1300 lvl=eror msg="Failed to remove nic filters: Failed to remove network filters for eth0: Failed to run: ebtables --concurrent -L --Lmac2 --Lx: "
Jan 19 17:34:14 r620 lxd[3208709]: t=2020-01-19T17:34:14+1300 lvl=eror msg="Failed to remove nic filters: Failed to remove network filters for eth0: Failed to run: ebtables --concurrent -L --Lmac2 --Lx: "
Jan 19 17:34:14 r620 lxd[3208709]: t=2020-01-19T17:34:14+1300 lvl=eror msg="Failed to remove nic filters: Failed to remove network filters for eth0: Failed to run: ebtables --concurrent -L --Lmac2 --Lx: "
Jan 19 17:34:14 r620 lxd[3208709]: t=2020-01-19T17:34:14+1300 lvl=eror msg="Failed to remove nic filters: Failed to remove network filters for eth0: Failed to run: ebtables --concurrent -L --Lmac2 --Lx: "
Jan 19 17:34:14 r620 lxd[3208709]: t=2020-01-19T17:34:14+1300 lvl=eror msg="Failed to remove nic filters: Failed to remove network filters for eth0: Failed to run: ebtables --concurrent -L --Lmac2 --Lx: "
Jan 19 17:34:14 r620 lxd[3208709]: t=2020-01-19T17:34:14+1300 lvl=eror msg="Failed to remove nic filters: Failed to remove network filters for eth0: Failed to run: ebtables --concurrent -L --Lmac2 --Lx: "
Jan 19 17:34:14 r620 lxd[3208709]: t=2020-01-19T17:34:14+1300 lvl=eror msg="Failed to remove nic filters: Failed to remove network filters for eth0: Failed to run: ebtables --concurrent -L --Lmac2 --Lx: "
Jan 19 17:34:14 r620 lxd[3208709]: t=2020-01-19T17:34:14+1300 lvl=eror msg="Failed to remove nic filters: Failed to remove network filters for eth0: Failed to run: ebtables --concurrent -L --Lmac2 --Lx: "
Jan 19 17:34:18 r620 systemd[1]: lxd.service: Succeeded.
Jan 19 17:34:18 r620 systemd[1]: Stopped LXD - main daemon.
Jan 19 17:34:18 r620 systemd[1]: Starting LXD - main daemon...
Jan 19 17:34:18 r620 lxd[3234399]: t=2020-01-19T17:34:18+1300 lvl=warn msg="Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored."
Jan 19 17:34:19 r620 lxd[3234399]: t=2020-01-19T17:34:19+1300 lvl=eror msg="Failed to start the daemon: failed to open cluster database: failed to ensure schema: schema version '23' is more recent than expected '18'"
Jan 19 17:34:19 r620 lxd[3234399]: Error: failed to open cluster database: failed to ensure schema: schema version '23' is more recent than expected '18'
Jan 19 17:34:19 r620 systemd[1]: lxd.service: Main process exited, code=exited, status=1/FAILURE
Jan 19 17:44:18 r620 lxd[3234400]: Error: LXD still not running after 600s timeout (Get http://unix.socket/1.0: EOF)
Jan 19 17:44:18 r620 systemd[1]: lxd.service: Control process exited, code=exited, status=1/FAILURE
Jan 19 17:44:18 r620 systemd[1]: lxd.service: Failed with result 'exit-code'.
Jan 19 17:44:18 r620 systemd[1]: Failed to start LXD - main daemon.
Jan 19 17:44:18 r620 systemd[1]: lxd.service: Scheduled restart job, restart counter is at 1.
$ lxc info ns1 --show-log
Name: ns1
Location: none
Remote: unix://
Architecture: x86_64
Created: 2020/01/19 06:18 UTC
Status: Running
Type: container
Profiles: vlan40, limit
Pid: 26812
Ips:
  eth0: inet6 fe80::220:91ff:fe00:4004 vethe68e899c
  lo: inet 127.0.0.1
  lo: inet6 ::1
Resources:
  Processes: 93
  Disk usage:
    root: 198.87MB
  CPU usage:
    CPU usage (in seconds): 3
  Memory usage:
    Memory (current): 133.46MB
  Network usage:
    eth0:
      Bytes received: 8.48kB
      Bytes sent: 1.29kB
      Packets received: 126
      Packets sent: 16
    lo:
      Bytes received: 78B
      Bytes sent: 78B
      Packets received: 2
      Packets sent: 2
Log:
lxc ns1 20200120083531.117 WARN cgfsng - cgroups/cgfsng.c:chowmod:1527 - No such file or directory - Failed to chown(/sys/fs/cgroup/unified//lxc.payload/ns1/memory.oom.group, 65536, 0)
$ lxc config show ns1 --expanded
architecture: x86_64
config:
  image.architecture: x86_64
  image.description: base deploy image
  image.name: archlinux-2019.12.01-x86_64-default-20191210_0310
  image.os: archlinux
  image.release: 2019.12.01
  image.serial: "20191210_0310"
  image.variant: default
  limits.cpu: "1"
  limits.memory: 512MB
  volatile.base_image: cb4fe4d26007ee3aa9532b478202d00c120b08d8b9ee93c2ef09d14fc4dc1d15
  volatile.eth0.host_name: vethe68e899c
  volatile.eth0.hwaddr: 00:20:91:00:40:04
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: RUNNING
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br-vlan40
    type: nic
  root:
    path: /
    pool: zfs-zetta
    type: disk
ephemeral: false
profiles:
$ lxc monitor
location: none metadata: context: {} level: dbug message: 'New event listener: 9bb40b09-fcfc-4560-9925-195bf6be77a3' timestamp: "2020-01-20T21:45:18.420813594+13:00" type: logging
location: none metadata: context: ip: '@' method: GET url: /1.0 user: "" level: dbug message: Handling timestamp: "2020-01-20T21:45:34.215238556+13:00" type: logging
location: none metadata: context: ip: '@' method: GET url: /1.0/events user: "" level: dbug message: Handling timestamp: "2020-01-20T21:45:34.220079873+13:00" type: logging
location: none metadata: context: {} level: dbug message: 'New event listener: 0b094a4e-9990-463b-8508-0db1039e5e53' timestamp: "2020-01-20T21:45:34.220485048+13:00" type: logging
location: none metadata: context: ip: '@' method: PUT url: /1.0/instances/ns1/state user: "" level: dbug message: Handling timestamp: "2020-01-20T21:45:34.221016701+13:00" type: logging
location: none metadata: context: {} level: dbug message: 'New task Operation: 4bf7b3d0-ba9d-4a2f-871f-bfc2826a1cae' timestamp: "2020-01-20T21:45:34.230718469+13:00" type: logging
location: none metadata: context: {} level: dbug message: 'Started task operation: 4bf7b3d0-ba9d-4a2f-871f-bfc2826a1cae' timestamp: "2020-01-20T21:45:34.231185402+13:00" type: logging
location: none metadata: class: task created_at: "2020-01-20T21:45:34.226029519+13:00" description: Restarting container err: "" id: 4bf7b3d0-ba9d-4a2f-871f-bfc2826a1cae location: none may_cancel: false metadata: null resources: containers:
location: none metadata: class: task created_at: "2020-01-20T21:45:34.226029519+13:00" description: Restarting container err: "" id: 4bf7b3d0-ba9d-4a2f-871f-bfc2826a1cae location: none may_cancel: false metadata: null resources: containers:
location: none metadata: context: action: shutdown created: 2020-01-19 19:18:59.929131886 +1300 NZDT ephemeral: "false" name: ns1 project: default timeout: -1s used: 2020-01-20 21:43:57.825480063 +1300 NZDT level: info message: Shutting down container timestamp: "2020-01-20T21:45:34.232054975+13:00" type: logging
location: none metadata: context: ip: '@' method: GET url: /1.0/operations/4bf7b3d0-ba9d-4a2f-871f-bfc2826a1cae user: "" level: dbug message: Handling timestamp: "2020-01-20T21:45:34.233865875+13:00" type: logging
location: none metadata: context: {} level: dbug message: 'Scheduler: network: eth0 has been added: updating network priorities' timestamp: "2020-01-20T21:45:34.729395819+13:00" type: logging
location: none metadata: context: ip: '@' method: GET url: /internal/containers/59/onstopns?target=stop&netns=/proc/33113/fd/4 user: "" level: dbug message: Handling timestamp: "2020-01-20T21:45:34.789692489+13:00" type: logging
location: none metadata: context: ip: '@' method: GET url: /internal/containers/59/onstop?target=stop user: "" level: dbug message: Handling timestamp: "2020-01-20T21:45:35.581790144+13:00" type: logging
location: none metadata: context: driver: zfs instance: ns1 pool: zfs-zetta project: default level: dbug message: UnmountInstance started timestamp: "2020-01-20T21:45:35.589049795+13:00" type: logging
location: none metadata: context: driver: zfs instance: ns1 pool: zfs-zetta project: default level: dbug message: UnmountInstance finished timestamp: "2020-01-20T21:45:36.701751617+13:00" type: logging
location: none metadata: context: action: shutdown created: 2020-01-19 19:18:59.929131886 +1300 NZDT ephemeral: "false" name: ns1 project: default timeout: -1s used: 2020-01-20 21:43:57.825480063 +1300 NZDT level: info message: Shut down container timestamp: "2020-01-20T21:45:36.928250564+13:00" type: logging
location: none metadata: context: {} level: dbug message: 'Scheduler: container ns1 stopped: re-balancing' timestamp: "2020-01-20T21:45:36.928296237+13:00" type: logging
location: none metadata: action: container-shutdown source: /1.0/containers/ns1 timestamp: "2020-01-20T21:45:36.928258395+13:00" type: lifecycle
location: none metadata: context: driver: zfs instance: ns1 pool: zfs-zetta project: default level: dbug message: MountInstance started timestamp: "2020-01-20T21:45:36.950761824+13:00" type: logging
location: none metadata: context: {} level: dbug message: 'Scheduler: network: vethea2222ee has been added: updating network priorities' timestamp: "2020-01-20T21:45:36.983636689+13:00" type: logging
location: none metadata: context: driver: zfs instance: ns1 pool: zfs-zetta project: default level: dbug message: MountInstance finished timestamp: "2020-01-20T21:45:36.98584044+13:00" type: logging
location: none metadata: context: driver: zfs instance: ns1 pool: zfs-zetta project: default level: dbug message: UpdateInstanceBackupFile started timestamp: "2020-01-20T21:45:36.986379696+13:00" type: logging
location: none metadata: context: {} level: dbug message: 'Scheduler: network: veth17fa5ab6 has been added: updating network priorities' timestamp: "2020-01-20T21:45:36.995106605+13:00" type: logging
location: none metadata: context: driver: zfs instance: ns1 pool: zfs-zetta project: default level: dbug message: UpdateInstanceBackupFile finished timestamp: "2020-01-20T21:45:36.995416367+13:00" type: logging
location: none metadata: context: driver: zfs instance: ns1 pool: zfs-zetta project: default level: dbug message: MountInstance started timestamp: "2020-01-20T21:45:36.99566445+13:00" type: logging
location: none metadata: context: driver: zfs instance: ns1 pool: zfs-zetta project: default level: dbug message: MountInstance finished timestamp: "2020-01-20T21:45:36.999972461+13:00" type: logging
location: none metadata: context: action: start created: 2020-01-19 19:18:59.929131886 +1300 NZDT ephemeral: "false" name: ns1 project: default stateful: "false" used: 2020-01-20 21:43:57.825480063 +1300 NZDT level: info message: Starting container timestamp: "2020-01-20T21:45:37.000007677+13:00" type: logging
location: none metadata: context: ip: '@' method: GET url: /internal/containers/59/onstart user: "" level: dbug message: Handling timestamp: "2020-01-20T21:45:37.04782919+13:00" type: logging
location: none metadata: context: driver: zfs instance: ns1 pool: zfs-zetta project: default level: dbug message: MountInstance started timestamp: "2020-01-20T21:45:37.051869394+13:00" type: logging
location: none metadata: context: driver: zfs instance: ns1 pool: zfs-zetta project: default level: dbug message: MountInstance finished timestamp: "2020-01-20T21:45:37.053302694+13:00" type: logging
location: none metadata: context: {} level: dbug message: 'Scheduler: container ns1 started: re-balancing' timestamp: "2020-01-20T21:45:37.063202693+13:00" type: logging
location: none metadata: action: container-started source: /1.0/containers/ns1 timestamp: "2020-01-20T21:45:37.329469254+13:00" type: lifecycle
location: none metadata: context: action: start created: 2020-01-19 19:18:59.929131886 +1300 NZDT ephemeral: "false" name: ns1 project: default stateful: "false" used: 2020-01-20 21:43:57.825480063 +1300 NZDT level: info message: Started container timestamp: "2020-01-20T21:45:37.329448597+13:00" type: logging
location: none metadata: context: {} level: dbug message: 'Success for task operation: 4bf7b3d0-ba9d-4a2f-871f-bfc2826a1cae' timestamp: "2020-01-20T21:45:37.329499518+13:00" type: logging
location: none metadata: class: task created_at: "2020-01-20T21:45:34.226029519+13:00" description: Restarting container err: "" id: 4bf7b3d0-ba9d-4a2f-871f-bfc2826a1cae location: none may_cancel: false metadata: null resources: containers:
location: none metadata: context: {} level: dbug message: 'Event listener finished: 0b094a4e-9990-463b-8508-0db1039e5e53' timestamp: "2020-01-20T21:45:37.331990754+13:00" type: logging
location: none metadata: context: {} level: dbug message: 'Disconnected event listener: 0b094a4e-9990-463b-8508-0db1039e5e53' timestamp: "2020-01-20T21:45:37.332517774+13:00" type: logging