Closed: respadas closed this issue 3 years ago
Is that a cluster?
I'm assuming you already tried systemctl restart lxd?
Hi,
It is not a cluster, and yes, I ran the restart, but the command hangs. If I check the status I get this output:
root@LXD-nodo1:~# systemctl status lxd
● lxd.service - LXD - main daemon
   Loaded: loaded (/lib/systemd/system/lxd.service; indirect; vendor preset: enabled)
   Active: activating (start-post) since Fri 2021-02-26 14:23:49 CST; 47s ago
     Docs: man:lxd(1)
  Process: 989 ExecStartPre=/usr/lib/x86_64-linux-gnu/lxc/lxc-apparmor-load (code=exited, status=0/SUCCESS)
 Main PID: 999 (lxd); Control PID: 1000 (lxd)
    Tasks: 38
   CGroup: /system.slice/lxd.service
           ├─  785 [lxc monitor] /var/lib/lxd/containers
           ├─  999 /usr/lib/lxd/lxd --group lxd --logfile=/var/log/lxd/lxd.log
           ├─ 1000 /usr/lib/lxd/lxd waitready --timeout=600
           ├─ 1508 [lxc monitor] /var/lib/lxd/containers
           ├─ 2931 [lxc monitor] /var/lib/lxd/containers
           ├─ 3691 [lxc monitor] /var/lib/lxd/containers
           ├─ 7871 [lxc monitor] /var/lib/lxd/containers
           ├─10423 [lxc monitor] /var/lib/lxd/containers
           ├─11942 [lxc monitor] /var/lib/lxd/containers
           ├─15347 [lxc monitor] /var/lib/lxd/containers
           ├─18179 [lxc monitor] /var/lib/lxd/containers
           ├─19380 [lxc monitor] /var/lib/lxd/containers
           ├─19626 [lxc monitor] /var/lib/lxd/containers
           ├─19695 [lxc monitor] /var/lib/lxd
           ├─19732 dnsmasq --strict-order --bind-interfaces --pid-file=/var/lib/lxd/networks/lxdfan0/dnsmasq.pid --except-interface=lo --interface=lxdfan0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address
           ├─25238 [lxc monitor] /var/lib/lxd/containers
           ├─28249 [lxc monitor] /var/lib/lxd/containers
           ├─30026 [lxc monitor] /var/lib/lxd/containers
           ├─32200 [lxc monitor] /var/lib/lxd/containers
           └─32552 [lxc monitor] /var/lib/lxd/containers

Feb 26 14:23:49 LXD-nodo1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 26 14:23:49 LXD-nodo1 systemd[1]: lxd.service: Found left-over process 28249 (lxd) in control group while starting unit. Ignoring.
Feb 26 14:23:49 LXD-nodo1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 26 14:23:49 LXD-nodo1 systemd[1]: lxd.service: Found left-over process 10423 (lxd) in control group while starting unit. Ignoring.
Feb 26 14:23:49 LXD-nodo1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 26 14:23:49 LXD-nodo1 systemd[1]: lxd.service: Found left-over process 19626 (lxd) in control group while starting unit. Ignoring.
Feb 26 14:23:49 LXD-nodo1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 26 14:23:49 LXD-nodo1 systemd[1]: lxd.service: Found left-over process 25238 (lxd) in control group while starting unit. Ignoring.
Feb 26 14:23:49 LXD-nodo1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 26 14:23:49 LXD-nodo1 lxd[999]: t=2021-02-26T14:23:49-0600 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored."
Can you show the output of ps fauxww and of sqlite3 /var/lib/lxd/database/local.db "SELECT * FROM raft_nodes;"?
Sure!
ps_output.log
root@LXD-nodo1:~# sqlite3 /var/lib/lxd/database/local.db "SELECT * FROM raft_nodes;"
1|10.0.2.4:8443
Right, so your system is set up as a cluster: a one-node cluster, but still a cluster. Is your machine actually reachable at 10.0.2.4?
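For anyone who wants to script the same check, here is a minimal sketch using Python's standard sqlite3 module. It assumes local.db contains a raft_nodes table, as the output above shows; the helper name list_raft_nodes is hypothetical. The file is opened read-only so the check cannot modify the daemon's database.

```python
import sqlite3
from pathlib import Path


def list_raft_nodes(db_path):
    """Return all rows of the raft_nodes table (hypothetical helper).

    Assumption: the file is a LXD local.db that has a raft_nodes table,
    as seen in the sqlite3 output earlier in this thread.
    """
    # Build a file: URI and request read-only mode so nothing is written.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute("SELECT * FROM raft_nodes").fetchall()
    finally:
        conn.close()
```

Calling list_raft_nodes("/var/lib/lxd/database/local.db") on this machine would be expected to return the single row shown above. In this thread, a non-empty result is what indicated that the daemon considered itself clustered.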
Hi, yes it's reachable.
What does nc -v 10.0.2.4 8443 get you?
root@LXD-nodo1:~# nc -v 10.0.2.4 8443
Connection to 10.0.2.4 8443 port [tcp/*] succeeded!
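The same reachability test can be sketched in Python with the standard socket module. The function name can_connect and the 3-second default timeout are illustrative assumptions, not part of LXD or nc.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, can_connect("10.0.2.4", 8443) mirrors the nc call above. Like nc, it only shows that something accepts connections on the port; it says nothing about whether the database behind it is healthy.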
Okay, that's odd.
Can you do:
And see what that gets stuck on?
root@LXD-nodo1:~# sudo lxd --debug --group lxd
DBUG[02-26|18:21:29] Connecting to a local LXD over a Unix socket
DBUG[02-26|18:21:29] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
INFO[02-26|18:21:29] LXD 3.0.3 is starting in normal mode path=/var/lib/lxd
INFO[02-26|18:21:29] Kernel uid/gid map:
INFO[02-26|18:21:29] - u 0 0 4294967295
INFO[02-26|18:21:29] - g 0 0 4294967295
INFO[02-26|18:21:29] Configured LXD uid/gid map:
INFO[02-26|18:21:29] - u 0 100000 65536
INFO[02-26|18:21:29] - g 0 100000 65536
WARN[02-26|18:21:29] CGroup memory swap accounting is disabled, swap limits will be ignored.
INFO[02-26|18:21:29] Kernel features:
INFO[02-26|18:21:29] - netnsid-based network retrieval: yes
INFO[02-26|18:21:29] - unprivileged file capabilities: yes
INFO[02-26|18:21:29] Initializing local database
DBUG[02-26|18:21:29] Initializing database gateway
DBUG[02-26|18:21:29] Connecting to a local LXD over a Unix socket
DBUG[02-26|18:21:29] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
DBUG[02-26|18:21:29] Detected stale unix socket, deleting
DBUG[02-26|18:21:29] Detected stale unix socket, deleting
INFO[02-26|18:21:29] Starting /dev/lxd handler:
INFO[02-26|18:21:29] - binding devlxd socket socket=/var/lib/lxd/devlxd/sock
INFO[02-26|18:21:29] REST API daemon:
INFO[02-26|18:21:29] - binding Unix socket socket=/var/lib/lxd/unix.socket
INFO[02-26|18:21:29] - binding TCP socket socket=[::]:8443
INFO[02-26|18:21:29] Initializing global database
DBUG[02-26|18:21:29] Found cert k=0
DBUG[02-26|18:21:29] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=0
DBUG[02-26|18:21:29] Dqlite: connection failed err=no available dqlite leader server found attempt=0
DBUG[02-26|18:21:29] Found cert k=0
DBUG[02-26|18:21:29] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=1
DBUG[02-26|18:21:29] Dqlite: connection failed err=no available dqlite leader server found attempt=1
DBUG[02-26|18:21:30] Found cert k=0
DBUG[02-26|18:21:30] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=2
DBUG[02-26|18:21:30] Dqlite: connection failed err=no available dqlite leader server found attempt=2
DBUG[02-26|18:21:30] Found cert k=0
DBUG[02-26|18:21:30] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=3
DBUG[02-26|18:21:30] Dqlite: connection failed err=no available dqlite leader server found attempt=3
DBUG[02-26|18:21:31] Found cert k=0
DBUG[02-26|18:21:31] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=4
DBUG[02-26|18:21:31] Dqlite: connection failed err=no available dqlite leader server found attempt=4
DBUG[02-26|18:21:32] Found cert k=0
DBUG[02-26|18:21:32] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=5
DBUG[02-26|18:21:32] Dqlite: connection failed err=no available dqlite leader server found attempt=5
DBUG[02-26|18:21:33] Found cert k=0
DBUG[02-26|18:21:33] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=6
DBUG[02-26|18:21:33] Dqlite: connection failed err=no available dqlite leader server found attempt=6
DBUG[02-26|18:21:34] Found cert k=0
DBUG[02-26|18:21:34] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=7
DBUG[02-26|18:21:34] Dqlite: connection failed err=no available dqlite leader server found attempt=7
DBUG[02-26|18:21:35] Found cert k=0
DBUG[02-26|18:21:35] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=8
DBUG[02-26|18:21:35] Dqlite: connection failed err=no available dqlite leader server found attempt=8
DBUG[02-26|18:21:36] Found cert k=0
DBUG[02-26|18:21:36] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=9
DBUG[02-26|18:21:36] Dqlite: connection failed err=no available dqlite leader server found attempt=9
DBUG[02-26|18:21:37] Found cert k=0
DBUG[02-26|18:21:37] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=10
DBUG[02-26|18:21:37] Dqlite: connection failed err=no available dqlite leader server found attempt=10
DBUG[02-26|18:21:38] Found cert k=0
DBUG[02-26|18:21:38] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=11
DBUG[02-26|18:21:38] Dqlite: connection failed err=no available dqlite leader server found attempt=11
DBUG[02-26|18:21:39] Failed connecting to global database (attempt 0): failed to create dqlite connection: no available dqlite leader server found
DBUG[02-26|18:21:41] Found cert k=0
DBUG[02-26|18:21:41] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=0
DBUG[02-26|18:21:41] Dqlite: connection failed err=no available dqlite leader server found attempt=0
DBUG[02-26|18:21:42] Found cert k=0
DBUG[02-26|18:21:42] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=1
DBUG[02-26|18:21:42] Dqlite: connection failed err=no available dqlite leader server found attempt=1
DBUG[02-26|18:21:42] Found cert k=0
DBUG[02-26|18:21:42] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=2
DBUG[02-26|18:21:42] Dqlite: connection failed err=no available dqlite leader server found attempt=2
DBUG[02-26|18:21:42] Found cert k=0
DBUG[02-26|18:21:42] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=3
DBUG[02-26|18:21:42] Dqlite: connection failed err=no available dqlite leader server found attempt=3
DBUG[02-26|18:21:43] Found cert k=0
DBUG[02-26|18:21:43] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=4
DBUG[02-26|18:21:43] Dqlite: connection failed err=no available dqlite leader server found attempt=4
DBUG[02-26|18:21:44] Found cert k=0
DBUG[02-26|18:21:44] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=10.0.2.4:8443 attempt=5
DBUG[02-26|18:21:44] Dqlite: connection failed err=no available dqlite leader server found attempt=5
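With this much repetition it helps to collapse the debug output into distinct failure reasons. The following is a rough sketch that assumes the key=value layout seen in the log above (an err= field followed by other fields such as address= or attempt=); count_errors is a hypothetical helper, not an LXD tool.

```python
import re
from collections import Counter

# Capture the text after "err=" up to the next " key=" field or the end
# of the input. Assumption: error messages themselves contain no "=".
ERR_RE = re.compile(r"\berr=(.*?)(?=\s+\w+=|$)")


def count_errors(log_text):
    """Count how often each distinct err= message appears in log_text."""
    counts = Counter()
    # finditer scans the whole text, so it works whether the log has one
    # entry per line or was pasted as a single long line.
    for match in ERR_RE.finditer(log_text):
        counts[match.group(1).strip()] += 1
    return counts
```

Run over the output above, this reduces the loop to two messages: the per-server reason "failed to establish network connection: some nodes are behind this node's version" and the resulting "no available dqlite leader server found".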
Just to be sure there's nothing weird going on, what does ip -4 route get 10.0.2.4 get you?
root@LXD-nodo1:~# ip -4 route get 10.0.2.4
local 10.0.2.4 dev lo src 10.0.2.4 uid 0
cache
Can you send me a tarball of /var/lib/lxd/database at stgraber at ubuntu dot com?
I'll try to reproduce the issue here and get it back online.
It'd have been pretty trivial to resolve this on LXD 4.0 but 3.0 is missing much of the newer clustering tooling.
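Creating the requested tarball of /var/lib/lxd/database can be sketched with Python's standard tarfile module. The helper name archive_directory and the output file name are assumptions for illustration; stopping the daemon before copying the database is a precaution suggested here, not something stated in the thread.

```python
import tarfile
from pathlib import Path


def archive_directory(src_dir, dest_tar):
    """Write a gzip-compressed tarball of src_dir to dest_tar (hypothetical helper)."""
    src = Path(src_dir)
    with tarfile.open(dest_tar, "w:gz") as tar:
        # arcname keeps only the directory's own name inside the archive,
        # so entries look like "database/local.db" rather than a full path.
        tar.add(src, arcname=src.name)
    return dest_tar
```

For example, archive_directory("/var/lib/lxd/database", "lxd-database.tar.gz"), run as root, would produce an archive whose entries start with database/.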
Files sent, thank you.
Sent you a manually re-created version of your database with clustering disabled. It's loading fine here and I can see your containers (24 of them, using btrfs storage pool).
Thank you very much, the re-created database is working fine.
Regards,
Required information
Issue description
The system has been up since September 11 and it's a production server; everything worked fine until today. We can't use lxc for anything: every command ends with Error: Get http://unix.socket/1.0: EOF. There have been no configuration changes. The journalctl output basically has three messages:
Feb 26 13:08:22 LXD-nodo1 systemd[1]: lxd.service: Found left-over process 25238 (lxd) in control group while starting unit. Ignoring.
Feb 26 13:08:22 LXD-nodo1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Feb 26 13:08:22 LXD-nodo1 lxd[22448]: t=2021-02-26T13:08:22-0600 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored."
Feb 26 13:09:48 LXD-nodo1 lxd[22448]: t=2021-02-26T13:09:48-0600 lvl=warn msg="Failed connecting to global database (attempt 6): failed to create dqlite connection: no available dqlite leader server found"
Feb 26 13:10:01 LXD-nodo1 lxd[22448]: t=2021-02-26T13:10:01-0600 lvl=warn msg="Failed connecting to global database (attempt 7): failed to create dqlite connection: no available dqlite leader server found"
Steps to reproduce
Information to attach
dmesg
lxc info NAME --show-log
lxc config show NAME --expanded
lxc monitor (while reproducing the issue)