Closed — Kramerican closed this issue 6 years ago
Ok, I wonder if the snap is still trying to spawn LXD in the background, messing with the manually started one, which would explain the socket issue.
Can you:
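(The exact commands weren't preserved in this thread; inferring from the reply posted below, they appear to have been:)

systemctl stop snap.lxd.daemon
rm -fv /var/snap/lxd/common/lxd/unix.socket
lxd --debug --group lxd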
That should have it start again, likely still taking a couple of minutes at the database stage. Wait for it to start for real and see if the socket works any better now.
Not having an lxd.db is normal for LXD 3.0.
Same behavior.
net10 # systemctl stop snap.lxd.daemon
net10 # echo $?
0
net10 # rm -fv /var/snap/lxd/common/lxd/unix.socket
net10 # lxd --debug --group lxd
INFO[05-01|15:03:22] LXD 3.0.0 is starting in normal mode path=/var/snap/lxd/common/lxd
INFO[05-01|15:03:22] Kernel uid/gid map:
INFO[05-01|15:03:22] - u 0 0 4294967295
INFO[05-01|15:03:22] - g 0 0 4294967295
INFO[05-01|15:03:22] Configured LXD uid/gid map:
INFO[05-01|15:03:22] - u 0 1000000 1000000000
INFO[05-01|15:03:22] - g 0 1000000 1000000000
WARN[05-01|15:03:22] CGroup memory swap accounting is disabled, swap limits will be ignored.
INFO[05-01|15:03:22] Initializing local database
INFO[05-01|15:03:22] Initializing database gateway
INFO[05-01|15:03:22] Start database node address= id=1
INFO[05-01|15:03:22] Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]
INFO[05-01|15:03:22] Raft: Node at 0 [Leader] entering Leader state
INFO[05-01|15:03:22] LXD isn't socket activated
INFO[05-01|15:03:22] Starting /dev/lxd handler:
INFO[05-01|15:03:22] - binding devlxd socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[05-01|15:03:22] REST API daemon:
INFO[05-01|15:03:22] - binding Unix socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[05-01|15:03:22] - binding TCP socket socket=[::]:8443
INFO[05-01|15:03:22] Initializing global database
DBUG[05-01|15:04:22] Database error: failed to begin transaction: gRPC BEGIN response error: rpc error: code = Unknown desc = failed to handle BEGIN request: FSM out of sync: timed out enqueuing operation
EROR[05-01|15:04:22] Failed to start the daemon: failed to open cluster database: failed to ensure schema: failed to begin transaction: gRPC BEGIN response error: rpc error: code = Unknown desc = failed to handle BEGIN request: FSM out of sync: timed out enqueuing operation
INFO[05-01|15:04:22] Starting shutdown sequence
INFO[05-01|15:04:22] Stopping REST API handler:
INFO[05-01|15:04:22] - closing socket socket=[::]:8443
INFO[05-01|15:04:22] - closing socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[05-01|15:04:22] Stopping /dev/lxd handler
INFO[05-01|15:04:22] - closing socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[05-01|15:04:22] Stop database gateway
INFO[05-01|15:04:22] Stop raft instance
INFO[05-01|15:04:22] Stopping REST API handler:
INFO[05-01|15:04:22] Stopping /dev/lxd handler
INFO[05-01|15:04:22] Stopping REST API handler:
INFO[05-01|15:04:22] Stopping /dev/lxd handler
DBUG[05-01|15:04:22] Not unmounting temporary filesystems (containers are still running)
INFO[05-01|15:04:22] Saving simplestreams cache
INFO[05-01|15:04:22] Saved simplestreams cache
Error: failed to open cluster database: failed to ensure schema: failed to begin transaction: gRPC BEGIN response error: rpc error: code = Unknown desc = failed to handle BEGIN request: FSM out of sync: timed out enqueuing operation
net10 # ll /var/snap/lxd/common/lxd/unix.socket
/bin/ls: cannot access '/var/snap/lxd/common/lxd/unix.socket': No such file or directory
net10 # ps fauxww
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 0.0 0.0 0 0 ? S 09:28 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? R 09:28 0:01 \_ [kworker/0:0]
root 4 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/0:0H]
root 6 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [mm_percpu_wq]
root 7 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ksoftirqd/0]
root 8 0.0 0.0 0 0 ? I 09:28 0:01 \_ [rcu_sched]
root 9 0.0 0.0 0 0 ? I 09:28 0:00 \_ [rcu_bh]
root 10 0.0 0.0 0 0 ? S 09:28 0:00 \_ [migration/0]
root 11 0.0 0.0 0 0 ? S 09:28 0:00 \_ [watchdog/0]
root 12 0.0 0.0 0 0 ? S 09:28 0:00 \_ [cpuhp/0]
root 13 0.0 0.0 0 0 ? S 09:28 0:00 \_ [cpuhp/1]
root 14 0.0 0.0 0 0 ? S 09:28 0:00 \_ [watchdog/1]
root 15 0.0 0.0 0 0 ? S 09:28 0:00 \_ [migration/1]
root 16 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ksoftirqd/1]
root 18 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/1:0H]
root 19 0.0 0.0 0 0 ? S 09:28 0:00 \_ [cpuhp/2]
root 20 0.0 0.0 0 0 ? S 09:28 0:00 \_ [watchdog/2]
root 21 0.0 0.0 0 0 ? S 09:28 0:00 \_ [migration/2]
root 22 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ksoftirqd/2]
root 24 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/2:0H]
root 25 0.0 0.0 0 0 ? S 09:28 0:00 \_ [cpuhp/3]
root 26 0.0 0.0 0 0 ? S 09:28 0:00 \_ [watchdog/3]
root 27 0.0 0.0 0 0 ? S 09:28 0:00 \_ [migration/3]
root 28 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ksoftirqd/3]
root 30 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/3:0H]
root 31 0.0 0.0 0 0 ? S 09:28 0:00 \_ [cpuhp/4]
root 32 0.0 0.0 0 0 ? S 09:28 0:00 \_ [watchdog/4]
root 33 0.0 0.0 0 0 ? S 09:28 0:00 \_ [migration/4]
root 34 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ksoftirqd/4]
root 36 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/4:0H]
root 37 0.0 0.0 0 0 ? S 09:28 0:00 \_ [cpuhp/5]
root 38 0.0 0.0 0 0 ? S 09:28 0:00 \_ [watchdog/5]
root 39 0.0 0.0 0 0 ? S 09:28 0:00 \_ [migration/5]
root 40 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ksoftirqd/5]
root 42 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/5:0H]
root 43 0.0 0.0 0 0 ? S 09:28 0:00 \_ [cpuhp/6]
root 44 0.0 0.0 0 0 ? S 09:28 0:00 \_ [watchdog/6]
root 45 0.0 0.0 0 0 ? S 09:28 0:00 \_ [migration/6]
root 46 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ksoftirqd/6]
root 48 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/6:0H]
root 49 0.0 0.0 0 0 ? S 09:28 0:00 \_ [cpuhp/7]
root 50 0.0 0.0 0 0 ? S 09:28 0:00 \_ [watchdog/7]
root 51 0.0 0.0 0 0 ? S 09:28 0:00 \_ [migration/7]
root 52 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ksoftirqd/7]
root 54 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/7:0H]
root 55 0.0 0.0 0 0 ? S 09:28 0:00 \_ [cpuhp/8]
root 56 0.0 0.0 0 0 ? S 09:28 0:00 \_ [watchdog/8]
root 57 0.0 0.0 0 0 ? S 09:28 0:00 \_ [migration/8]
root 58 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ksoftirqd/8]
root 59 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/8:0]
root 60 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/8:0H]
root 61 0.0 0.0 0 0 ? S 09:28 0:00 \_ [cpuhp/9]
root 62 0.0 0.0 0 0 ? S 09:28 0:00 \_ [watchdog/9]
root 63 0.0 0.0 0 0 ? S 09:28 0:00 \_ [migration/9]
root 64 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ksoftirqd/9]
root 66 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/9:0H]
root 67 0.0 0.0 0 0 ? S 09:28 0:00 \_ [cpuhp/10]
root 68 0.0 0.0 0 0 ? S 09:28 0:00 \_ [watchdog/10]
root 69 0.0 0.0 0 0 ? S 09:28 0:00 \_ [migration/10]
root 70 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ksoftirqd/10]
root 71 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/10:0]
root 72 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/10:0H]
root 73 0.0 0.0 0 0 ? S 09:28 0:00 \_ [cpuhp/11]
root 74 0.0 0.0 0 0 ? S 09:28 0:00 \_ [watchdog/11]
root 75 0.0 0.0 0 0 ? S 09:28 0:00 \_ [migration/11]
root 76 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ksoftirqd/11]
root 78 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/11:0H]
root 79 0.0 0.0 0 0 ? S 09:28 0:00 \_ [cpuhp/12]
root 80 0.0 0.0 0 0 ? S 09:28 0:00 \_ [watchdog/12]
root 81 0.0 0.0 0 0 ? S 09:28 0:00 \_ [migration/12]
root 82 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ksoftirqd/12]
root 84 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/12:0H]
root 85 0.0 0.0 0 0 ? S 09:28 0:00 \_ [cpuhp/13]
root 86 0.0 0.0 0 0 ? S 09:28 0:00 \_ [watchdog/13]
root 87 0.0 0.0 0 0 ? S 09:28 0:00 \_ [migration/13]
root 88 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ksoftirqd/13]
root 90 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/13:0H]
root 91 0.0 0.0 0 0 ? S 09:28 0:00 \_ [cpuhp/14]
root 92 0.0 0.0 0 0 ? S 09:28 0:00 \_ [watchdog/14]
root 93 0.0 0.0 0 0 ? S 09:28 0:00 \_ [migration/14]
root 94 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ksoftirqd/14]
root 95 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/14:0]
root 96 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/14:0H]
root 97 0.0 0.0 0 0 ? S 09:28 0:00 \_ [cpuhp/15]
root 98 0.0 0.0 0 0 ? S 09:28 0:00 \_ [watchdog/15]
root 99 0.0 0.0 0 0 ? S 09:28 0:00 \_ [migration/15]
root 100 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ksoftirqd/15]
root 102 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/15:0H]
root 103 0.0 0.0 0 0 ? S 09:28 0:00 \_ [kdevtmpfs]
root 104 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [netns]
root 105 0.0 0.0 0 0 ? S 09:28 0:00 \_ [rcu_tasks_kthre]
root 106 0.0 0.0 0 0 ? S 09:28 0:00 \_ [kauditd]
root 107 0.0 0.0 0 0 ? I 09:28 0:01 \_ [kworker/0:1]
root 108 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/1:1]
root 109 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/2:1]
root 110 0.0 0.0 0 0 ? S 09:28 0:00 \_ [khungtaskd]
root 111 0.0 0.0 0 0 ? S 09:28 0:00 \_ [oom_reaper]
root 112 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [writeback]
root 113 0.0 0.0 0 0 ? S 09:28 0:00 \_ [kcompactd0]
root 114 0.0 0.0 0 0 ? SN 09:28 0:00 \_ [ksmd]
root 115 0.0 0.0 0 0 ? SN 09:28 0:00 \_ [khugepaged]
root 116 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [crypto]
root 117 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kintegrityd]
root 118 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kblockd]
root 119 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [ata_sff]
root 120 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [md]
root 121 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [edac-poller]
root 122 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [devfreq_wq]
root 123 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [watchdogd]
root 125 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/3:1]
root 126 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/4:1]
root 127 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/5:1]
root 128 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/6:1]
root 129 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/7:1]
root 130 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/9:1]
root 131 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/10:1]
root 132 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/11:1]
root 133 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/8:1]
root 134 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/12:1]
root 135 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/14:1]
root 136 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/15:1]
root 137 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/13:1]
root 139 0.0 0.0 0 0 ? S 09:28 0:00 \_ [kswapd0]
root 140 0.0 0.0 0 0 ? S 09:28 0:00 \_ [ecryptfs-kthrea]
root 182 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kthrotld]
root 184 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [acpi_thermal_pm]
root 185 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/4:2]
root 187 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/1:2]
root 189 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/3:2]
root 190 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/15:2]
root 191 0.0 0.0 0 0 ? I 09:28 0:00 \_ [kworker/2:2]
root 195 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [ipv6_addrconf]
root 204 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kstrp]
root 221 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [charger_manager]
root 277 0.0 0.0 0 0 ? S 09:28 0:00 \_ [scsi_eh_0]
root 278 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [scsi_tmf_0]
root 279 0.0 0.0 0 0 ? S 09:28 0:00 \_ [scsi_eh_1]
root 280 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [scsi_tmf_1]
root 281 0.0 0.0 0 0 ? S 09:28 0:00 \_ [scsi_eh_2]
root 282 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [scsi_tmf_2]
root 283 0.0 0.0 0 0 ? S 09:28 0:00 \_ [scsi_eh_3]
root 284 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [scsi_tmf_3]
root 285 0.0 0.0 0 0 ? S 09:28 0:00 \_ [scsi_eh_4]
root 286 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [scsi_tmf_4]
root 287 0.0 0.0 0 0 ? S 09:28 0:00 \_ [scsi_eh_5]
root 288 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [scsi_tmf_5]
root 305 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/5:1H]
root 312 0.0 0.0 0 0 ? S 09:28 0:00 \_ [md4_raid1]
root 313 0.0 0.0 0 0 ? S 09:28 0:00 \_ [md2_raid1]
root 316 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/13:1H]
root 317 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/7:1H]
root 318 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [ixgbe]
root 319 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [ttm_swap]
root 321 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/9:1H]
root 322 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/1:1H]
root 323 0.0 0.0 0 0 ? I< 09:28 0:00 \_ [kworker/11:1H]
root 356 0.0 0.0 0 0 ? I< 09:30 0:00 \_ [raid5wq]
root 402 0.0 0.0 0 0 ? I< 09:30 0:00 \_ [kworker/6:1H]
root 406 0.1 0.0 0 0 ? S 09:30 0:02 \_ [jbd2/md4-8]
root 407 0.0 0.0 0 0 ? I< 09:30 0:00 \_ [ext4-rsv-conver]
root 424 0.0 0.0 0 0 ? I< 09:30 0:00 \_ [kworker/14:1H]
root 447 0.0 0.0 0 0 ? I< 09:30 0:01 \_ [kworker/12:1H]
root 467 0.0 0.0 0 0 ? I 09:30 0:00 \_ [kworker/9:2]
root 472 0.0 0.0 0 0 ? I< 09:30 0:00 \_ [kworker/10:1H]
root 479 0.0 0.0 0 0 ? I 09:30 0:00 \_ [kworker/5:2]
root 566 0.0 0.0 0 0 ? I< 09:30 0:00 \_ [kworker/8:1H]
root 640 0.0 0.0 0 0 ? SN 09:30 0:00 \_ [kipmi0]
root 642 0.0 0.0 0 0 ? S< 09:30 0:00 \_ [loop0]
root 643 0.0 0.0 0 0 ? I< 09:30 0:00 \_ [kworker/2:1H]
root 645 0.0 0.0 0 0 ? I< 09:30 0:00 \_ [kworker/0:1H]
root 648 0.0 0.0 0 0 ? I< 09:30 0:00 \_ [kworker/15:1H]
root 649 0.0 0.0 0 0 ? I< 09:30 0:00 \_ [kworker/4:1H]
root 656 0.0 0.0 0 0 ? I< 09:30 0:00 \_ [kworker/3:1H]
root 738 0.0 0.0 0 0 ? S< 09:30 0:00 \_ [loop1]
root 740 0.0 0.0 0 0 ? S< 09:30 0:00 \_ [loop2]
root 766 0.0 0.0 0 0 ? S< 09:30 0:00 \_ [loop3]
root 788 0.0 0.0 0 0 ? S< 09:30 0:00 \_ [loop4]
root 814 0.0 0.0 0 0 ? S< 09:30 0:00 \_ [loop5]
root 840 0.0 0.0 0 0 ? I 09:30 0:00 \_ [kworker/7:2]
root 843 0.0 0.0 0 0 ? S 09:30 0:00 \_ [jbd2/md2-8]
root 844 0.0 0.0 0 0 ? I< 09:30 0:00 \_ [ext4-rsv-conver]
root 846 0.0 0.0 0 0 ? I 09:31 0:00 \_ [kworker/13:2]
root 6945 0.0 0.0 0 0 ? I 09:40 0:00 \_ [kworker/6:0]
root 7305 0.0 0.0 0 0 ? I 09:40 0:00 \_ [kworker/11:2]
root 7566 0.0 0.0 0 0 ? I 09:42 0:00 \_ [kworker/u32:1]
root 8797 0.0 0.0 0 0 ? I 09:50 0:00 \_ [kworker/u32:2]
root 9746 0.0 0.0 0 0 ? I 09:55 0:00 \_ [kworker/u32:3]
root 10138 0.0 0.0 0 0 ? I 09:58 0:00 \_ [kworker/12:0]
root 10889 0.0 0.0 0 0 ? I 10:04 0:00 \_ [kworker/u32:0]
root 1 2.9 0.0 78068 9144 ? Ss 09:28 1:06 /sbin/init noquiet nosplash
root 465 0.0 0.0 168504 48736 ? S<s 09:30 0:00 /lib/systemd/systemd-journald
root 469 0.0 0.0 97708 1832 ? Ss 09:30 0:00 /sbin/lvmetad -f
root 480 0.0 0.0 46848 5700 ? Ss 09:30 0:00 /lib/systemd/systemd-udevd
root 820 0.0 0.0 7488 2192 ? Ss 09:30 0:00 /sbin/mdadm --monitor --scan
systemd+ 823 0.0 0.0 71936 5868 ? Ss 09:30 0:00 /lib/systemd/systemd-networkd
systemd+ 858 0.0 0.0 70608 5312 ? Ss 09:31 0:00 /lib/systemd/systemd-resolved
systemd+ 859 0.0 0.0 141908 3264 ? Ssl 09:31 0:00 /lib/systemd/systemd-timesyncd
root 990 0.0 0.0 26720 5260 ? Ss 09:31 0:00 /usr/sbin/smartd -n
syslog 991 0.0 0.0 263032 4504 ? Ssl 09:31 0:00 /usr/sbin/rsyslogd -n
root 996 0.0 0.0 61996 5528 ? Ss 09:31 0:00 /lib/systemd/systemd-logind
root 998 0.0 0.0 287508 6660 ? Ssl 09:31 0:00 /usr/lib/accountsservice/accounts-daemon
message+ 1002 0.0 0.0 49928 4300 ? Ss 09:31 0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 1012 0.0 0.0 9916 124 ? Ss 09:31 0:00 /usr/sbin/rngd -r /dev/hwrng
root 1015 0.0 0.0 110556 3484 ? Ssl 09:31 0:00 /usr/sbin/irqbalance --foreground
root 1017 0.0 0.0 31320 3200 ? Ss 09:31 0:00 /usr/sbin/cron -f
root 1018 0.0 0.0 170372 17048 ? Ssl 09:31 0:00 /usr/bin/python3 /usr/bin/networkd-dispatcher
root 1019 0.0 0.0 2096096 17768 ? Ssl 09:31 0:00 /usr/lib/snapd/snapd
root 1026 0.0 0.0 72296 6324 ? Ss 09:31 0:00 /usr/sbin/sshd -D
root 4521 0.0 0.0 74668 6496 ? Ss 09:36 0:00 \_ sshd: root@pts/1
root 4523 0.0 0.0 19944 4860 pts/1 Ss+ 09:36 0:00 | \_ -bash
root 8468 0.0 0.0 74668 6508 ? Ss 09:49 0:00 \_ sshd: root@pts/0
root 8470 0.0 0.0 19812 4672 pts/0 Ss 09:49 0:00 \_ -bash
root 11076 0.0 0.0 34712 3256 pts/0 R+ 10:05 0:00 \_ ps fauxww
root 1039 0.0 0.0 301024 20648 ? Ssl 09:31 0:01 /usr/bin/python3 /usr/bin/fail2ban-server -xf start
root 1059 0.1 0.0 29524 8292 ? S 09:31 0:02 perl /david-favor/tools/route-reviver
root 1190 0.0 0.0 15956 2268 ttyS1 Ss+ 09:31 0:00 /sbin/agetty -o -p -- \u --keep-baud 115200,38400,9600 ttyS1 vt220
root 1191 0.0 0.0 15956 2384 ttyS0 Ss+ 09:31 0:00 /sbin/agetty -o -p -- \u --keep-baud 115200,38400,9600 ttyS0 vt220
root 1194 0.0 0.0 16180 1992 tty1 Ss+ 09:31 0:01 /sbin/agetty -o -p -- \u --noclear tty1 linux
root 1211 0.0 0.0 160920 1236 ? Sl 09:31 0:00 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
nobody 8801 0.5 0.0 49984 404 ? S 09:50 0:04 dnsmasq --strict-order --bind-interfaces --pid-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.pid --except-interface=lo --interface=lxdbr0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.245.137.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range 10.245.137.2,10.245.137.254,1h --listen-address=fd42:3e36:490c:fa3e::1 --enable-ra --dhcp-range ::,constructor:lxdbr0,ra-stateless,ra-names -s lxd -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw -u nobody
So it looks like all that's left to get fixed is to generate /var/snap/lxd/common/lxd/lxd.db + maybe all will be well.
No, you have a database; as I said, lxd.db isn't a thing anymore in LXD 3.0.
The issue is that the raft database isn't coming online here for some reason. That error suggests that the database couldn't be read within the timeout period.
Can you post the output of:
find /var/snap/lxd/common/lxd/database
du -sch /var/snap/lxd/common/lxd/database
Ah... so no more lxd.db. Got it.
net10 # find /var/snap/lxd/common/lxd/database
/var/snap/lxd/common/lxd/database
/var/snap/lxd/common/lxd/database/global
/var/snap/lxd/common/lxd/database/global/db.bin-shm
/var/snap/lxd/common/lxd/database/global/logs.db
/var/snap/lxd/common/lxd/database/global/db.bin-wal
/var/snap/lxd/common/lxd/database/global/snapshots
/var/snap/lxd/common/lxd/database/global/db.bin
/var/snap/lxd/common/lxd/database/local.db
net10 # du -sch /var/snap/lxd/common/lxd/database
40M /var/snap/lxd/common/lxd/database
40M total
Ok, that's not particularly light but also not unusually large.
What version of the snap do you have? snap info lxd
Here's something curious...
If I do inotifywait -qmr /var/snap/lxd/common/lxd/database
Then I see a spew of /var/snap/lxd/common/lxd/database/global/ MODIFY db.bin-wal events, from the point of "Initializing global database" until "Initializing storage pools".
So that's a massive number of writes, which seems odd since local.db only contains 53 entries; something is being written to db.bin-wal over and over.
Maybe this information helps.
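As a rough way to quantify that spew (a sketch; it assumes inotify-tools is installed and uses an arbitrary 60-second window):

# count MODIFY events on the global database WAL while LXD is starting
timeout 60 inotifywait -qmr /var/snap/lxd/common/lxd/database | grep -c 'MODIFY db.bin-wal'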
net10 # snap info lxd
name: lxd
summary: System container manager and API
publisher: canonical
contact: https://github.com/lxc/lxd/issues
license: unknown
description: |
LXD is a container manager for system containers.
It offers a REST API to remotely manage containers over the network, using an image based workflow
and with support for live migration.
Images are available for all Ubuntu releases and architectures as well as for a wide number of other
Linux distributions.
LXD containers are lightweight, secure by default and a great alternative to virtual machines.
commands:
- lxd.benchmark
- lxd.check-kernel
- lxd.database
- lxd.lxc
- lxd
- lxd.migrate
services:
lxd.daemon: simple, enabled, inactive
snap-id: J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking: stable
refreshed: 2018-04-28T17:49:22-05:00
installed: 3.0.0 (6862) 56MB -
channels:
stable: 3.0.0 (6879) 56MB -
candidate: 3.0.0 (6879) 56MB -
beta: ↑
edge: git-768e6ea (6891) 56MB -
2.0/stable: 2.0.11 (6627) 27MB -
2.0/candidate: 2.0.11 (6627) 27MB -
2.0/beta: ↑
2.0/edge: git-d71807e (6630) 25MB -
3.0/stable: 3.0.0 (6882) 56MB -
3.0/candidate: 3.0.0 (6882) 56MB -
3.0/beta: ↑
3.0/edge: git-69217a8 (6897) 56MB -
Ok, can you run snap refresh lxd? We just published a new stable snap which hopefully will include some more debugging logic from @freeekanayaka
Done.
What does systemctl status snap.lxd.daemon show now? There's a good chance the refresh is trying to start it back up.
net10 # systemctl status snap.lxd.daemon
● snap.lxd.daemon.service - Service for snap application lxd.daemon
Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2018-05-01 10:24:31 CDT; 1min 7s ago
Process: 14433 ExecStart=/usr/bin/snap run lxd.daemon (code=exited, status=1/FAILURE)
Main PID: 14433 (code=exited, status=1/FAILURE)
May 01 10:24:31 net10.bizcooker.com systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
May 01 10:24:31 net10.bizcooker.com systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 6.
May 01 10:24:31 net10.bizcooker.com systemd[1]: Stopped Service for snap application lxd.daemon.
May 01 10:24:31 net10.bizcooker.com systemd[1]: snap.lxd.daemon.service: Start request repeated too quickly.
May 01 10:24:31 net10.bizcooker.com systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
May 01 10:24:31 net10.bizcooker.com systemd[1]: Failed to start Service for snap application lxd.daemon.
Ok, so it did try to start but failed again. Can you do:
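(Again the exact commands weren't preserved; judging from the reply below, they appear to have been the same manual-start sequence:)

systemctl stop snap.lxd.daemon
lxd --debug --group lxd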
We'll see if we get some more details with the newer snap.
Did a reboot first.
net10 # systemctl status snap.lxd.daemon
● snap.lxd.daemon.service - Service for snap application lxd.daemon
Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2018-05-01 10:33:05 CDT; 1min 40s ago
Main PID: 1036 (daemon.start)
Tasks: 0 (limit: 4915)
CGroup: /system.slice/snap.lxd.daemon.service
‣ 1036 /bin/sh /snap/lxd/6879/commands/daemon.start
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]: 3: fd: 9: hugetlb
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]: 4: fd: 10: devices
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]: 5: fd: 11: freezer
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]: 6: fd: 12: rdma
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]: 7: fd: 13: cpu,cpuacct
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]: 8: fd: 14: net_cls,net_prio
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]: 9: fd: 15: blkio
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]: 10: fd: 16: cpuset
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]: 11: fd: 17: name=systemd
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]: 12: fd: 18: unified
net10 # systemctl stop snap.lxd.daemon
net10 # pgrep lxd
# empty - no output
net10 # sudo lxd --debug --group lxd
INFO[05-01|15:35:51] LXD 3.0.0 is starting in normal mode path=/var/snap/lxd/common/lxd
INFO[05-01|15:35:51] Kernel uid/gid map:
INFO[05-01|15:35:51] - u 0 0 4294967295
INFO[05-01|15:35:51] - g 0 0 4294967295
INFO[05-01|15:35:51] Configured LXD uid/gid map:
INFO[05-01|15:35:51] - u 0 1000000 1000000000
INFO[05-01|15:35:51] - g 0 1000000 1000000000
WARN[05-01|15:35:51] CGroup memory swap accounting is disabled, swap limits will be ignored.
INFO[05-01|15:35:51] Initializing local database
INFO[05-01|15:35:51] Initializing database gateway
INFO[05-01|15:35:51] Start database node address= id=1
INFO[05-01|15:35:52] Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]
INFO[05-01|15:35:52] Raft: Node at 0 [Leader] entering Leader state
INFO[05-01|15:35:52] LXD isn't socket activated
INFO[05-01|15:35:52] Starting /dev/lxd handler:
INFO[05-01|15:35:52] - binding devlxd socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[05-01|15:35:52] REST API daemon:
INFO[05-01|15:35:52] - binding Unix socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[05-01|15:35:52] - binding TCP socket socket=[::]:8443
INFO[05-01|15:35:52] Initializing global database
INFO[05-01|15:37:03] Initializing storage pools
DBUG[05-01|15:37:03] Initializing and checking storage pool "default".
DBUG[05-01|15:37:03] Initializing a DIR driver.
DBUG[05-01|15:37:03] Checking DIR storage pool "default".
DBUG[05-01|15:37:03] Initializing a DIR driver.
INFO[05-01|15:37:03] Initializing networks
DBUG[05-01|15:37:04] Connecting to a remote simplestreams server
INFO[05-01|15:37:04] Loading configuration
DBUG[05-01|15:37:04] Initialized inotify with file descriptor 15
INFO[05-01|15:37:04] Pruning expired images
INFO[05-01|15:37:04] Done pruning expired images
INFO[05-01|15:37:04] Updating instance types
INFO[05-01|15:37:04] Expiring log files
INFO[05-01|15:37:04] Updating images
INFO[05-01|15:37:04] Done expiring log files
DBUG[05-01|15:37:04] Processing image alias=18.04 fp=b36ec647e374da4816104a98807633a2cc387488083d3776557081c4d0333618 protocol=simplestreams server=https://cloud-images.ubuntu.com/releases
DBUG[05-01|15:37:04] Connecting to a remote simplestreams server
INFO[05-01|15:37:06] Done updating instance types
DBUG[05-01|15:37:06] Image already exists in the db image=b36ec647e374da4816104a98807633a2cc387488083d3776557081c4d0333618
DBUG[05-01|15:37:06] Already up to date fp=b36ec647e374da4816104a98807633a2cc387488083d3776557081c4d0333618
DBUG[05-01|15:37:06] Processing image alias=17.10 fp=f7febb8cbebc6aa8a993eb1ce534963a6b288fde23b9594bb3ba4560704dd65c protocol=simplestreams server=https://cloud-images.ubuntu.com/releases
DBUG[05-01|15:37:06] Using SimpleStreams cache entry expiry=2018-05-01T16:37:06+0000 server=https://cloud-images.ubuntu.com/releases
DBUG[05-01|15:37:06] Image already exists in the db image=f7febb8cbebc6aa8a993eb1ce534963a6b288fde23b9594bb3ba4560704dd65c
DBUG[05-01|15:37:06] Already up to date fp=f7febb8cbebc6aa8a993eb1ce534963a6b288fde23b9594bb3ba4560704dd65c
INFO[05-01|15:37:06] Done updating images
At this point lxc list works.
If I Ctrl-C out of the lxd process + restart, I get the same problem as before...
net10 # sudo lxd --debug --group lxd
INFO[05-01|15:38:21] LXD 3.0.0 is starting in normal mode path=/var/snap/lxd/common/lxd
INFO[05-01|15:38:21] Kernel uid/gid map:
INFO[05-01|15:38:21] - u 0 0 4294967295
INFO[05-01|15:38:21] - g 0 0 4294967295
INFO[05-01|15:38:21] Configured LXD uid/gid map:
INFO[05-01|15:38:21] - u 0 1000000 1000000000
INFO[05-01|15:38:21] - g 0 1000000 1000000000
WARN[05-01|15:38:21] CGroup memory swap accounting is disabled, swap limits will be ignored.
INFO[05-01|15:38:21] Initializing local database
INFO[05-01|15:38:21] Initializing database gateway
INFO[05-01|15:38:21] Start database node address= id=1
INFO[05-01|15:38:21] Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]
INFO[05-01|15:38:21] Raft: Node at 0 [Leader] entering Leader state
INFO[05-01|15:38:21] LXD isn't socket activated
INFO[05-01|15:38:21] Starting /dev/lxd handler:
INFO[05-01|15:38:21] - binding devlxd socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[05-01|15:38:21] REST API daemon:
INFO[05-01|15:38:21] - binding Unix socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[05-01|15:38:21] - binding TCP socket socket=[::]:8443
INFO[05-01|15:38:21] Initializing global database
DBUG[05-01|15:39:21] Database error: failed to begin transaction: gRPC BEGIN response error: rpc error: code = Unknown desc = failed to handle BEGIN request: FSM out of sync: timed out enqueuing operation
EROR[05-01|15:39:21] Failed to start the daemon: failed to open cluster database: failed to ensure schema: failed to begin transaction: gRPC BEGIN response error: rpc error: code = Unknown desc = failed to handle BEGIN request: FSM out of sync: timed out enqueuing operation
INFO[05-01|15:39:21] Starting shutdown sequence
INFO[05-01|15:39:21] Stopping REST API handler:
INFO[05-01|15:39:21] - closing socket socket=[::]:8443
INFO[05-01|15:39:21] - closing socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[05-01|15:39:21] Stopping /dev/lxd handler
INFO[05-01|15:39:21] - closing socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[05-01|15:39:21] Stop database gateway
INFO[05-01|15:39:21] Stop raft instance
INFO[05-01|15:39:21] Stopping REST API handler:
INFO[05-01|15:39:21] Stopping /dev/lxd handler
INFO[05-01|15:39:21] Stopping REST API handler:
INFO[05-01|15:39:21] Stopping /dev/lxd handler
DBUG[05-01|15:39:21] Not unmounting temporary filesystems (containers are still running)
INFO[05-01|15:39:21] Saving simplestreams cache
INFO[05-01|15:39:21] Saved simplestreams cache
Error: failed to open cluster database: failed to ensure schema: failed to begin transaction: gRPC BEGIN response error: rpc error: code = Unknown desc = failed to handle BEGIN request: FSM out of sync: timed out enqueuing operation
Ok, so there's clearly something wrong going on with that database; it doesn't make any sense that it takes two minutes to initialize...
Any chance I can have you send a tarball of /var/snap/lxd/common/lxd/database to stgraber@ubuntu.com?
Hopefully I can reproduce the issue here and then pester @freeekanayaka to figure out what's going on. This seems to match another report we had in #4485 which describes LXD hammering the disk for a minute or so and then failing to bring up the database...
I suspect you're very very close to the timeout somehow, so re-trying a bunch of times will likely eventually get it to start properly again.
But it makes no sense that it's taking that long to pull anything from the local database, so we need to figure out what's happening.
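For reference, a tarball along those lines could be produced with something like the following (the filename is arbitrary; best done while LXD is stopped):

systemctl stop snap.lxd.daemon
tar -czf lxd-database.tar.gz -C /var/snap/lxd/common/lxd database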
Tarball sent.
Thanks for the tarball; startup only took 4s here, so that's weird.
It just takes 4s or so to start here, but I do see heavy disk activity, so it may be similar to yours except that I'm on a super fast NVMe SSD.
What filesystem and type of drive do you have behind /var/snap?
Likely you're running on a machine with a fresh bionic install, rather than an upgrade.
Upstream Ubuntu reports the same thing... that is... everything works with a fresh install.
Best for you to contact me on Skype + look at my actual machine.
Just read your email about restarting lxd multiple times...
Sigh... This is tough as all containers are currently down right now.
Can't really have an entire machine down for days to work this bug.
I'll leave the machine as-is for today. If no progress is made, I'll likely do a fresh install of bionic + see if that clears up the problem.
When I say restarting lxd until it starts, I mean run "lxd --debug --group lxd" until you get past that database timeout. At that point you'll have your containers back until we get to debug more on our side.
Your database does take long to load here, but for my system a long time is 4s. I need to know what kind of storage you're using on your server to reproduce the longer delay you see.
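As a rough sketch of that retry approach (the sleep is arbitrary; on a successful start lxd simply keeps running in the foreground):

# keep retrying the manual start until it gets past the global-database timeout
until lxd --debug --group lxd; do
    echo "LXD failed to start, retrying..."
    sleep 5
done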
Ah... I have a "lxd --debug --group lxd" process running now + lxc list returns container listings.
My lxd bootstrap sequence...
lxd init --auto --storage-backend=dir
lxc network create lxdbr0
lxc network attach-profile lxdbr0 default eth0
So no ZFS or BTRFS issues.
Running strace on the lxd --debug process, there are 1000s of lines of...
futex(0x17d20c8, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
Once EAGAIN clears, lxd seems to get out of its loop.
Maybe this helps.
I'm asking about your physical storage: is it SSD or HDD, and what filesystem are you using for your system? The LXD storage config doesn't matter; I'm interested in what's under LXD.
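One way to answer that, as a quick sketch using standard util-linux tools (nothing LXD-specific assumed):

# ROTA=1 means rotational (HDD), ROTA=0 means SSD
lsblk -d -o NAME,ROTA,SIZE,MODEL
# filesystem and device backing /var/snap
findmnt -T /var/snap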
Default OVH 2TB disk subsystem, so soft RAID.
net10 # cat /etc/fstab
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/md4 / ext4 errors=remount-ro,noatime,dioread_nolock,delalloc,defaults 0 1
/dev/md2 /boot ext4 errors=remount-ro,noatime 0 1
/dev/sda3 swap swap defaults 0 0
/dev/sdb3 swap swap defaults 0 0
proc /proc proc defaults 0 0
sysfs /sys sysfs defaults 0 0
devtmpfs /dev devtmpfs rw 0 0
Same setup as with Artful. No changes after do-release-upgrade -d finished.
Ok, cool, I'll try on some slow spindles and see how long it takes there.
Let me know if you have any other thoughts.
If you're stumped, I'll just do a bionic install from scratch.
Ok, I think we figured it out. We'll deal with this in #4485. We believe we have a temporary workaround and then will have a fixed LXD that will prevent this from happening again.
This machine runs many hot spares for clients, so I had to do a fresh bionic install.
With the fresh install, all problems are fixed.
So I'd tentatively say Artful -> Bionic upgrades should be avoided.
How I got my system working (a rough command sketch follows the list):
1) lxc stop all containers
2) lxc copy all containers to another LXD machine
3) do fresh install of Bionic
4) lxc copy all containers back to machine
At this point all's well.
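As a sketch of that sequence for a single container (the remote name "spare" and container name "c1" are placeholders; it assumes the other LXD host has already been added as a remote with lxc remote add):

# on the original host, for each container:
lxc stop c1
lxc copy c1 spare:c1
# ...fresh Bionic install, snap install lxd, lxd init...
# then pull everything back onto the reinstalled host:
lxc copy spare:c1 local:c1
lxc start c1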
I know it is closed, but there seems to be a bug in netplan for anonymous bridges where the bridge doesn't come up after reboot.
https://bugs.launchpad.net/ubuntu/+source/nplan/+bug/1736975
Only after ip link set up br1 did the container get an IP address from the DHCP server. This is a big issue for LXD/KVM, etc. using bridges; there is a workaround, though. Since I have something like 80 servers with LXD, I hope they fix it.
Just to clarify, for posterity: my issue was simple (stupid) and unrelated to @davidfavor's issue. My containers were not configured using the new Netplan configuration. At the time I simply didn't know about Netplan. As soon as we set up a netplan config file, everything worked for my 18.04 containers.
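For anyone landing here later, a minimal sketch of the kind of netplan file meant above, assuming the container's interface is named eth0 (the file name is arbitrary):

# inside the 18.04 container
cat > /etc/netplan/10-lxc.yaml <<EOF
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: true
EOF
netplan apply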
I see the same thing as @Kramerican sees.
This seems to be something related to machine-level networking interacting with containers.
If I upgrade either a non-netplan machine or a netplan machine to Bionic, the same behavior seems to occur.
Seems like a Bionic packaging problem.
I finally gave up, moved all containers to another machine, did a fresh Bionic install, moved all containers back.
At this point all containers worked.
Tried with LXD v2.21 on Ubuntu 16.04 and LXD v3.0.0 on 18.04 (system upgraded from 16.04).
Networking does not come up and the container does not get an IP assigned on my network bridge.
On both my 16.04 and 18.04 host systems, a Xenial image comes up just fine.
I have tried provisioning from ubuntu:bionic as well as images:ubuntu/bionic/amd64, with identical results. /var/log/syslog on the host shows, in all cases, lines similar to the apparmor "DENIED ... networkd" messages mentioned below. These lines are not present in syslog when provisioning other versions of Ubuntu (Xenial/Zesty). Interestingly, upgrading an existing Xenial container to Bionic does not cause any networking issues.
Without knowing much about apparmor, I am assuming that the "DENIED ... networkd" line is an indicator of the culprit here. Any assistance would be much appreciated :)
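A quick way to pull those lines out for inspection (just a sketch; the exact audit message text may vary):

grep 'apparmor="DENIED"' /var/log/syslog | grep networkd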