Closed. rigred closed this issue 5 years ago.
Hmm, I don't get why starting it directly with `lxd --group lxd` would work while starting it through systemd wouldn't; normally both should fail, unless it's a random, non-persistent issue.
@freeekanayaka
@stgraber I presume that `lxd --group lxd` is using /var/lib/lxd vs /var/snap/lxd/common/lxd?
@rigred can you send us the /var/snap/lxd/common/lxd/logs/lxd.log too?
I don't think it's using /var/lib/lxd as nothing's there that's recently been touched.
ls -lah /var/lib/lxd
drwxr-xr-x 4 lxd nogroup 5 Jul 31 12:00 .
drwxr-xr-x 33 root root 33 Jul 31 12:13 ..
drwxr-xr-x 2 root root 2 Aug 27 2018 devlxd
-rw-r--r-- 1 root root 1.9K Aug 27 2018 server.crt
drwx--x--x 2 root root 2 Aug 27 2018 shmounts
And based on journalctl data it's definitely running out of /var/snap/lxd/common/lxd/.
The log you asked for is quite limited; I've replaced the public IP with IP_ADDR.
t=2019-08-01T20:27:29+0200 lvl=info msg="LXD 3.15 is starting in normal mode" path=/var/snap/lxd/common/lxd
t=2019-08-01T20:27:29+0200 lvl=info msg="Kernel uid/gid map:"
t=2019-08-01T20:27:29+0200 lvl=info msg=" - u 0 0 4294967295"
t=2019-08-01T20:27:29+0200 lvl=info msg=" - g 0 0 4294967295"
t=2019-08-01T20:27:29+0200 lvl=info msg="Configured LXD uid/gid map:"
t=2019-08-01T20:27:29+0200 lvl=info msg=" - u 0 1000000 1000000000"
t=2019-08-01T20:27:29+0200 lvl=info msg=" - g 0 1000000 1000000000"
t=2019-08-01T20:27:29+0200 lvl=info msg="Kernel features:"
t=2019-08-01T20:27:29+0200 lvl=info msg=" - netnsid-based network retrieval: no"
t=2019-08-01T20:27:29+0200 lvl=info msg=" - uevent injection: no"
t=2019-08-01T20:27:29+0200 lvl=info msg=" - seccomp listener: no"
t=2019-08-01T20:27:29+0200 lvl=info msg=" - unprivileged file capabilities: yes"
t=2019-08-01T20:27:29+0200 lvl=info msg=" - shiftfs support: no"
t=2019-08-01T20:27:29+0200 lvl=info msg="Initializing local database"
t=2019-08-01T20:27:29+0200 lvl=info msg="Starting /dev/lxd handler:"
t=2019-08-01T20:27:29+0200 lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2019-08-01T20:27:29+0200 lvl=info msg="REST API daemon:"
t=2019-08-01T20:27:29+0200 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/snap/lxd/common/lxd/unix.socket
t=2019-08-01T20:27:29+0200 lvl=info msg=" - binding TCP socket" socket=IP_ADDR:8443
t=2019-08-01T20:27:29+0200 lvl=info msg="Initializing global database"
t=2019-08-01T20:27:36+0200 lvl=eror msg="Failed to start the daemon: failed to open cluster database: failed to ensure schema: failed to update node version info: database is locked"
t=2019-08-01T20:27:36+0200 lvl=info msg="Starting shutdown sequence"
t=2019-08-01T20:27:36+0200 lvl=info msg="Stopping REST API handler:"
t=2019-08-01T20:27:36+0200 lvl=info msg=" - closing socket" socket=IP_ADDR:8443
t=2019-08-01T20:27:36+0200 lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/unix.socket
t=2019-08-01T20:27:36+0200 lvl=info msg="Stopping /dev/lxd handler:"
t=2019-08-01T20:27:36+0200 lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
Where can I email the database info instead of posting it here? I can send you the full output of `lxd --group lxd`; that prints A LOT of info.
Just for clarity this system doesn't have any apt lxd/lxc/lxcfs remnants. lxd points to /snap/bin/lxd
You can email the database to me at free dot ekanayaka at canonical dot com.
One possibility is that the lxd daemon is racing with something else (perhaps `lxd wait`) when started through systemd/snapd. Would it be possible for you to make sure that absolutely no lxd-related process is running, and then try again to start the systemd unit: `systemctl start snap.lxd.daemon`
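To double-check that, a quick sketch of the kind of scan you could run (`ps fauxww | grep lxd` or `pgrep -a lxd` works equally well; this just walks /proc):

```python
import os

def lxd_processes():
    """Return (pid, name) for running processes whose name contains 'lxd'."""
    pids = filter(str.isdigit, os.listdir("/proc")) if os.path.isdir("/proc") else []
    found = []
    for pid in pids:
        try:
            with open(f"/proc/{pid}/comm") as f:
                name = f.read().strip()
        except OSError:
            continue  # process exited while we were scanning
        if "lxd" in name:
            found.append((int(pid), name))
    return found

# An empty list here means it is safe to retry: systemctl start snap.lxd.daemon
print(lxd_processes())
```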
So I have figured out what some of the problem was.
The `nodes` table in db.bin for some reason had an entry pointing at the public IP of the server instead of 0.0.0.0, as it does on a good local install. This apparently confused the (newly?) added cluster check. Similarly, `raft_nodes` in local.db contained an entry for the system's public IP with port 8443. So what I did was set the address back to 0.0.0.0, with none for the name. However, I now have a new issue of the following type:
t=2019-08-06T13:30:51+0200 lvl=info msg="LXD 3.15 is starting in normal mode" path=/var/snap/lxd/common/lxd
t=2019-08-06T13:30:51+0200 lvl=info msg="Kernel uid/gid map:"
t=2019-08-06T13:30:51+0200 lvl=info msg=" - u 0 0 4294967295"
t=2019-08-06T13:30:51+0200 lvl=info msg=" - g 0 0 4294967295"
t=2019-08-06T13:30:51+0200 lvl=info msg="Configured LXD uid/gid map:"
t=2019-08-06T13:30:51+0200 lvl=info msg=" - u 0 1000000 1000000000"
t=2019-08-06T13:30:51+0200 lvl=info msg=" - g 0 1000000 1000000000"
t=2019-08-06T13:30:51+0200 lvl=info msg="Kernel features:"
t=2019-08-06T13:30:51+0200 lvl=info msg=" - netnsid-based network retrieval: yes"
t=2019-08-06T13:30:51+0200 lvl=info msg=" - uevent injection: yes"
t=2019-08-06T13:30:51+0200 lvl=info msg=" - seccomp listener: yes"
t=2019-08-06T13:30:51+0200 lvl=info msg=" - unprivileged file capabilities: yes"
t=2019-08-06T13:30:51+0200 lvl=info msg=" - shiftfs support: no"
t=2019-08-06T13:30:51+0200 lvl=info msg="Initializing local database"
t=2019-08-06T13:30:51+0200 lvl=info msg="Starting /dev/lxd handler:"
t=2019-08-06T13:30:51+0200 lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2019-08-06T13:30:51+0200 lvl=info msg="REST API daemon:"
t=2019-08-06T13:30:51+0200 lvl=info msg=" - binding Unix socket" socket=/var/snap/lxd/common/lxd/unix.socket
t=2019-08-06T13:30:51+0200 lvl=info msg=" - binding TCP socket" socket=[::]:8443
t=2019-08-06T13:30:51+0200 lvl=info msg="Initializing global database"
t=2019-08-06T13:32:16+0200 lvl=warn msg="Failed connecting to global database (attempt 6): failed to create dqlite connection: no available dqlite leader server found"
---- Keeps repeating prior line over and over -----
The LXD daemon starts and listens on [::]:8443 but apparently fails to connect to the dqlite leader server, which I presume is some other cluster remnant?
Running `lxd --debug --group lxd` prints much the same info without extra clues.
Here's a dump of local.db:
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE schema (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
version INTEGER NOT NULL,
updated_at DATETIME NOT NULL,
UNIQUE (version)
);
INSERT INTO schema VALUES(1,37,1535897343);
INSERT INTO schema VALUES(2,38,1565087616);
CREATE TABLE config (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
key VARCHAR(255) NOT NULL,
value TEXT,
UNIQUE (key)
);
INSERT INTO config VALUES(4,'core.https_address','[::]:8443');
CREATE TABLE patches (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
name VARCHAR(255) NOT NULL,
applied_at DATETIME NOT NULL,
UNIQUE (name)
);
INSERT INTO patches VALUES(1,'invalid_profile_names',1535897343);
INSERT INTO patches VALUES(2,'leftover_profile_config',1535897343);
INSERT INTO patches VALUES(3,'network_permissions',1535897343);
INSERT INTO patches VALUES(4,'storage_api',1535897343);
INSERT INTO patches VALUES(5,'storage_api_v1',1535897343);
INSERT INTO patches VALUES(6,'storage_api_dir_cleanup',1535897343);
INSERT INTO patches VALUES(7,'storage_api_lvm_keys',1535897343);
INSERT INTO patches VALUES(8,'storage_api_keys',1535897343);
INSERT INTO patches VALUES(9,'storage_api_update_storage_configs',1535897343);
INSERT INTO patches VALUES(10,'storage_api_lxd_on_btrfs',1535897343);
INSERT INTO patches VALUES(11,'storage_api_lvm_detect_lv_size',1535897343);
INSERT INTO patches VALUES(12,'storage_api_insert_zfs_driver',1535897343);
INSERT INTO patches VALUES(13,'storage_zfs_noauto',1535897343);
INSERT INTO patches VALUES(14,'storage_zfs_volume_size',1535897343);
INSERT INTO patches VALUES(15,'network_dnsmasq_hosts',1535897343);
INSERT INTO patches VALUES(16,'storage_api_dir_bind_mount',1535897343);
INSERT INTO patches VALUES(17,'fix_uploaded_at',1535897343);
INSERT INTO patches VALUES(18,'storage_api_ceph_size_remove',1535897343);
INSERT INTO patches VALUES(19,'devices_new_naming_scheme',1535897343);
INSERT INTO patches VALUES(20,'storage_api_permissions',1535897343);
INSERT INTO patches VALUES(21,'container_config_regen',1535897343);
INSERT INTO patches VALUES(22,'lvm_node_specific_config_keys',1535897343);
INSERT INTO patches VALUES(23,'candid_rename_config_key',1535897343);
INSERT INTO patches VALUES(24,'move_backups',1537189625);
INSERT INTO patches VALUES(25,'storage_api_rename_container_snapshots_dir',1539569643);
INSERT INTO patches VALUES(26,'shrink_logs_db_file',1541542179);
INSERT INTO patches VALUES(27,'storage_api_rename_container_snapshots_links',1542063856);
CREATE TABLE raft_nodes (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
address TEXT NOT NULL,
UNIQUE (address)
);
DELETE FROM sqlite_sequence;
INSERT INTO sqlite_sequence VALUES('schema',2);
INSERT INTO sqlite_sequence VALUES('patches',27);
INSERT INTO sqlite_sequence VALUES('config',4);
COMMIT;
This is from global/db.bin:
SELECT * FROM nodes;
1|none||0.0.0.0|14|133|2019-07-18 05:15:38.498156731+00:00|0
SELECT * FROM networks_nodes;
4|4|1
SELECT * FROM networks_config;
46|4||ipv4.address|10.197.134.1/24
47|4||ipv4.firewall|false
48|4||ipv4.nat|true
49|4||ipv6.address|fd42:52a6:f1cf:6e7a::1/64
50|4||ipv6.nat|true
Let me know if there's something else you need, or if I'm even on the right track.
A few questions:
Note that any modification to db.bin* is going to be a no-op, since those files are only a dump of the raft state and get rebuilt.
Also, if you had a row in the `raft_nodes` table, it means that this LXD instance was clustered (although the title of the issue says it was not). Perhaps it was clustered but the cluster had only one member (this instance itself)?
Just to make that clear: I have a copy of all the original files. I have just emailed them to you.
Regarding your questions:
In database/global/ there presently exists:
-rw-r--r-- 1 root root 452K Aug 6 12:45 db.bin
-rw-r--r-- 1 root root 2.2M Aug 6 12:45 db.bin-wal
-rw-r--r-- 1 root root 28M Aug 6 12:45 logs.db
drwxr-xr-x 4 root root 4 Aug 6 12:45 snapshots
The db.bin write-ahead log (db.bin-wal) doesn't exist on my healthy systems. I know it's not a good idea to remove a WAL, but I tried it anyway as a test: restarting LXD normally rebuilt db.bin with the older values, while removing the WAL stopped db.bin being rebuilt with the older values.
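(As background on that -wal file, a generic SQLite illustration, nothing LXD-specific: in WAL mode, committed writes live in the -wal side file until they are checkpointed back into the main database file, so the WAL can hold newer state than the main file itself.)

```python
import os
import sqlite3
import tempfile

# Generic SQLite WAL demo: a committed write sits in the -wal side file
# until it is checkpointed into the main database file.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "db.bin")
db = sqlite3.connect(path)
db.execute("PRAGMA journal_mode=WAL")
db.execute("CREATE TABLE t (v TEXT)")
db.execute("INSERT INTO t VALUES ('recent write')")
db.commit()
wal_present = os.path.exists(path + "-wal")
print(wal_present)  # True: the committed row currently lives in db.bin-wal
db.close()
```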
The steps I took:
- stopped all LXD services and sockets
- opened global/db.bin with sqlite3 and compared it to a local LXD reference
- looked at the nodes table
- ran `UPDATE nodes SET address = "0.0.0.0" WHERE id = 1;`
- committed the changes and exited
- restarted LXD
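That UPDATE can be sketched against a throwaway in-memory table shaped like the dump above (the exact column list is my guess from the SELECT output; on the real system this would have to run on a backed-up copy with LXD fully stopped):

```python
import sqlite3

# Throwaway in-memory stand-in for the `nodes` table; columns are assumed
# from the `SELECT * FROM nodes;` output shown in this thread.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
db.execute("INSERT INTO nodes (id, name, address) VALUES (1, 'none', 'IP_ADDR:8443')")
db.execute("UPDATE nodes SET address = '0.0.0.0' WHERE id = 1")
db.commit()
print(db.execute("SELECT address FROM nodes WHERE id = 1").fetchone()[0])  # 0.0.0.0
```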
This is purely me poking around in the dark, but it's something I cross-referenced against this check:
https://github.com/lxc/lxd/blob/6decb1bfad667f23e7a5534b86df0efc096ca0a1/lxd/db/cluster/open.go#L112
But as you say, chances are that this lxd instance was somehow set up as some sort of strange single-member cluster. (I did not set up this cluster originally, and once it broke it's hard to check the configuration without digging through the database.)
So what I would like to accomplish to solve this: uncluster this instance, prevent snap auto-refreshes, and do refreshes manually only after I've verified against a local reference system.
I've had several lxd instances break with the 3.15 auto-refresh (those were easily fixed: https://github.com/lxc/lxd/issues/6016), but this one testing instance, which this issue is about, has been giving one issue after another, yet doesn't seem functionally very different from all the others.
You can't disable clustering the way you've done it. The raft database itself has a record of the nodes, so modifying nodes/raft_nodes in that way effectively just prevents it from connecting.
Mind sending your pre-modification database tarball to stgraber at ubuntu dot com?
I'll take a quick look at how much of the clustering was configured and what may be needed to have it come back online.
I've just sent one to @freeekanayaka, and now to you as well.
Thanks, I responded to your e-mail. I'm having your database load just fine here once I replicated the IP address of your node on my system. I'd recommend you check:
- the /var/snap/lxd/common/lxd.debug file on your system
- `snap list` (revision 11437)
- `ps fauxww`
- `systemctl -a | grep lxd`
- `netstat -lnp | grep 8443`
- `ip -4 addr`
If all that looks good, then we can try restarting the whole thing again and see what happens. This will also enable debug logging just in case things go bad again.
Ok, so `snap list`:
Name Version Rev Tracking Publisher Notes
core 16-2.39.3 7270 stable canonical* core
lxd 3.15 11437 stable canonical* -
Nothing odd in the process tree:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 0.0 0.0 0 0 ? S 11:47 0:01 [kthreadd]
root 3 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [rcu_gp]
root 4 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [rcu_par_gp]
root 6 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kworker/0:0H-kb]
root 8 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [mm_percpu_wq]
root 9 0.0 0.0 0 0 ? S 11:47 0:00 \_ [ksoftirqd/0]
root 10 0.0 0.0 0 0 ? I 11:47 0:21 \_ [rcu_sched]
root 11 0.0 0.0 0 0 ? S 11:47 0:00 \_ [migration/0]
root 12 0.0 0.0 0 0 ? S 11:47 0:00 \_ [idle_inject/0]
root 14 0.0 0.0 0 0 ? S 11:47 0:00 \_ [cpuhp/0]
root 15 0.0 0.0 0 0 ? S 11:47 0:00 \_ [cpuhp/1]
root 16 0.0 0.0 0 0 ? S 11:47 0:00 \_ [idle_inject/1]
root 17 0.0 0.0 0 0 ? S 11:47 0:00 \_ [migration/1]
root 18 0.0 0.0 0 0 ? S 11:47 0:00 \_ [ksoftirqd/1]
root 20 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kworker/1:0H-kb]
root 21 0.0 0.0 0 0 ? S 11:47 0:00 \_ [cpuhp/2]
root 22 0.0 0.0 0 0 ? S 11:47 0:00 \_ [idle_inject/2]
root 23 0.0 0.0 0 0 ? S 11:47 0:00 \_ [migration/2]
root 24 0.0 0.0 0 0 ? S 11:47 0:00 \_ [ksoftirqd/2]
root 26 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kworker/2:0H-kb]
root 27 0.0 0.0 0 0 ? S 11:47 0:00 \_ [cpuhp/3]
root 28 0.0 0.0 0 0 ? S 11:47 0:00 \_ [idle_inject/3]
root 29 0.0 0.0 0 0 ? S 11:47 0:00 \_ [migration/3]
root 30 0.0 0.0 0 0 ? S 11:47 0:00 \_ [ksoftirqd/3]
root 32 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kworker/3:0H-kb]
root 33 0.0 0.0 0 0 ? S 11:47 0:00 \_ [cpuhp/4]
root 34 0.0 0.0 0 0 ? S 11:47 0:00 \_ [idle_inject/4]
root 35 0.0 0.0 0 0 ? S 11:47 0:00 \_ [migration/4]
root 36 0.0 0.0 0 0 ? S 11:47 0:00 \_ [ksoftirqd/4]
root 38 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kworker/4:0H-ev]
root 39 0.0 0.0 0 0 ? S 11:47 0:00 \_ [cpuhp/5]
root 40 0.0 0.0 0 0 ? S 11:47 0:00 \_ [idle_inject/5]
root 41 0.0 0.0 0 0 ? S 11:47 0:00 \_ [migration/5]
root 42 0.0 0.0 0 0 ? S 11:47 0:00 \_ [ksoftirqd/5]
root 44 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kworker/5:0H-kb]
root 45 0.0 0.0 0 0 ? S 11:47 0:00 \_ [cpuhp/6]
root 46 0.0 0.0 0 0 ? S 11:47 0:00 \_ [idle_inject/6]
root 47 0.0 0.0 0 0 ? S 11:47 0:00 \_ [migration/6]
root 48 0.0 0.0 0 0 ? S 11:47 0:00 \_ [ksoftirqd/6]
root 50 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kworker/6:0H-kb]
root 51 0.0 0.0 0 0 ? S 11:47 0:00 \_ [cpuhp/7]
root 52 0.0 0.0 0 0 ? S 11:47 0:00 \_ [idle_inject/7]
root 53 0.0 0.0 0 0 ? S 11:47 0:00 \_ [migration/7]
root 54 0.0 0.0 0 0 ? S 11:47 0:00 \_ [ksoftirqd/7]
root 56 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kworker/7:0H-kb]
root 57 0.0 0.0 0 0 ? S 11:47 0:00 \_ [kdevtmpfs]
root 58 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [netns]
root 59 0.0 0.0 0 0 ? S 11:47 0:00 \_ [rcu_tasks_kthre]
root 60 0.0 0.0 0 0 ? S 11:47 0:00 \_ [kauditd]
root 62 0.0 0.0 0 0 ? I 11:47 0:01 \_ [kworker/2:1-mm_]
root 63 0.0 0.0 0 0 ? S 11:47 0:00 \_ [khungtaskd]
root 64 0.0 0.0 0 0 ? S 11:47 0:00 \_ [oom_reaper]
root 65 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [writeback]
root 66 0.0 0.0 0 0 ? S 11:47 0:00 \_ [kcompactd0]
root 67 0.4 0.0 0 0 ? SN 11:47 1:46 \_ [ksmd]
root 68 0.0 0.0 0 0 ? SN 11:47 0:00 \_ [khugepaged]
root 69 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [crypto]
root 70 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kintegrityd]
root 71 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kblockd]
root 72 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [tpm_dev_wq]
root 73 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [ata_sff]
root 74 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [md]
root 75 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [edac-poller]
root 76 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [devfreq_wq]
root 77 0.0 0.0 0 0 ? I 11:47 0:01 \_ [kworker/4:1-eve]
root 78 0.0 0.0 0 0 ? S 11:47 0:00 \_ [watchdogd]
root 83 0.0 0.0 0 0 ? S 11:47 0:00 \_ [kswapd0]
root 85 0.0 0.0 0 0 ? S 11:47 0:00 \_ [ecryptfs-kthrea]
root 176 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kthrotld]
root 178 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [acpi_thermal_pm]
root 179 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [ipv6_addrconf]
root 190 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kstrp]
root 209 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [charger_manager]
root 279 0.0 0.0 0 0 ? S 11:47 0:00 \_ [scsi_eh_0]
root 280 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [scsi_tmf_0]
root 305 0.0 0.0 0 0 ? S 11:47 0:00 \_ [scsi_eh_1]
root 306 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [scsi_tmf_1]
root 311 0.0 0.0 0 0 ? S 11:47 0:00 \_ [scsi_eh_2]
root 313 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [scsi_tmf_2]
root 317 0.0 0.0 0 0 ? S 11:47 0:00 \_ [scsi_eh_3]
root 320 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [scsi_tmf_3]
root 321 0.0 0.0 0 0 ? S 11:47 0:00 \_ [scsi_eh_4]
root 322 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [scsi_tmf_4]
root 323 0.0 0.0 0 0 ? S 11:47 0:00 \_ [scsi_eh_5]
root 325 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [scsi_tmf_5]
root 332 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kworker/2:1H]
root 333 0.0 0.0 0 0 ? S 11:47 0:00 \_ [i915/signal:0]
root 334 0.0 0.0 0 0 ? S 11:47 0:00 \_ [i915/signal:1]
root 335 0.0 0.0 0 0 ? S 11:47 0:00 \_ [i915/signal:2]
root 336 0.0 0.0 0 0 ? S 11:47 0:00 \_ [i915/signal:6]
root 340 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kworker/4:1H-kb]
root 341 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kworker/7:1H-kb]
root 342 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kworker/6:1H-kb]
root 418 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kworker/3:1H-kb]
root 432 0.0 0.0 0 0 ? I< 11:47 0:00 \_ [kworker/5:1H-kb]
root 449 0.0 0.0 0 0 ? I< 11:48 0:00 \_ [kworker/1:1H-kb]
root 464 0.0 0.0 0 0 ? I< 11:48 0:00 \_ [kdmflush]
root 466 0.0 0.0 0 0 ? I< 11:48 0:00 \_ [kcryptd_io/253:]
root 467 0.0 0.0 0 0 ? I< 11:48 0:00 \_ [kcryptd/253:0]
root 468 0.0 0.0 0 0 ? S 11:48 0:09 \_ [dmcrypt_write/2]
root 486 0.0 0.0 0 0 ? I< 11:48 0:00 \_ [kdmflush]
root 488 0.0 0.0 0 0 ? I< 11:48 0:00 \_ [kcryptd_io/253:]
root 489 0.0 0.0 0 0 ? I< 11:48 0:00 \_ [kcryptd/253:1]
root 490 0.0 0.0 0 0 ? S 11:48 0:08 \_ [dmcrypt_write/2]
root 563 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [spl_system_task]
root 564 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [spl_delay_taskq]
root 565 0.0 0.0 0 0 ? S< 11:48 0:01 \_ [spl_dynamic_tas]
root 566 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [spl_kmem_cache]
root 577 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [zvol]
root 583 0.0 0.0 0 0 ? S 11:48 0:00 \_ [arc_prune]
root 584 0.0 0.0 0 0 ? S 11:48 0:00 \_ [arc_reclaim]
root 588 0.0 0.0 0 0 ? S 11:48 0:01 \_ [dbu_evict]
root 589 0.0 0.0 0 0 ? SN 11:48 0:00 \_ [dbuf_evict]
root 592 0.0 0.0 0 0 ? SN 11:48 0:00 \_ [z_vdev_file]
root 593 0.0 0.0 0 0 ? S 11:48 0:01 \_ [l2arc_feed]
root 620 0.0 0.0 0 0 ? I< 11:48 0:00 \_ [kworker/0:1H-kb]
root 688 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_null_iss]
root 689 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_null_int]
root 690 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_rd_iss]
root 691 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_rd_int_0]
root 692 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_rd_int_1]
root 693 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_rd_int_2]
root 694 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_rd_int_3]
root 695 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_rd_int_4]
root 696 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_rd_int_5]
root 697 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_rd_int_6]
root 698 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_rd_int_7]
root 699 0.0 0.0 0 0 ? S< 11:48 0:04 \_ [z_wr_iss]
root 700 0.0 0.0 0 0 ? S< 11:48 0:04 \_ [z_wr_iss]
root 701 0.0 0.0 0 0 ? S< 11:48 0:04 \_ [z_wr_iss]
root 702 0.0 0.0 0 0 ? S< 11:48 0:04 \_ [z_wr_iss]
root 703 0.0 0.0 0 0 ? S< 11:48 0:04 \_ [z_wr_iss]
root 704 0.0 0.0 0 0 ? S< 11:48 0:04 \_ [z_wr_iss]
root 705 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_wr_iss_h]
root 706 0.0 0.0 0 0 ? S< 11:48 0:03 \_ [z_wr_int_0]
root 707 0.0 0.0 0 0 ? S< 11:48 0:03 \_ [z_wr_int_1]
root 708 0.0 0.0 0 0 ? S< 11:48 0:03 \_ [z_wr_int_2]
root 709 0.0 0.0 0 0 ? S< 11:48 0:03 \_ [z_wr_int_3]
root 710 0.0 0.0 0 0 ? S< 11:48 0:03 \_ [z_wr_int_4]
root 711 0.0 0.0 0 0 ? S< 11:48 0:03 \_ [z_wr_int_5]
root 712 0.0 0.0 0 0 ? S< 11:48 0:03 \_ [z_wr_int_6]
root 713 0.0 0.0 0 0 ? S< 11:48 0:03 \_ [z_wr_int_7]
root 714 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_wr_int_h]
root 715 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_fr_iss_0]
root 716 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_fr_iss_1]
root 717 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_fr_iss_2]
root 718 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_fr_iss_3]
root 719 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_fr_iss_4]
root 720 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_fr_iss_5]
root 721 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_fr_iss_6]
root 722 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_fr_iss_7]
root 723 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_fr_int]
root 724 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_cl_iss]
root 725 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_cl_int]
root 726 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [z_ioctl_iss]
root 727 0.0 0.0 0 0 ? S< 11:48 0:12 \_ [z_ioctl_int]
root 728 0.0 0.0 0 0 ? S 11:48 0:00 \_ [z_zvol]
root 729 0.0 0.0 0 0 ? S 11:48 0:00 \_ [z_prefetch]
root 730 0.0 0.0 0 0 ? S 11:48 0:00 \_ [z_upgrade]
root 731 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [metaslab_group_]
root 737 0.0 0.0 0 0 ? SN 11:48 0:00 \_ [dp_sync_taskq]
root 738 0.0 0.0 0 0 ? SN 11:48 0:00 \_ [dp_sync_taskq]
root 739 0.0 0.0 0 0 ? SN 11:48 0:00 \_ [dp_sync_taskq]
root 740 0.0 0.0 0 0 ? SN 11:48 0:00 \_ [dp_sync_taskq]
root 741 0.0 0.0 0 0 ? SN 11:48 0:00 \_ [dp_sync_taskq]
root 742 0.0 0.0 0 0 ? SN 11:48 0:00 \_ [dp_sync_taskq]
root 743 0.0 0.0 0 0 ? SN 11:48 0:00 \_ [dp_zil_clean_ta]
root 744 0.0 0.0 0 0 ? SN 11:48 0:00 \_ [dp_zil_clean_ta]
root 745 0.0 0.0 0 0 ? SN 11:48 0:00 \_ [dp_zil_clean_ta]
root 746 0.0 0.0 0 0 ? SN 11:48 0:00 \_ [dp_zil_clean_ta]
root 747 0.0 0.0 0 0 ? SN 11:48 0:00 \_ [dp_zil_clean_ta]
root 748 0.0 0.0 0 0 ? SN 11:48 0:00 \_ [dp_zil_clean_ta]
root 749 0.0 0.0 0 0 ? SN 11:48 0:00 \_ [dp_zil_clean_ta]
root 750 0.0 0.0 0 0 ? SN 11:48 0:00 \_ [dp_zil_clean_ta]
root 751 0.0 0.0 0 0 ? S 11:48 0:00 \_ [z_iput]
root 802 0.0 0.0 0 0 ? S 11:48 0:00 \_ [txg_quiesce]
root 803 0.0 0.0 0 0 ? S 11:48 0:03 \_ [txg_sync]
root 804 0.0 0.0 0 0 ? S 11:48 0:00 \_ [mmp]
root 1353 0.0 0.0 0 0 ? S 11:48 0:00 \_ [irq/126-mei_me]
root 1504 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [loop1]
root 1507 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [loop2]
root 1509 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [loop3]
root 1511 0.0 0.0 0 0 ? S< 11:48 0:00 \_ [loop4]
root 1523 0.0 0.0 0 0 ? S 11:48 0:00 \_ [jbd2/sda1-8]
root 1524 0.0 0.0 0 0 ? I< 11:48 0:00 \_ [ext4-rsv-conver]
root 1591 0.0 0.0 4504 544 ? S 11:48 0:00 \_ bpfilter_umh
root 2790 0.0 0.0 0 0 ? I 11:49 0:01 \_ [kworker/3:2-eve]
root 2966 0.0 0.0 0 0 ? S< 11:49 0:00 \_ [z_rd_iss]
root 3587 0.0 0.0 0 0 ? I< 11:49 0:00 \_ [cfg80211]
root 12768 0.0 0.0 0 0 ? I 13:09 0:00 \_ [kworker/5:1-eve]
root 22733 0.0 0.0 0 0 ? I 13:45 0:00 \_ [kworker/0:1-eve]
root 18519 0.0 0.0 0 0 ? I< 17:15 0:01 \_ [kworker/u17:5-k]
root 23681 0.0 0.0 0 0 ? I 17:20 0:00 \_ [kworker/7:1-eve]
root 25280 0.0 0.0 0 0 ? I 17:20 0:00 \_ [kworker/1:0-eve]
root 11463 0.0 0.0 0 0 ? I 17:35 0:00 \_ [kworker/2:2-rcu]
root 15558 0.0 0.0 0 0 ? I< 17:51 0:00 \_ [kworker/u17:4-k]
root 16864 0.0 0.0 0 0 ? I 17:56 0:00 \_ [kworker/4:2-cgr]
root 21029 0.0 0.0 0 0 ? I< 18:06 0:00 \_ [kworker/u17:6-k]
root 23133 0.0 0.0 0 0 ? I< 18:15 0:00 \_ [kworker/u17:2-k]
root 24948 0.0 0.0 0 0 ? I 18:15 0:00 \_ [kworker/u16:0-e]
root 25787 0.0 0.0 0 0 ? I 18:15 0:00 \_ [kworker/6:2-eve]
root 28189 0.0 0.0 0 0 ? I 18:23 0:00 \_ [kworker/u16:2-e]
root 28192 0.0 0.0 0 0 ? I 18:23 0:00 \_ [kworker/6:3-eve]
root 28193 0.0 0.0 0 0 ? I 18:23 0:00 \_ [kworker/5:2-eve]
root 28195 0.0 0.0 0 0 ? I 18:23 0:00 \_ [kworker/0:3-eve]
root 28212 0.0 0.0 0 0 ? I 18:23 0:00 \_ [kworker/1:1-mm_]
root 28214 0.0 0.0 0 0 ? I 18:23 0:00 \_ [kworker/7:2-rcu]
root 28451 0.0 0.0 0 0 ? I 18:24 0:00 \_ [kworker/3:1-rcu]
root 29478 0.0 0.0 0 0 ? I< 18:24 0:00 \_ [xfsalloc]
root 29482 0.0 0.0 0 0 ? I< 18:24 0:00 \_ [xfs_mru_cache]
root 29490 0.0 0.0 0 0 ? S 18:24 0:00 \_ [jfsIO]
root 29491 0.0 0.0 0 0 ? S 18:24 0:00 \_ [jfsCommit]
root 29492 0.0 0.0 0 0 ? S 18:24 0:00 \_ [jfsCommit]
root 29493 0.0 0.0 0 0 ? S 18:24 0:00 \_ [jfsCommit]
root 29494 0.0 0.0 0 0 ? S 18:24 0:00 \_ [jfsCommit]
root 29495 0.0 0.0 0 0 ? S 18:24 0:00 \_ [jfsCommit]
root 29496 0.0 0.0 0 0 ? S 18:24 0:00 \_ [jfsCommit]
root 29497 0.0 0.0 0 0 ? S 18:24 0:00 \_ [jfsCommit]
root 29498 0.0 0.0 0 0 ? S 18:24 0:00 \_ [jfsCommit]
root 29499 0.0 0.0 0 0 ? S 18:24 0:00 \_ [jfsSync]
root 32356 0.0 0.0 0 0 ? I 18:33 0:00 \_ [kworker/u16:1-e]
root 32400 0.0 0.0 0 0 ? I< 18:35 0:00 \_ [kworker/u17:0-k]
root 1 0.0 0.0 225760 7244 ? Ss 11:47 0:05 /sbin/init nosplash
root 1102 0.0 0.1 169472 54716 ? S<s 11:48 0:04 /lib/systemd/systemd-journald
root 1119 0.0 0.0 105900 1076 ? Ss 11:48 0:00 /sbin/lvmetad -f
root 1137 0.0 0.0 46620 4016 ? Ss 11:48 0:02 /lib/systemd/systemd-udevd
systemd+ 1571 0.0 0.0 141920 3744 ? Ssl 11:48 0:00 /lib/systemd/systemd-timesyncd
root 1617 0.0 0.0 31972 1236 ? S<sl 11:48 0:00 /sbin/auditd
root 1777 0.0 0.0 167492 14512 ? Ssl 11:48 0:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
root 1778 0.0 0.0 434324 6436 ? Ssl 11:48 0:00 /usr/sbin/ModemManager --filter-policy=strict
syslog 1779 0.0 0.0 263032 3240 ? Ssl 11:48 0:02 /usr/sbin/rsyslogd -n
root 1780 0.0 0.0 2032460 22796 ? Ssl 11:48 0:12 /usr/lib/snapd/snapd
root 1781 0.0 0.0 70596 4384 ? Ss 11:48 0:00 /lib/systemd/systemd-logind
root 1782 0.0 0.0 26708 3868 ? Ss 11:48 0:00 /usr/sbin/smartd -n
message+ 1783 0.0 0.0 50372 3284 ? Ss 11:48 0:01 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 1792 0.0 0.0 45220 3776 ? Ss 11:48 0:00 /sbin/wpa_supplicant -u -s -O /run/wpa_supplicant
root 1793 0.0 0.0 182968 2452 ? Ssl 11:48 0:00 /usr/sbin/zed -F
root 1794 0.0 0.0 176916 6972 ? Ssl 11:48 0:02 /usr/sbin/thermald --no-daemon --dbus-enable
root 1796 0.0 0.0 197024 28656 ? Ssl 11:48 0:00 /usr/bin/python3 -Es /usr/sbin/firewalld --nofork --nopid
root 1797 0.0 0.0 28352 1988 ? Ss 11:48 0:00 /usr/sbin/cron -f
root 1810 0.0 0.0 289416 6052 ? Ssl 11:48 0:00 /usr/lib/policykit-1/polkitd --no-debug
systemd+ 1940 0.0 0.0 80168 5336 ? Ss 11:48 0:00 /lib/systemd/systemd-networkd
root 1942 0.0 0.0 404608 12604 ? Ssl 11:48 0:01 /usr/sbin/NetworkManager --no-daemon
root 2109 0.0 0.0 25988 2644 ? Ss 11:49 0:00 /sbin/dhclient -1 -4 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases -I -df /var/lib/dhcp/dhclient6.eth0.leases eth0
root 2779 0.0 0.0 1801668 24424 ? Ssl 11:49 0:01 /usr/sbin/libvirtd
root 2782 0.0 0.0 184328 17216 ? Ssl 11:49 0:00 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
root 2784 0.0 0.0 441388 18156 ? Ssl 11:49 0:24 /usr/bin/python3 /usr/bin/fail2ban-server -xf start
sciswk 2787 0.0 0.0 76876 7220 ? Ss 11:49 0:00 /lib/systemd/systemd --user
sciswk 2798 0.0 0.0 112000 2668 ? S 11:49 0:00 \_ (sd-pam)
root 2791 0.0 0.0 14892 1912 tty1 Ss+ 11:49 0:00 /sbin/agetty -o -p -- \u --noclear tty1 linux
root 2799 0.0 0.0 72292 4680 ? Ss 11:49 0:00 /usr/sbin/sshd -D
root 7517 0.0 0.0 107980 6136 ? Ss 11:51 0:00 \_ sshd: admin [priv]
admin 7562 0.0 0.0 108264 4164 ? S 11:51 0:03 \_ sshd: admin@pts/0,pts/1,pts/2
admin 7563 0.0 0.0 20304 5172 pts/0 Ss 11:51 0:00 \_ -bash
root 31797 0.0 0.0 60576 3716 pts/0 S 18:27 0:00 | \_ sudo su
root 31798 0.0 0.0 60080 3416 pts/0 S 18:27 0:00 | \_ su
root 31799 0.0 0.0 18636 3452 pts/0 S 18:27 0:00 | \_ bash
root 32653 0.0 0.0 37096 3552 pts/0 R+ 18:36 0:00 | \_ ps fauxww
admin 1093 0.0 0.0 20316 5228 pts/1 Ss+ 12:33 0:00 \_ -bash
admin 18082 0.0 0.0 20412 5380 pts/2 Ss+ 13:18 0:00 \_ -bash
root 3099 0.0 0.0 67372 3584 ? Ss 11:49 0:00 /usr/lib/postfix/sbin/master -w
postfix 3101 0.0 0.0 73852 4504 ? S 11:49 0:00 \_ qmgr -l -t unix -u
postfix 30300 0.0 0.0 73812 4600 ? S 18:26 0:00 \_ pickup -l -t unix -u -c
netdata 3103 0.5 0.1 208268 58492 ? Ssl 11:49 2:25 /usr/sbin/netdata -D
netdata 9530 1.1 0.0 22068 4852 ? S 15:49 1:55 \_ /usr/lib/x86_64-linux-gnu/netdata/plugins.d/apps.plugin 1
netdata 15182 0.1 0.0 18504 3132 ? S 17:50 0:03 \_ bash /usr/lib/x86_64-linux-gnu/netdata/plugins.d/tc-qos-helper.sh 1
libvirt+ 3922 0.0 0.0 49960 1972 ? S 11:49 0:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/lib/libvirt/libvirt_leaseshelper
root 3923 0.0 0.0 49932 368 ? S 11:49 0:00 \_ /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/lib/libvirt/libvirt_leaseshelper
admin 7521 0.0 0.0 76768 7452 ? Ss 11:51 0:00 /lib/systemd/systemd --user
admin 7522 0.0 0.0 259688 2940 ? S 11:51 0:00 \_ (sd-pam)
admin 22937 0.0 0.0 28900 2876 ? Ss 17:20 0:00 SCREEN -R
admin 22938 0.0 0.0 19856 4728 pts/3 Ss+ 17:20 0:00 \_ /bin/bash
systemd+ 11183 0.0 0.0 70752 4980 ? Ss 17:35 0:00 /lib/systemd/systemd-resolved
lxd 31579 0.0 0.0 49980 396 ? S 18:26 0:00 dnsmasq --strict-order --bind-interfaces --pid-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.pid --except-interface=lo --no-ping --interface=lxdbr0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.197.134.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range 10.197.134.2,10.197.134.254,1h --listen-address=fd42:52a6:f1cf:6e7a::1 --enable-ra --dhcp-range ::,constructor:lxdbr0,ra-stateless,ra-names -s lxd -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw -u lxd
LXD systemd-related bits:
sys-devices-virtual-net-lxdbr0.device loaded active plugged /sys/devices/virtual/net/lxdbr0
sys-subsystem-net-devices-lxdbr0.device loaded active plugged /sys/subsystem/net/devices/lxdbr0
run-snapd-ns-lxd.mnt.mount loaded active mounted /run/snapd/ns/lxd.mnt
snap-lxd-11437.mount loaded active mounted Mount unit for lxd, revision 11437
snap-lxd-11494.mount loaded active mounted Mount unit for lxd, revision 11494
snap.lxd.activate.service loaded inactive dead Service for snap application lxd.activate
snap.lxd.daemon.service loaded inactive dead Service for snap application lxd.daemon
snap.lxd.daemon.unix.socket loaded inactive dead Socket unix for snap application lxd.daemon
Nothing is using port 8443.
The public IP is definitely up and tied to the eth0 device, else I would not be able to SSH into this system. IP_ADDR everywhere here corresponds to the public IP. The second device, br0, is a public bridge with a /29 network; some containers have interfaces routed through it.
Network-wise everything is fine: starting containers via the lxd command directly somehow works and everything comes up perfectly, but starting via snap.lxd does not work.
/var/snap/lxd/common/lxd/logs/lxd.log
t=2019-08-06T18:26:36+0200 lvl=info msg="LXD 3.15 is starting in normal mode" path=/var/snap/lxd/common/lxd
t=2019-08-06T18:26:36+0200 lvl=info msg="Kernel uid/gid map:"
t=2019-08-06T18:26:36+0200 lvl=info msg=" - u 0 0 4294967295"
t=2019-08-06T18:26:36+0200 lvl=info msg=" - g 0 0 4294967295"
t=2019-08-06T18:26:36+0200 lvl=info msg="Configured LXD uid/gid map:"
t=2019-08-06T18:26:36+0200 lvl=info msg=" - u 0 1000000 1000000000"
t=2019-08-06T18:26:36+0200 lvl=info msg=" - g 0 1000000 1000000000"
t=2019-08-06T18:26:36+0200 lvl=info msg="Kernel features:"
t=2019-08-06T18:26:36+0200 lvl=info msg=" - netnsid-based network retrieval: yes"
t=2019-08-06T18:26:36+0200 lvl=info msg=" - uevent injection: yes"
t=2019-08-06T18:26:36+0200 lvl=info msg=" - seccomp listener: yes"
t=2019-08-06T18:26:36+0200 lvl=info msg=" - unprivileged file capabilities: yes"
t=2019-08-06T18:26:36+0200 lvl=info msg=" - shiftfs support: no"
t=2019-08-06T18:26:36+0200 lvl=info msg="Initializing local database"
t=2019-08-06T18:26:36+0200 lvl=info msg="Starting /dev/lxd handler:"
t=2019-08-06T18:26:36+0200 lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2019-08-06T18:26:36+0200 lvl=info msg="REST API daemon:"
t=2019-08-06T18:26:36+0200 lvl=info msg=" - binding Unix socket" socket=/var/snap/lxd/common/lxd/unix.socket
t=2019-08-06T18:26:36+0200 lvl=info msg=" - binding TCP socket" socket=IP_ADDR:8443
t=2019-08-06T18:26:36+0200 lvl=info msg="Initializing global database"
t=2019-08-06T18:26:36+0200 lvl=info msg="Initializing storage pools"
t=2019-08-06T18:26:36+0200 lvl=info msg="Initializing networks"
t=2019-08-06T18:26:42+0200 lvl=eror msg="Failed to start image leftover cleanup operation" err="failed to add operation 1bcd8387-6144-4798-9e48-16adacf08edf to database: database is locked"
t=2019-08-06T18:26:42+0200 lvl=info msg="Pruning leftover image files"
Specifically: "failed to add operation 1bcd8387-6144-4798-9e48-16adacf08edf to database: database is locked"
Attempting to restart the whole thing results in:
sudo lxc info
Error: Get http://unix.socket/1.0: EOF
Also, there are several log files that were touched seemingly simultaneously or in quick succession:
ls -lah /var/snap/lxd/common/lxd/logs/
-rw-r--r-- 1 root root 2.3K Aug 6 18:51 lxd.log
-rw-r--r-- 1 root root 3.8K Aug 6 18:51 lxd.log.1
-rw-r--r-- 1 root root 823 Aug 6 18:51 lxd.log.2.gz
-rw-r--r-- 1 root root 826 Aug 6 18:51 lxd.log.3.gz
-rw-r--r-- 1 root root 820 Aug 6 18:51 lxd.log.4.gz
-rw-r--r-- 1 root root 825 Aug 6 18:51 lxd.log.5.gz
-rw-r--r-- 1 root root 818 Aug 6 18:50 lxd.log.6.gz
-rw-r--r-- 1 root root 822 Aug 6 18:50 lxd.log.7.gz
cat /var/snap/lxd/common/lxd/logs/lxd.log
t=2019-08-06T18:48:44+0200 lvl=info msg="LXD 3.15 is starting in normal mode" path=/var/snap/lxd/common/lxd
t=2019-08-06T18:48:44+0200 lvl=info msg="Kernel uid/gid map:"
t=2019-08-06T18:48:44+0200 lvl=info msg=" - u 0 0 4294967295"
t=2019-08-06T18:48:44+0200 lvl=info msg=" - g 0 0 4294967295"
t=2019-08-06T18:48:44+0200 lvl=info msg="Configured LXD uid/gid map:"
t=2019-08-06T18:48:44+0200 lvl=info msg=" - u 0 1000000 1000000000"
t=2019-08-06T18:48:44+0200 lvl=info msg=" - g 0 1000000 1000000000"
t=2019-08-06T18:48:44+0200 lvl=info msg="Kernel features:"
t=2019-08-06T18:48:44+0200 lvl=info msg=" - netnsid-based network retrieval: yes"
t=2019-08-06T18:48:44+0200 lvl=info msg=" - uevent injection: yes"
t=2019-08-06T18:48:44+0200 lvl=info msg=" - seccomp listener: yes"
t=2019-08-06T18:48:44+0200 lvl=info msg=" - unprivileged file capabilities: yes"
t=2019-08-06T18:48:44+0200 lvl=info msg=" - shiftfs support: no"
t=2019-08-06T18:48:44+0200 lvl=info msg="Initializing local database"
t=2019-08-06T18:48:44+0200 lvl=dbug msg="Initializing database gateway"
t=2019-08-06T18:48:44+0200 lvl=dbug msg="Start database node" address=IP_ADDR:8443 id=1
t=2019-08-06T18:48:44+0200 lvl=dbug msg="Detected stale unix socket, deleting"
t=2019-08-06T18:48:44+0200 lvl=info msg="Starting /dev/lxd handler:"
t=2019-08-06T18:48:44+0200 lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2019-08-06T18:48:44+0200 lvl=info msg="REST API daemon:"
t=2019-08-06T18:48:44+0200 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/snap/lxd/common/lxd/unix.socket
t=2019-08-06T18:48:44+0200 lvl=info msg=" - binding TCP socket" socket=IP_ADDR:8443
t=2019-08-06T18:48:44+0200 lvl=info msg="Initializing global database"
t=2019-08-06T18:48:44+0200 lvl=dbug msg="Found cert" name=0
t=2019-08-06T18:48:44+0200 lvl=dbug msg="Found cert" name=0
t=2019-08-06T18:48:44+0200 lvl=dbug msg="Dqlite: connected address=IP_ADDR:8443 attempt=0"
t=2019-08-06T18:48:44+0200 lvl=info msg="Initializing storage pools"
t=2019-08-06T18:48:44+0200 lvl=dbug msg="Initializing and checking storage pool \"local\""
t=2019-08-06T18:48:44+0200 lvl=dbug msg="Checking ZFS storage pool \"local\""
t=2019-08-06T18:48:44+0200 lvl=info msg="Initializing networks"
t=2019-08-06T18:48:49+0200 lvl=dbug msg="Dqlite: closing client"
t=2019-08-06T18:48:49+0200 lvl=dbug msg="Database error: &errors.errorString{s:\"driver: bad connection\"}"
t=2019-08-06T18:48:49+0200 lvl=dbug msg="Retry failed db interaction (driver: bad connection)"
t=2019-08-06T18:48:50+0200 lvl=dbug msg="Found cert" name=0
cat /var/snap/lxd/common/lxd/logs/lxd.log.1
t=2019-08-06T18:48:51+0200 lvl=info msg="LXD 3.15 is starting in normal mode" path=/var/snap/lxd/common/lxd
t=2019-08-06T18:48:51+0200 lvl=info msg="Kernel uid/gid map:"
t=2019-08-06T18:48:51+0200 lvl=info msg=" - u 0 0 4294967295"
t=2019-08-06T18:48:51+0200 lvl=info msg=" - g 0 0 4294967295"
t=2019-08-06T18:48:51+0200 lvl=info msg="Configured LXD uid/gid map:"
t=2019-08-06T18:48:51+0200 lvl=info msg=" - u 0 1000000 1000000000"
t=2019-08-06T18:48:51+0200 lvl=info msg=" - g 0 1000000 1000000000"
t=2019-08-06T18:48:51+0200 lvl=info msg="Kernel features:"
t=2019-08-06T18:48:51+0200 lvl=info msg=" - netnsid-based network retrieval: yes"
t=2019-08-06T18:48:51+0200 lvl=info msg=" - uevent injection: yes"
t=2019-08-06T18:48:51+0200 lvl=info msg=" - seccomp listener: yes"
t=2019-08-06T18:48:51+0200 lvl=info msg=" - unprivileged file capabilities: yes"
t=2019-08-06T18:48:51+0200 lvl=info msg=" - shiftfs support: no"
t=2019-08-06T18:48:51+0200 lvl=info msg="Initializing local database"
t=2019-08-06T18:48:51+0200 lvl=dbug msg="Initializing database gateway"
t=2019-08-06T18:48:51+0200 lvl=dbug msg="Start database node" address=IP_ADDR:8443 id=1
t=2019-08-06T18:48:51+0200 lvl=dbug msg="Detected stale unix socket, deleting"
t=2019-08-06T18:48:51+0200 lvl=info msg="Starting /dev/lxd handler:"
t=2019-08-06T18:48:51+0200 lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2019-08-06T18:48:51+0200 lvl=info msg="REST API daemon:"
t=2019-08-06T18:48:51+0200 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/snap/lxd/common/lxd/unix.socket
t=2019-08-06T18:48:51+0200 lvl=info msg=" - binding TCP socket" socket=IP_ADDR:8443
t=2019-08-06T18:48:51+0200 lvl=info msg="Initializing global database"
t=2019-08-06T18:48:51+0200 lvl=dbug msg="Found cert" name=0
t=2019-08-06T18:48:51+0200 lvl=dbug msg="Found cert" name=0
t=2019-08-06T18:48:51+0200 lvl=dbug msg="Dqlite: connected address=IP_ADDR:8443 attempt=0"
t=2019-08-06T18:48:51+0200 lvl=info msg="Initializing storage pools"
t=2019-08-06T18:48:51+0200 lvl=dbug msg="Initializing and checking storage pool \"local\""
t=2019-08-06T18:48:51+0200 lvl=dbug msg="Checking ZFS storage pool \"local\""
t=2019-08-06T18:48:51+0200 lvl=info msg="Initializing networks"
t=2019-08-06T18:48:57+0200 lvl=dbug msg="Dqlite: closing client"
t=2019-08-06T18:48:57+0200 lvl=dbug msg="Database error: &errors.errorString{s:\"driver: bad connection\"}"
t=2019-08-06T18:48:57+0200 lvl=dbug msg="Retry failed db interaction (driver: bad connection)"
t=2019-08-06T18:48:57+0200 lvl=dbug msg="Found cert" name=0
t=2019-08-06T18:48:57+0200 lvl=dbug msg="Found cert" name=0
t=2019-08-06T18:48:57+0200 lvl=dbug msg="Dqlite: connected address=IP_ADDR:8443 attempt=0"
t=2019-08-06T18:48:57+0200 lvl=dbug msg="Database error: bindings.Error{Code:5, Message:\"database is locked\"}"
t=2019-08-06T18:48:57+0200 lvl=dbug msg="Retry failed db interaction (database is locked)"
t=2019-08-06T18:48:57+0200 lvl=dbug msg="Database error: bindings.Error{Code:5, Message:\"database is locked\"}"
t=2019-08-06T18:48:57+0200 lvl=dbug msg="Retry failed db interaction (database is locked)"
t=2019-08-06T18:48:57+0200 lvl=dbug msg="Database error: bindings.Error{Code:5, Message:\"database is locked\"}"
t=2019-08-06T18:48:57+0200 lvl=dbug msg="Retry failed db interaction (database is locked)"
t=2019-08-06T18:48:58+0200 lvl=dbug msg="Database error: bindings.Error{Code:5, Message:\"database is locked\"}"
t=2019-08-06T18:48:58+0200 lvl=dbug msg="Retry failed db interaction (database is locked)"
t=2019-08-06T18:48:58+0200 lvl=eror msg="Failed to start image leftover cleanup operation" err="failed to add operation 3560bf07-4a8b-43ba-8a16-566b5908e75a to database: database is locked"
t=2019-08-06T18:48:58+0200 lvl=info msg="Pruning leftover image files"
Following with
sudo journalctl -u snap.lxd.daemon.service -b 0 --follow
I can just see the LXD service restarting endlessly with the same output:
Aug 06 18:55:33 hetz-hel1 lxd.daemon[17678]: => Preparing the system
Aug 06 18:55:33 hetz-hel1 lxd.daemon[17678]: ==> Loading snap configuration
Aug 06 18:55:33 hetz-hel1 lxd.daemon[17678]: ==> Setting up mntns symlink (mnt:[4026532304])
Aug 06 18:55:33 hetz-hel1 lxd.daemon[17678]: ==> Setting up kmod wrapper
Aug 06 18:55:33 hetz-hel1 lxd.daemon[17678]: ==> Preparing /boot
Aug 06 18:55:33 hetz-hel1 lxd.daemon[17678]: ==> Preparing a clean copy of /run
Aug 06 18:55:33 hetz-hel1 lxd.daemon[17678]: ==> Preparing a clean copy of /etc
Aug 06 18:55:33 hetz-hel1 lxd.daemon[17678]: ==> Setting up ceph configuration
Aug 06 18:55:33 hetz-hel1 lxd.daemon[17678]: ==> Setting up LVM configuration
Aug 06 18:55:33 hetz-hel1 lxd.daemon[17678]: ==> Rotating logs
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: ==> Setting up ZFS (0.7)
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: ==> Escaping the systemd cgroups
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: ====> Detected cgroup V1
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: ==> Escaping the systemd process resource limits
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: ==> Enabling CRIU
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: ==> Disabling shiftfs on this kernel (auto)
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: => Re-using existing LXCFS
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: => Starting LXD
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg="LXD 3.15 is starting in normal mode" path=/var/snap/lxd/common/lxd
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg="Kernel uid/gid map:"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg=" - u 0 0 4294967295"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg=" - g 0 0 4294967295"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg="Configured LXD uid/gid map:"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg=" - u 0 1000000 1000000000"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg=" - g 0 1000000 1000000000"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg="Kernel features:"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg=" - netnsid-based network retrieval: yes"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg=" - uevent injection: yes"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg=" - seccomp listener: yes"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg=" - unprivileged file capabilities: yes"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg=" - shiftfs support: no"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg="Initializing local database"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=dbug msg="Initializing database gateway"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=dbug msg="Start database node" address=IP_ADDR:8443 id=1
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=dbug msg="Detected stale unix socket, deleting"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg="Starting /dev/lxd handler:"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg="REST API daemon:"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/snap/lxd/common/lxd/unix.socket
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg=" - binding TCP socket" socket=IP_ADDR:8443
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg="Initializing global database"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=dbug msg="Found cert" name=0
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=dbug msg="Found cert" name=0
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=dbug msg="Dqlite: connected address=IP_ADDR:8443 attempt=0"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg="Initializing storage pools"
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=dbug msg="Initializing and checking storage pool \"local\""
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=dbug msg="Checking ZFS storage pool \"local\""
Aug 06 18:55:34 hetz-hel1 lxd.daemon[17678]: t=2019-08-06T18:55:34+0200 lvl=info msg="Initializing networks"
Aug 06 18:55:41 hetz-hel1 systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Aug 06 18:55:41 hetz-hel1 systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Aug 06 18:55:41 hetz-hel1 systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
Aug 06 18:55:41 hetz-hel1 systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 73.
Aug 06 18:55:41 hetz-hel1 systemd[1]: Stopped Service for snap application lxd.daemon.
Aug 06 18:55:41 hetz-hel1 systemd[1]: Started Service for snap application lxd.daemon.
That's really odd, I'm failing to think of anything that would explain it working when started manually but not from the systemd unit...
Can you show systemctl cat snap.lxd.daemon.service?
# /etc/systemd/system/snap.lxd.daemon.service
[Unit]
# Auto-generated, DO NOT EDIT
Description=Service for snap application lxd.daemon
Requires=snap-lxd-11437.mount
Wants=network.target
After=snap-lxd-11437.mount network.target
X-Snappy=yes
[Service]
ExecStart=/usr/bin/snap run lxd.daemon
SyslogIdentifier=lxd.daemon
Restart=on-failure
WorkingDirectory=/var/snap/lxd/11437
ExecStop=/usr/bin/snap run --command=stop lxd.daemon
ExecReload=/usr/bin/snap run --command=reload lxd.daemon
TimeoutStopSec=600
Type=simple
Verified to match a working install.
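As an aside: while debugging a restart loop like the one in the journal above, one way to stop systemd from endlessly restarting the failing daemon (the generated unit uses Restart=on-failure and is marked DO NOT EDIT) would be a temporary drop-in. This is a hypothetical debugging aid, not something used in this thread:

```ini
# /etc/systemd/system/snap.lxd.daemon.service.d/override.conf
# Hypothetical debugging drop-in (remove when done): disable the
# automatic restart so the failure can be inspected in place.
[Service]
Restart=no
```

followed by systemctl daemon-reload. Deleting the drop-in and reloading restores the original behaviour.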
@rigred did things fail again ever since they decided to start when I was playing with the daemon?
It's been working perfectly ever since, though I can sometimes make it fail by running daemon reloads in quick succession.
I suspect something timing-related is happening.
@freeekanayaka does a "database is locked" type of error when restarting the daemon very quickly make sense to you?
I do remember seeing this behavior locally a couple of times when quickly restarting the daemon for some tests, and systemd's restart policy certainly wouldn't help us there, but I've failed to reproduce it locally now.
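For anyone trying to reproduce this, the quick-restart scenario can be sketched as a small generic helper; the systemctl invocation in the comment at the end is the real command from this thread, the helper itself is illustrative:

```shell
# Illustrative helper: run a command N times back to back, to mimic
# restarting the daemon in quick succession.
rapid_repeat() {
    n="$1"
    shift
    for _ in $(seq "$n"); do
        "$@" || return 1
    done
}

# On the affected host one would use something like:
#   rapid_repeat 5 sudo systemctl restart snap.lxd.daemon.service
```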
@stgraber, no, restarting the daemon quickly shouldn't have any particular effect in that regard. It feels like some kind of bad cascading error:
t=2019-08-06T18:48:51+0200 lvl=dbug msg="Dqlite: connected address=IP_ADDR:8443 attempt=0"
t=2019-08-06T18:48:51+0200 lvl=info msg="Initializing storage pools"
t=2019-08-06T18:48:51+0200 lvl=dbug msg="Initializing and checking storage pool \"local\""
t=2019-08-06T18:48:51+0200 lvl=dbug msg="Checking ZFS storage pool \"local\""
t=2019-08-06T18:48:51+0200 lvl=info msg="Initializing networks"
t=2019-08-06T18:48:57+0200 lvl=dbug msg="Dqlite: closing client"
t=2019-08-06T18:48:57+0200 lvl=dbug msg="Database error: &errors.errorString{s:\"driver: bad connection\"}"
t=2019-08-06T18:48:57+0200 lvl=dbug msg="Retry failed db interaction (driver: bad connection)"
It feels like leadership is not stable for some reason, which causes some cascading error that we might not handle too well.
@rigred how are things going for you lately? Did this issue ever reappear?
Missed your comment in the clutter of my other notifications. The issue hasn't reoccurred since, which is nice yet also concerning, since I have no idea when/how/why it might reoccur.
Ok, for now we'll assume that this was one of the many issues that we've sorted out with changes to dqlite around the dqlite 1.0 release. If it does happen again for you or anyone else, please comment here and we'll re-open.
It has reoccurred on the same machine in the same or a similar manner. Doing a reload fails every time. It did restart on its own once, but since then absolutely nothing can bring it back up.
Also, what's the explanation for "use of closed network connection"? It doesn't make sense to me, as the connection it refers to is available.
Btw this system isn't a prod server, just my own box I use for my own DIY projects.
Oct 01 21:27:47 hetz-hel1 lxd.daemon[18671]: => Preparing the system
Oct 01 21:27:47 hetz-hel1 lxd.daemon[18671]: ==> Loading snap configuration
Oct 01 21:27:47 hetz-hel1 lxd.daemon[18671]: ==> Setting up mntns symlink (mnt:[4026532357])
Oct 01 21:27:55 hetz-hel1 lxd.daemon[18671]: ==> Setting up kmod wrapper
Oct 01 21:27:55 hetz-hel1 lxd.daemon[18671]: ==> Preparing /boot
Oct 01 21:28:04 hetz-hel1 lxd.daemon[18671]: ==> Preparing a clean copy of /run
Oct 01 21:28:13 hetz-hel1 lxd.daemon[18671]: ==> Preparing a clean copy of /etc
Oct 01 21:28:58 hetz-hel1 lxd.daemon[18671]: ==> Setting up ceph configuration
Oct 01 21:28:58 hetz-hel1 lxd.daemon[18671]: ==> Setting up LVM configuration
Oct 01 21:28:58 hetz-hel1 lxd.daemon[18671]: ==> Rotating logs
Oct 01 21:28:58 hetz-hel1 lxd.daemon[18671]: ==> Setting up ZFS (0.7)
Oct 01 21:28:58 hetz-hel1 lxd.daemon[18671]: ==> Escaping the systemd cgroups
Oct 01 21:28:58 hetz-hel1 lxd.daemon[18671]: ====> Detected cgroup V1
Oct 01 21:28:58 hetz-hel1 lxd.daemon[18671]: ==> Escaping the systemd process resource limits
Oct 01 21:28:58 hetz-hel1 lxd.daemon[18671]: ==> Enabling CRIU
Oct 01 21:28:58 hetz-hel1 lxd.daemon[18671]: ==> Disabling shiftfs on this kernel (auto)
Oct 01 21:28:58 hetz-hel1 lxd.daemon[18671]: => Re-using existing LXCFS
Oct 01 21:28:58 hetz-hel1 lxd.daemon[18671]: => Starting LXD
Oct 01 21:28:58 hetz-hel1 lxd.daemon[18671]: t=2019-10-01T21:28:58+0200 lvl=warn msg="Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored."
Oct 01 21:29:12 hetz-hel1 lxd.daemon[18671]: t=2019-10-01T21:29:12+0200 lvl=warn msg="Dqlite client proxy TLS -> Unix: read tcp PUBLIC_IP:48290->PUBLIC_IP:8443: use of closed network connection"
Oct 01 21:29:12 hetz-hel1 lxd.daemon[18671]: t=2019-10-01T21:29:12+0200 lvl=warn msg="Dqlite server proxy Unix -> TLS: read unix @->@17a17: use of closed network connection"
Oct 01 21:29:17 hetz-hel1 lxd.daemon[18671]: t=2019-10-01T21:29:17+0200 lvl=warn msg="Dqlite client proxy TLS -> Unix: read tcp PUBLIC_IP:48294->PUBLIC_IP:8443: use of closed network connection"
Oct 01 21:29:17 hetz-hel1 lxd.daemon[18671]: t=2019-10-01T21:29:17+0200 lvl=warn msg="Dqlite server proxy Unix -> TLS: read unix @->@17a17: use of closed network connection"
Oct 01 21:29:22 hetz-hel1 lxd.daemon[18671]: t=2019-10-01T21:29:22+0200 lvl=eror msg="Failed to start the daemon: failed to open cluster database: Failed to set page cache size: driver: bad connection"
Oct 01 21:29:22 hetz-hel1 lxd.daemon[18671]: t=2019-10-01T21:29:22+0200 lvl=warn msg="Dqlite client proxy TLS -> Unix: read tcp PUBLIC_IP:48298->PUBLIC_IP:8443: use of closed network connection"
Oct 01 21:29:22 hetz-hel1 lxd.daemon[18671]: t=2019-10-01T21:29:22+0200 lvl=warn msg="Dqlite server proxy Unix -> TLS: read unix @->@17a17: use of closed network connection"
Oct 01 21:29:22 hetz-hel1 lxd.daemon[18671]: Error: failed to open cluster database: Failed to set page cache size: driver: bad connection
Oct 01 21:29:23 hetz-hel1 lxd.daemon[18671]: => LXD failed to start
root@hetz-hel1:~# lxd --group lxd --debug
INFO[10-01|21:34:54] LXD 3.17 is starting in normal mode path=/var/snap/lxd/common/lxd
INFO[10-01|21:34:54] Kernel uid/gid map:
INFO[10-01|21:34:54] - u 0 0 4294967295
INFO[10-01|21:34:54] - g 0 0 4294967295
INFO[10-01|21:34:54] Configured LXD uid/gid map:
INFO[10-01|21:34:54] - u 0 1000000 1000000000
INFO[10-01|21:34:54] - g 0 1000000 1000000000
WARN[10-01|21:34:54] Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored.
INFO[10-01|21:34:54] Kernel features:
INFO[10-01|21:34:54] - netnsid-based network retrieval: yes
INFO[10-01|21:34:54] - uevent injection: yes
INFO[10-01|21:34:54] - seccomp listener: yes
INFO[10-01|21:34:54] - unprivileged file capabilities: yes
INFO[10-01|21:34:54] - shiftfs support: yes
INFO[10-01|21:34:54] Initializing local database
DBUG[10-01|21:34:54] Initializing database gateway
DBUG[10-01|21:34:54] Start database node id=1 address=PUBLIC_IP:8443
INFO[10-01|21:35:03] Starting /dev/lxd handler:
INFO[10-01|21:35:03] - binding devlxd socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[10-01|21:35:03] REST API daemon:
INFO[10-01|21:35:03] - binding Unix socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[10-01|21:35:03] - binding TCP socket socket=PUBLIC_IP:8443
INFO[10-01|21:35:03] Initializing global database
DBUG[10-01|21:35:03] Found cert name=0
DBUG[10-01|21:35:03] Found cert name=0
DBUG[10-01|21:35:03] Dqlite: connected address=PUBLIC_IP:8443 attempt=0
WARN[10-01|21:35:08] Dqlite client proxy TLS -> Unix: read tcp PUBLIC_IP:48314->PUBLIC_IP:8443: use of closed network connection
WARN[10-01|21:35:08] Dqlite server proxy Unix -> TLS: read unix @->@17a1f: use of closed network connection
DBUG[10-01|21:35:08] Found cert name=0
DBUG[10-01|21:35:08] Found cert name=0
DBUG[10-01|21:35:08] Dqlite: connected address=PUBLIC_IP:8443 attempt=0
WARN[10-01|21:35:13] Dqlite client proxy TLS -> Unix: read tcp PUBLIC_IP:48318->PUBLIC_IP:8443: use of closed network connection
WARN[10-01|21:35:13] Dqlite server proxy Unix -> TLS: read unix @->@17a1f: use of closed network connection
DBUG[10-01|21:35:13] Found cert name=0
DBUG[10-01|21:35:13] Found cert name=0
DBUG[10-01|21:35:13] Dqlite: connected address=PUBLIC_IP:8443 attempt=0
EROR[10-01|21:35:18] Failed to start the daemon: failed to open cluster database: Failed to set page cache size: driver: bad connection
INFO[10-01|21:35:18] Starting shutdown sequence
INFO[10-01|21:35:18] Stopping REST API handler:
INFO[10-01|21:35:18] - closing socket socket=PUBLIC_IP:8443
WARN[10-01|21:35:18] Dqlite client proxy TLS -> Unix: read tcp PUBLIC_IP:48322->PUBLIC_IP:8443: use of closed network connection
INFO[10-01|21:35:18] - closing socket socket=/var/snap/lxd/common/lxd/unix.socket
WARN[10-01|21:35:18] Dqlite server proxy Unix -> TLS: read unix @->@17a1f: use of closed network connection
INFO[10-01|21:35:18] Stopping /dev/lxd handler:
INFO[10-01|21:35:18] - closing socket socket=/var/snap/lxd/common/lxd/devlxd/sock
DBUG[10-01|21:35:18] Stop database gateway
DBUG[10-01|21:35:18] Not unmounting temporary filesystems (containers are still running)
Error: failed to open cluster database: Failed to set page cache size: driver: bad connection
@freeekanayaka ideas?
I would normally suspect an IP/PORT conflict or firewalling preventing the connection, but given that this is the local system, I'm not sure how that could be.
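A minimal sketch of that kind of check, assuming bash is available and that 8443 is the port in question (the helper name is made up for illustration):

```shell
# Illustrative check: does anything accept TCP connections on host:port?
# Uses bash's /dev/tcp pseudo-device, so it needs bash, not plain sh.
port_in_use() {
    host="$1"
    port="$2"
    # The redirection succeeds only if something is listening there.
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

if port_in_use 127.0.0.1 8443; then
    echo "something is listening on 8443"
else
    echo "port 8443 looks free"
fi
```

On the affected host, tools like ss or lsof would show which process holds the port, which a pure connect test cannot.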
Issue description
The LXD instance won't start from the systemd service or from the snap, and gives the log info below. Only starting manually with
lxd --group lxd
works.
Steps to reproduce
snap restart/start lxd
Notes
lxd --group lxd
works as normal.
/var/snap/lxd/common/lxd/database