Closed truelai closed 5 years ago
Just now noticed that RAID firmware was out of date. Tainted kernel gone from DMESG.
Does that self-resolve or do you need to reload LXD for it to go away?
Fixed the kernel taint. Still having the same issues. A reload remedies it only temporarily.
Just failed to start container as well.
lai@R610-LXD2-LAN:~$ lxc start ldap
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart ldap /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/ldap/lxc.conf:
Try lxc info --show-log ldap
for more info
lai@R610-LXD2-LAN:~$ lxc info --show-log ldap
Name: ldap
Location: none
Remote: unix://
Architecture: x86_64
Created: 2018/03/07 20:29 UTC
Status: Stopped
Type: persistent
Profiles: default
Snapshots:
ldap-Mar-8-2018 (taken at 2018/03/08 18:24 UTC) (stateless)
ldap-20180524 (taken at 2018/05/24 20:10 UTC) (stateless)
ldap-after-migration-20180524 (taken at 2018/05/24 20:46 UTC) (stateless)
ldap-20180529 (taken at 2018/05/29 15:14 UTC) (stateless)
ldap-20180620 (taken at 2018/06/20 16:43 UTC) (stateless)
ldap-20180622 (taken at 2018/06/22 16:06 UTC) (stateless)
ldap-deleted-extra-domains-20180622 (taken at 2018/06/22 19:31 UTC) (stateless)
ldap-20180629 (taken at 2018/06/29 18:51 UTC) (stateless)
ldap-20180703 (taken at 2018/07/03 19:37 UTC) (stateless)
ldap-20180703-2 (taken at 2018/07/03 19:58 UTC) (stateless)
ldap-FIXED-20180705 (taken at 2018/07/05 14:52 UTC) (stateless)
ldap20180730 (taken at 2018/07/30 14:14 UTC) (stateless)
ldap-2018-08-01 (taken at 2018/08/01 14:20 UTC) (stateless)
ldap-20180801 (taken at 2018/08/01 14:20 UTC) (stateless)
ldap-20180801-2 (taken at 2018/08/01 15:14 UTC) (stateless)
ldap-20180803 (taken at 2018/08/03 13:27 UTC) (stateless)
ldap-20180803-configupdated (taken at 2018/08/03 14:35 UTC) (stateless)
ldap-201810-04 (taken at 2018/10/04 20:49 UTC) (stateless)
ldap-20181004 (taken at 2018/10/04 21:03 UTC) (stateless)
20190325 (taken at 2019/03/25 22:55 UTC) (stateless)
Log:
lxc ldap 20190822231327.621 ERROR conf - conf.c:run_buffer:352 - Script exited with status 1 lxc ldap 20190822231327.621 ERROR start - start.c:lxc_init:887 - Failed to run lxc.hook.pre-start for container "ldap" lxc ldap 20190822231327.621 ERROR start - start.c:__lxc_start:1991 - Failed to initialize container "ldap" lxc ldap 20190822231340.216 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:864 - No such file or directory - Failed to receive the container state lxc 20190822231340.216 WARN commands - commands.c:lxc_cmd_rsp_recv:135 - Connection reset by peer - Failed to receive response for command "get_state"
That part is unrelated to the database issue, though could be related to storage issues.
What's in your lxc.log
?
lxc 20190822224359.160 ERROR commands_utils - commands_utils.c:lxc_cmd_sock_rcv_state:72 - Resource temporarily unavailable - Failed to receive message
Since stopping one of our HomeAssistant containers, I've not having the same issues. Here are its logs. Had it running with SQLite
lxc 20190822225845.947 ERROR commands_utils - commands_utils.c:lxc_cmd_sock_rcv_state:72 - Resource temporarily unavailable - Failed to receive message lxc ha7200bc 20190822230049.397 ERROR conf - conf.c:run_buffer:352 - Script exited with status 1 lxc ha7200bc 20190822230049.397 ERROR start - start.c:lxc_fini:1024 - Failed to run "lxc.hook.stop" hook lxc ha7200bc 20190822230050.102 ERROR conf - conf.c:run_buffer:352 - Script exited with status 1 lxc ha7200bc 20190822230050.102 ERROR start - start.c:lxc_fini:1064 - Failed to run lxc.hook.post-stop for container "ha7200bc"
Sorry, I meant lxd.log
which you can find at /var/snap/lxd/common/lxd/logs/lxd.log
t=2019-08-22T19:01:22-0400 lvl=info msg="LXD 3.16 is starting in normal mode" path=/var/snap/lxd/common/lxd t=2019-08-22T19:01:22-0400 lvl=info msg="Kernel uid/gid map:" t=2019-08-22T19:01:22-0400 lvl=info msg=" - u 0 0 4294967295" t=2019-08-22T19:01:22-0400 lvl=info msg=" - g 0 0 4294967295" t=2019-08-22T19:01:22-0400 lvl=info msg="Configured LXD uid/gid map:" t=2019-08-22T19:01:22-0400 lvl=info msg=" - u 0 1000000 1000000000" t=2019-08-22T19:01:22-0400 lvl=info msg=" - g 0 1000000 1000000000" t=2019-08-22T19:01:22-0400 lvl=info msg="Kernel features:" t=2019-08-22T19:01:22-0400 lvl=info msg=" - netnsid-based network retrieval: no" t=2019-08-22T19:01:22-0400 lvl=info msg=" - uevent injection: no" t=2019-08-22T19:01:22-0400 lvl=info msg=" - seccomp listener: no" t=2019-08-22T19:01:22-0400 lvl=info msg=" - unprivileged file capabilities: yes" t=2019-08-22T19:01:22-0400 lvl=info msg=" - shiftfs support: no" t=2019-08-22T19:01:22-0400 lvl=info msg="Initializing local database" t=2019-08-22T19:01:22-0400 lvl=info msg="Starting /dev/lxd handler:" t=2019-08-22T19:01:22-0400 lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock t=2019-08-22T19:01:22-0400 lvl=info msg="REST API daemon:" t=2019-08-22T19:01:22-0400 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/snap/lxd/common/lxd/unix.socket t=2019-08-22T19:01:22-0400 lvl=info msg=" - binding TCP socket" socket=[::]:8443 t=2019-08-22T19:01:22-0400 lvl=info msg="Initializing global database" t=2019-08-22T19:01:22-0400 lvl=info msg="Initializing storage pools" t=2019-08-22T19:01:22-0400 lvl=info msg="Initializing networks" t=2019-08-22T19:01:22-0400 lvl=info msg="Pruning leftover image files" t=2019-08-22T19:01:22-0400 lvl=info msg="Done pruning leftover image files" t=2019-08-22T19:01:22-0400 lvl=info msg="Loading daemon configuration" t=2019-08-22T19:01:22-0400 lvl=info msg="Pruning expired images" t=2019-08-22T19:01:22-0400 lvl=info msg="Done pruning expired images" t=2019-08-22T19:01:22-0400 lvl=info msg="Pruning expired container backups" t=2019-08-22T19:01:22-0400 lvl=info msg="Done pruning expired container backups" t=2019-08-22T19:01:22-0400 lvl=info msg="Updating instance types" t=2019-08-22T19:01:22-0400 lvl=info msg="Expiring log files" t=2019-08-22T19:01:22-0400 lvl=info msg="Done updating instance types" t=2019-08-22T19:01:22-0400 lvl=info msg="Done expiring log files" t=2019-08-22T19:01:22-0400 lvl=info msg="Updating images" t=2019-08-22T19:01:22-0400 lvl=info msg="Done updating images" t=2019-08-22T19:01:22-0400 lvl=eror msg="Error getting source image" err="No such object" fp=47f7d499e98bacbe8a9dae35a8ffa28c9f336624082bdc9e95e322e22d3a16c1 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2019-06-12T14:20:47-0400 ephemeral=false name=node-red project=default stateful=false used=2019-08-22T18:56:38-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2019-03-26T09:53:12-0400 ephemeral=false name=mqtt-postgres project=default stateful=false used=2019-08-22T18:49:06-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2017-11-29T17:51:18-0500 ephemeral=false name=cups project=default stateful=false used=2019-08-22T18:48:58-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2019-03-26T09:55:56-0400 ephemeral=false name=mosquitto-server project=default stateful=false used=2019-08-22T18:49:05-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2018-09-26T15:09:57-0400 ephemeral=false name=haproxy-AD project=default stateful=false used=2019-08-22T17:50:13-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2019-05-15T13:17:59-0400 ephemeral=false name=gitlab project=default stateful=false used=2019-08-22T18:49:00-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2018-09-27T15:33:26-0400 ephemeral=false name=ha7200bc project=default stateful=false used=2019-08-22T18:49:02-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2019-05-10T13:41:20-0400 ephemeral=false name=unifi project=default stateful=false used=2019-08-22T18:49:17-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2019-05-29T13:14:47-0400 ephemeral=false name=bind9 project=default stateful=false used=2019-08-22T18:48:57-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Started container" action=start created=2019-06-12T14:20:47-0400 ephemeral=false name=node-red project=default stateful=false used=2019-08-22T18:56:38-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Started container" action=start created=2017-11-29T17:51:18-0500 ephemeral=false name=cups project=default stateful=false used=2019-08-22T18:48:58-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Started container" action=start created=2019-03-26T09:53:12-0400 ephemeral=false name=mqtt-postgres project=default stateful=false used=2019-08-22T18:49:06-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Started container" action=start created=2019-05-10T13:41:20-0400 ephemeral=false name=unifi project=default stateful=false used=2019-08-22T18:49:17-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Started container" action=start created=2019-03-26T09:55:56-0400 ephemeral=false name=mosquitto-server project=default stateful=false used=2019-08-22T18:49:05-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Started container" action=start created=2019-05-15T13:17:59-0400 ephemeral=false name=gitlab project=default stateful=false used=2019-08-22T18:49:00-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Started container" action=start created=2019-05-29T13:14:47-0400 ephemeral=false name=bind9 project=default stateful=false used=2019-08-22T18:48:57-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Started container" action=start created=2018-09-27T15:33:26-0400 ephemeral=false name=ha7200bc project=default stateful=false used=2019-08-22T18:49:02-0400 t=2019-08-22T19:03:00-0400 lvl=info msg="Started container" action=start created=2018-09-26T15:09:57-0400 ephemeral=false name=haproxy-AD project=default stateful=false used=2019-08-22T17:50:13-0400 t=2019-08-22T19:04:36-0400 lvl=info msg="Starting container" action=start created=2019-06-12T14:15:10-0400 ephemeral=false name=thingsboard project=default stateful=false used=2019-08-22T18:49:12-0400 t=2019-08-22T19:04:37-0400 lvl=info msg="Started container" action=start created=2019-06-12T14:15:10-0400 ephemeral=false name=thingsboard project=default stateful=false used=2019-08-22T18:49:12-0400 t=2019-08-22T19:05:43-0400 lvl=warn msg="Failed to delete operation 018b06a8-e287-4b74-aae8-5fc33d910f53: database is locked" t=2019-08-22T19:12:57-0400 lvl=info msg="Starting container" action=start created=2018-03-07T15:29:20-0500 ephemeral=false name=ldap project=default stateful=false used=2019-08-22T18:49:03-0400 t=2019-08-22T19:13:40-0400 lvl=eror msg="Failed starting container" action=start created=2018-03-07T15:29:20-0500 ephemeral=false name=ldap project=default stateful=false used=2019-08-22T18:49:03-0400 t=2019-08-22T19:15:55-0400 lvl=info msg="Received 'terminated signal', exiting" t=2019-08-22T19:15:55-0400 lvl=info msg="Starting shutdown sequence" t=2019-08-22T19:15:55-0400 lvl=info msg="Stopping REST API handler:" t=2019-08-22T19:15:55-0400 lvl=info msg=" - closing socket" socket=[::]:8443 t=2019-08-22T19:15:55-0400 lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/unix.socket t=2019-08-22T19:15:55-0400 lvl=info msg="Stopping /dev/lxd handler:" t=2019-08-22T19:15:55-0400 lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/devlxd/sock t=2019-08-22T19:15:55-0400 lvl=info msg="Closing the database"
I removed the source image it was complaining about.
Also, I was able to prevent the locks by turning off one container running HomeAssistant. Any idea why that would be?
I'm not seeing anything odd in there.
You could try doing lxc config set NAME raw.lxc lxc.log.level=trace
on the container and then attempt starting it again which will produce much more output in lxc info NAME --show-log
.
One potential issue is you running out of resources of some kind on your system, be that memory, processes, files, ... You may want to look at dmesg
, make sure you have enough memory (free -m
) and aren't running out of disk space (df -h
) or inodes (df -i
).
You could also try switching to the HWE kernel which would get you onto the same 4.15 kernel as Ubuntu 18.04, especially if anything odd comes up from looking at dmesg.
@truelai how are things going?
@stgraber - I upgraded to 18.04 and looks like the kernel had some bug fixes for btrfs. Since then, the issue has not repeated so it may have been related. I'd call this closed for now. Thanks for checking in!
Also, if you're in Montreal, please shoot me an email. I owe you food and drink.
Required information
The output of "lxc info" or if that fails: config: core.https_address: '[::]:8443' core.trust_password: true api_extensions:
-----END CERTIFICATE----- certificate_fingerprint: driver: lxc driver_version: 3.2.1 kernel: Linux kernel_architecture: x86_64 kernel_features: netnsid_getifaddrs: "false" seccomp_listener: "false" shiftfs: "false" uevent_injection: "false" unpriv_fscaps: "true" kernel_version: 4.4.0-159-generic lxc_features: mount_injection_file: "true" network_gateway_device_route: "true" network_ipvlan: "true" network_l2proxy: "true" network_phys_macvlan_mtu: "true" seccomp_notify: "true" project: default server: lxd server_clustered: false server_name: R610-LXD2-LAN server_pid: 24789 server_version: "3.16" storage: btrfs storage_version: "4.4"
Issue description
This host is having issues I'm not seeing on my other hosts yet. I get the following error: Error: failed to add operation 4216afdb-e988-4175-bbaf-93bc4aaace1a to database: database is locked
I'm often experiencing lots of lag on the host as well. \
Looking at DMESG, I'm seeing a tainted kernel with G I. Is this BTRFS related?
syslog-error-fail.txt
dmesg.txt