canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0
4.38k stars 931 forks source link

database is locked repeatedly #6115

Closed truelai closed 5 years ago

truelai commented 5 years ago

Required information

Issue description

This host is having issues I'm not seeing on my other hosts yet. I get the following error: Error: failed to add operation 4216afdb-e988-4175-bbaf-93bc4aaace1a to database: database is locked

I'm often experiencing lots of lag on the host as well. \

Looking at DMESG, I'm seeing a tainted kernel with G I. Is this BTRFS related?

syslog-error-fail.txt

dmesg.txt

truelai commented 5 years ago

Just now noticed that RAID firmware was out of date. Tainted kernel gone from DMESG.

stgraber commented 5 years ago

Does that self-resolve or do you need to reload LXD for it to go away?

truelai commented 5 years ago

Fixed the kernel taint. Still having the same issues. A reload remedies it only temporarily.

truelai commented 5 years ago

Just failed to start container as well.

lai@R610-LXD2-LAN:~$ lxc start ldap Error: Failed to run: /snap/lxd/current/bin/lxd forkstart ldap /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/ldap/lxc.conf: Try lxc info --show-log ldap for more info lai@R610-LXD2-LAN:~$ lxc info --show-log ldap Name: ldap Location: none Remote: unix:// Architecture: x86_64 Created: 2018/03/07 20:29 UTC Status: Stopped Type: persistent Profiles: default Snapshots: ldap-Mar-8-2018 (taken at 2018/03/08 18:24 UTC) (stateless) ldap-20180524 (taken at 2018/05/24 20:10 UTC) (stateless) ldap-after-migration-20180524 (taken at 2018/05/24 20:46 UTC) (stateless) ldap-20180529 (taken at 2018/05/29 15:14 UTC) (stateless) ldap-20180620 (taken at 2018/06/20 16:43 UTC) (stateless) ldap-20180622 (taken at 2018/06/22 16:06 UTC) (stateless) ldap-deleted-extra-domains-20180622 (taken at 2018/06/22 19:31 UTC) (stateless) ldap-20180629 (taken at 2018/06/29 18:51 UTC) (stateless) ldap-20180703 (taken at 2018/07/03 19:37 UTC) (stateless) ldap-20180703-2 (taken at 2018/07/03 19:58 UTC) (stateless) ldap-FIXED-20180705 (taken at 2018/07/05 14:52 UTC) (stateless) ldap20180730 (taken at 2018/07/30 14:14 UTC) (stateless) ldap-2018-08-01 (taken at 2018/08/01 14:20 UTC) (stateless) ldap-20180801 (taken at 2018/08/01 14:20 UTC) (stateless) ldap-20180801-2 (taken at 2018/08/01 15:14 UTC) (stateless) ldap-20180803 (taken at 2018/08/03 13:27 UTC) (stateless) ldap-20180803-configupdated (taken at 2018/08/03 14:35 UTC) (stateless) ldap-201810-04 (taken at 2018/10/04 20:49 UTC) (stateless) ldap-20181004 (taken at 2018/10/04 21:03 UTC) (stateless) 20190325 (taken at 2019/03/25 22:55 UTC) (stateless)

truelai commented 5 years ago

Log:

lxc ldap 20190822231327.621 ERROR conf - conf.c:run_buffer:352 - Script exited with status 1 lxc ldap 20190822231327.621 ERROR start - start.c:lxc_init:887 - Failed to run lxc.hook.pre-start for container "ldap" lxc ldap 20190822231327.621 ERROR start - start.c:__lxc_start:1991 - Failed to initialize container "ldap" lxc ldap 20190822231340.216 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:864 - No such file or directory - Failed to receive the container state lxc 20190822231340.216 WARN commands - commands.c:lxc_cmd_rsp_recv:135 - Connection reset by peer - Failed to receive response for command "get_state"

stgraber commented 5 years ago

That part is unrelated to the database issue, though could be related to storage issues. What's in your lxc.log?

truelai commented 5 years ago

lxc 20190822224359.160 ERROR commands_utils - commands_utils.c:lxc_cmd_sock_rcv_state:72 - Resource temporarily unavailable - Failed to receive message

truelai commented 5 years ago

Since stopping one of our HomeAssistant containers, I've not having the same issues. Here are its logs. Had it running with SQLite

lxc 20190822225845.947 ERROR commands_utils - commands_utils.c:lxc_cmd_sock_rcv_state:72 - Resource temporarily unavailable - Failed to receive message lxc ha7200bc 20190822230049.397 ERROR conf - conf.c:run_buffer:352 - Script exited with status 1 lxc ha7200bc 20190822230049.397 ERROR start - start.c:lxc_fini:1024 - Failed to run "lxc.hook.stop" hook lxc ha7200bc 20190822230050.102 ERROR conf - conf.c:run_buffer:352 - Script exited with status 1 lxc ha7200bc 20190822230050.102 ERROR start - start.c:lxc_fini:1064 - Failed to run lxc.hook.post-stop for container "ha7200bc"

stgraber commented 5 years ago

Sorry, I meant lxd.log which you can find at /var/snap/lxd/common/lxd/logs/lxd.log

truelai commented 5 years ago

t=2019-08-22T19:01:22-0400 lvl=info msg="LXD 3.16 is starting in normal mode" path=/var/snap/lxd/common/lxd t=2019-08-22T19:01:22-0400 lvl=info msg="Kernel uid/gid map:" t=2019-08-22T19:01:22-0400 lvl=info msg=" - u 0 0 4294967295" t=2019-08-22T19:01:22-0400 lvl=info msg=" - g 0 0 4294967295" t=2019-08-22T19:01:22-0400 lvl=info msg="Configured LXD uid/gid map:" t=2019-08-22T19:01:22-0400 lvl=info msg=" - u 0 1000000 1000000000" t=2019-08-22T19:01:22-0400 lvl=info msg=" - g 0 1000000 1000000000" t=2019-08-22T19:01:22-0400 lvl=info msg="Kernel features:" t=2019-08-22T19:01:22-0400 lvl=info msg=" - netnsid-based network retrieval: no" t=2019-08-22T19:01:22-0400 lvl=info msg=" - uevent injection: no" t=2019-08-22T19:01:22-0400 lvl=info msg=" - seccomp listener: no" t=2019-08-22T19:01:22-0400 lvl=info msg=" - unprivileged file capabilities: yes" t=2019-08-22T19:01:22-0400 lvl=info msg=" - shiftfs support: no" t=2019-08-22T19:01:22-0400 lvl=info msg="Initializing local database" t=2019-08-22T19:01:22-0400 lvl=info msg="Starting /dev/lxd handler:" t=2019-08-22T19:01:22-0400 lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock t=2019-08-22T19:01:22-0400 lvl=info msg="REST API daemon:" t=2019-08-22T19:01:22-0400 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/snap/lxd/common/lxd/unix.socket t=2019-08-22T19:01:22-0400 lvl=info msg=" - binding TCP socket" socket=[::]:8443 t=2019-08-22T19:01:22-0400 lvl=info msg="Initializing global database" t=2019-08-22T19:01:22-0400 lvl=info msg="Initializing storage pools" t=2019-08-22T19:01:22-0400 lvl=info msg="Initializing networks" t=2019-08-22T19:01:22-0400 lvl=info msg="Pruning leftover image files" t=2019-08-22T19:01:22-0400 lvl=info msg="Done pruning leftover image files" t=2019-08-22T19:01:22-0400 lvl=info msg="Loading daemon configuration" t=2019-08-22T19:01:22-0400 lvl=info msg="Pruning expired images" t=2019-08-22T19:01:22-0400 lvl=info msg="Done pruning expired images" t=2019-08-22T19:01:22-0400 lvl=info msg="Pruning expired container backups" t=2019-08-22T19:01:22-0400 lvl=info msg="Done pruning expired container backups" t=2019-08-22T19:01:22-0400 lvl=info msg="Updating instance types" t=2019-08-22T19:01:22-0400 lvl=info msg="Expiring log files" t=2019-08-22T19:01:22-0400 lvl=info msg="Done updating instance types" t=2019-08-22T19:01:22-0400 lvl=info msg="Done expiring log files" t=2019-08-22T19:01:22-0400 lvl=info msg="Updating images" t=2019-08-22T19:01:22-0400 lvl=info msg="Done updating images" t=2019-08-22T19:01:22-0400 lvl=eror msg="Error getting source image" err="No such object" fp=47f7d499e98bacbe8a9dae35a8ffa28c9f336624082bdc9e95e322e22d3a16c1 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2019-06-12T14:20:47-0400 ephemeral=false name=node-red project=default stateful=false used=2019-08-22T18:56:38-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2019-03-26T09:53:12-0400 ephemeral=false name=mqtt-postgres project=default stateful=false used=2019-08-22T18:49:06-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2017-11-29T17:51:18-0500 ephemeral=false name=cups project=default stateful=false used=2019-08-22T18:48:58-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2019-03-26T09:55:56-0400 ephemeral=false name=mosquitto-server project=default stateful=false used=2019-08-22T18:49:05-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2018-09-26T15:09:57-0400 ephemeral=false name=haproxy-AD project=default stateful=false used=2019-08-22T17:50:13-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2019-05-15T13:17:59-0400 ephemeral=false name=gitlab project=default stateful=false used=2019-08-22T18:49:00-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2018-09-27T15:33:26-0400 ephemeral=false name=ha7200bc project=default stateful=false used=2019-08-22T18:49:02-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2019-05-10T13:41:20-0400 ephemeral=false name=unifi project=default stateful=false used=2019-08-22T18:49:17-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Starting container" action=start created=2019-05-29T13:14:47-0400 ephemeral=false name=bind9 project=default stateful=false used=2019-08-22T18:48:57-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Started container" action=start created=2019-06-12T14:20:47-0400 ephemeral=false name=node-red project=default stateful=false used=2019-08-22T18:56:38-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Started container" action=start created=2017-11-29T17:51:18-0500 ephemeral=false name=cups project=default stateful=false used=2019-08-22T18:48:58-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Started container" action=start created=2019-03-26T09:53:12-0400 ephemeral=false name=mqtt-postgres project=default stateful=false used=2019-08-22T18:49:06-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Started container" action=start created=2019-05-10T13:41:20-0400 ephemeral=false name=unifi project=default stateful=false used=2019-08-22T18:49:17-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Started container" action=start created=2019-03-26T09:55:56-0400 ephemeral=false name=mosquitto-server project=default stateful=false used=2019-08-22T18:49:05-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Started container" action=start created=2019-05-15T13:17:59-0400 ephemeral=false name=gitlab project=default stateful=false used=2019-08-22T18:49:00-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Started container" action=start created=2019-05-29T13:14:47-0400 ephemeral=false name=bind9 project=default stateful=false used=2019-08-22T18:48:57-0400 t=2019-08-22T19:02:59-0400 lvl=info msg="Started container" action=start created=2018-09-27T15:33:26-0400 ephemeral=false name=ha7200bc project=default stateful=false used=2019-08-22T18:49:02-0400 t=2019-08-22T19:03:00-0400 lvl=info msg="Started container" action=start created=2018-09-26T15:09:57-0400 ephemeral=false name=haproxy-AD project=default stateful=false used=2019-08-22T17:50:13-0400 t=2019-08-22T19:04:36-0400 lvl=info msg="Starting container" action=start created=2019-06-12T14:15:10-0400 ephemeral=false name=thingsboard project=default stateful=false used=2019-08-22T18:49:12-0400 t=2019-08-22T19:04:37-0400 lvl=info msg="Started container" action=start created=2019-06-12T14:15:10-0400 ephemeral=false name=thingsboard project=default stateful=false used=2019-08-22T18:49:12-0400 t=2019-08-22T19:05:43-0400 lvl=warn msg="Failed to delete operation 018b06a8-e287-4b74-aae8-5fc33d910f53: database is locked" t=2019-08-22T19:12:57-0400 lvl=info msg="Starting container" action=start created=2018-03-07T15:29:20-0500 ephemeral=false name=ldap project=default stateful=false used=2019-08-22T18:49:03-0400 t=2019-08-22T19:13:40-0400 lvl=eror msg="Failed starting container" action=start created=2018-03-07T15:29:20-0500 ephemeral=false name=ldap project=default stateful=false used=2019-08-22T18:49:03-0400 t=2019-08-22T19:15:55-0400 lvl=info msg="Received 'terminated signal', exiting" t=2019-08-22T19:15:55-0400 lvl=info msg="Starting shutdown sequence" t=2019-08-22T19:15:55-0400 lvl=info msg="Stopping REST API handler:" t=2019-08-22T19:15:55-0400 lvl=info msg=" - closing socket" socket=[::]:8443 t=2019-08-22T19:15:55-0400 lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/unix.socket t=2019-08-22T19:15:55-0400 lvl=info msg="Stopping /dev/lxd handler:" t=2019-08-22T19:15:55-0400 lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/devlxd/sock t=2019-08-22T19:15:55-0400 lvl=info msg="Closing the database"

truelai commented 5 years ago

I removed the source image it was complaining about.

Also, I was able to prevent the locks by turning off one container running HomeAssistant. Any idea why that would be?

stgraber commented 5 years ago

I'm not seeing anything odd in there.

You could try doing lxc config set NAME raw.lxc lxc.log.level=trace on the container and then attempt starting it again which will produce much more output in lxc info NAME --show-log.

One potential issue is you running out of resources of some kind on your system, be that memory, processes, files, ... You may want to look at dmesg, make sure you have enough memory (free -m) and aren't running out of disk space (df -h) or inodes (df -i).

stgraber commented 5 years ago

You could also try switching to the HWE kernel which would get you onto the same 4.15 kernel as Ubuntu 18.04, especially if anything odd comes up from looking at dmesg.

stgraber commented 5 years ago

@truelai how are things going?

truelai commented 5 years ago

@stgraber - I upgraded to 18.04 and looks like the kernel had some bug fixes for btrfs. Since then, the issue has not repeated so it may have been related. I'd call this closed for now. Thanks for checking in!

Also, if you're in Montreal, please shoot me an email. I owe you food and drink.