Closed: samvde closed this issue 6 years ago.
The upgrade to 3.0 moves most of the DB data from lxd.db over to raft/, so that's expected. lxd.db then just stores a partial schema with node-specific information. So the fact that lxd.db appears to shrink when upgrading to 3.0 shouldn't be a worry.
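For reference, a quick way to check this on disk (a sketch, assuming a deb-based install under /var/lib/lxd; the directory name for the raft/dqlite data has varied between 3.0 builds, so look for raft/ or database/):

ls -lh /var/lib/lxd/lxd.db                                      # now holds only the node-specific schema, hence the smaller size
ls -lhR /var/lib/lxd/raft/ /var/lib/lxd/database/ 2>/dev/null   # the global (raft/dqlite) data, including db.bin, lives here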
To try and get things straight, can you:
If all goes well, that last command will hang, indicating that LXD is running. If that's the case, then try to run lxc list from a separate terminal (without interrupting the one running the lxd daemon).
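The exact steps from that comment aren't reproduced above, but the usual sequence for this kind of debugging looks roughly like this (a sketch, assuming a deb install with the lxd.service/lxd.socket units mentioned below):

sudo systemctl stop lxd.service lxd.socket   # make sure no daemon is left running in the background
sudo lxd --group lxd --verbose --debug       # run the daemon in the foreground; this is the command that should "hang"
lxc list                                     # from a second terminal, without interrupting the daemon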
Hi Stéphane
I logged all output into this file: lxd.log-a.txt
The procedure fails. Relevant error:
DBUG[04-29|21:25:02] Initializing a ZFS driver.
DBUG[04-29|21:25:02] Database error: &errors.errorString{s:"sql: no rows in result set"}
DBUG[04-29|21:25:02] Database error: &errors.errorString{s:"sql: no rows in result set"}
DBUG[04-29|21:25:03] Database error: &errors.errorString{s:"sql: no rows in result set"}
DBUG[04-29|21:25:03] Database error: &errors.errorString{s:"sql: no rows in result set"}
DBUG[04-29|21:25:03] Database error: &errors.errorString{s:"sql: no rows in result set"}
DBUG[04-29|21:25:03] Database error: &errors.errorString{s:"sql: no rows in result set"}
DBUG[04-29|21:25:03] Database error: &errors.errorString{s:"sql: no rows in result set"}
DBUG[04-29|21:25:04] Database error: &errors.errorString{s:"sql: no rows in result set"}
DBUG[04-29|21:25:04] Database error: &errors.errorString{s:"sql: no rows in result set"}
DBUG[04-29|21:25:04] Database error: &errors.errorString{s:"sql: no rows in result set"}
EROR[04-29|21:25:04] Failed to clear old profile configuration for profile default: no such table: profiles_config.
EROR[04-29|21:25:04] Failed to clear old profile configuration for profile docker: no such table: profiles_config.
DBUG[04-29|21:25:04] Initializing a ZFS driver.
DBUG[04-29|21:25:04] Initializing a ZFS driver.
DBUG[04-29|21:25:04] Initializing a ZFS driver.
DBUG[04-29|21:25:04] Initializing a ZFS driver.
DBUG[04-29|21:25:04] Initializing a ZFS driver.
DBUG[04-29|21:25:04] Initializing a ZFS driver.
DBUG[04-29|21:25:04] Initializing a ZFS driver.
DBUG[04-29|21:25:04] Initializing a ZFS driver.
DBUG[04-29|21:25:05] Initializing a ZFS driver.
DBUG[04-29|21:25:05] Initializing a ZFS driver.
DBUG[04-29|21:25:05] Initializing and checking storage pool "default".
DBUG[04-29|21:25:05] Initializing a ZFS driver.
DBUG[04-29|21:25:05] Checking ZFS storage pool "default".
DBUG[04-29|21:25:05] ZFS storage pool "default" does not exist. Trying to import it.
EROR[04-29|21:25:05] Failed to start the daemon: ZFS storage pool "default" could not be imported: cannot import 'default': no such pool available
I don't know what to make of the database errors, but the ZFS error corresponds to the error from my initial post. It seems it wants to import a ZFS pool called default, which I don't have. I used an existing zpool for LXD and, if I recall correctly, the LXD storage pool was called default.
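To see the mismatch, it helps to compare the real zpool names with what LXD is trying to import (a sketch; "tank" is just a placeholder for the existing zpool mentioned above):

zpool list             # the pools that actually exist; none of them is literally named "default"
zfs list -r tank       # the datasets LXD was using inside the existing pool (placeholder name)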
It's correct in the original database:
I have no idea how to query or look at the contents of db.bin.
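The old-format file, at least, can be queried directly with the sqlite3 CLI (a sketch, assuming the 2.x schema's storage_pools and storage_pools_config tables); db.bin is managed by the raft/dqlite layer and is best left alone while the daemon is running:

sqlite3 /var/lib/lxd/lxd.db.bak '.tables'
sqlite3 /var/lib/lxd/lxd.db.bak 'SELECT * FROM storage_pools;'
sqlite3 /var/lib/lxd/lxd.db.bak 'SELECT * FROM storage_pools_config;'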
I also tried using a version of lxd.db from 2 weeks ago; it produces the same error output.
I can keep this machine in this state for troubleshooting for the time being.
Ok, so that error actually makes sense and it's something I fixed already.
Please do:
This will effectively do the same as before but this time with a LXD that will set the right pool name.
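For anyone hitting the same thing, the shape of that procedure is roughly as follows (a sketch; the actual link to the patched binary was in the comment and isn't reproduced here):

sudo systemctl stop lxd.service lxd.socket
chmod +x ./lxd                              # the patched lxd binary provided above
sudo ./lxd --group lxd --verbose --debug    # same foreground run as before, now with the pool-name fix
lxc list                                    # from a second terminal, once the daemon is up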
Ok, that worked :-) After this procedure I was able to start the lxd.service as well. It seems the correct container states are restored and all are running.
Here's the log file (sorry for the bad layout, I should have logged printable output only): lxd-a.log
Should I be cautious with LXD upgrades for the time being, or is the core issue fixed? If you want, I can provide the lxd.db.bak file in case you need to reproduce this.
BTW: very impressive support here, many thanks!
That machine is now fine; it was a one-time upgrade step that was failing, so you won't ever run that code again. If you have other machines with a similar zfs configuration, they'll likely hit the same problem and will also need the custom binary until we push the next package update with it included.
Required information
Issue description
I have upgraded a home NAS to Bionic using 'do-release-upgrade -d', which apparently upgrades LXD from version 2 to 3 as well. This system is a fileserver, and it has a few non-critical containers (which is the sole reason I haven't rolled back to before the upgrade).
I followed the standard upgrade procedure. After upgrading, LXD is completely non-functional for me.
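For context, there were no LXD-specific steps involved; it was just the stock release upgrade, roughly as follows (the version check afterwards is added here only as an illustration):

sudo do-release-upgrade -d   # -d was needed at the time to move to Bionic before its release
lxd --version                # reports 3.0.x after the upgrade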
I see 2 big issues:
lxc-containers status:
I have not been able to troubleshoot the "Error: Get http://unix.socket/1.0: EOF" error. I noticed this is sometimes related to user permissions, but these did not change, and I have tried the relevant commands both as my own user and as root.
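For reference, the permission checks amount to something like this (a sketch; socket path and lxd group as on a deb install):

ls -l /var/lib/lxd/unix.socket   # should be owned root:lxd
id                               # confirm the current user is in the lxd group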
Current permissions (removed 'x' from output):
lxd.socket status:
lxd.service status:
I'm able to run sudo lxd --group lxd --verbose --debug:
Then this basically repeats.
After the upgrade, I noticed the original lxd.db file was replaced with a new one:
I can confirm lxd.db.bak is the one I had before. I have tried restarting everything with the original lxd.db file, which failed.
Looking at the db files with sqlitebrowser, it's clear that all of my config has been lost. A screenshot comparing the two:
So I'm quite stuck here. All zfs datasets are mounted where they should be mounted, everything seems fine from that perspective:
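(The original output isn't shown here; the check itself is along these lines:)

zfs list -o name,used,mountpoint   # datasets and their mountpoints
zfs mount                          # datasets that are currently mounted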
dmesg (no LXD entries, but I do see LXC entries):
At start:
Then repeating:
[ ] Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
[ ] Output of the client with --debug
[ ] Output of the daemon with --debug (alternatively, output of lxc monitor while reproducing the issue)