OK. Thanks for the clarification. I was hoping to resolve this tonight, but I have run out of gas. More tomorrow.
Did you get to run the last binary I gave you? That should at least tell us what command is returning that error.
I didn't run it. I have discovered one container that does not have an LV as backing store, despite the fact that it was created recently. I was going to investigate a bit more. Is it possible that one of the other minor releases in the 2.8 series had a bug that could have resulted in a dir backing store instead of an LV? Is it possible for an individual container to have a container-specific setting that differs from the default? (I certainly did not do it on purpose.) I think that container is what stopped LXD in the last run, because it is alphabetically next after the successfully migrated sequence; so we may be very close.
I can't remember any bug in the LVM code which would have caused LXD to create a directory-backed container rather than an LVM one when the storage.lvm_vg_name key is set.
It's the kind of thing that could have happened with btrfs, because LXD would silently fall back to dir if btrfs isn't detected anymore, but for LVM we only check the storage key...
You could probably cause this by creating a directory container and then switching to LVM by setting the config key. Will try to reproduce. In any case, we might need to handle this corner case as well and convert dir-backed containers detected during upgrade to LVM-backed containers.
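Something like this should reproduce it on a 2.8-era daemon (a sketch; the container names are arbitrary):

    lxc launch ubuntu:16.04 c1                  # no storage key set yet -> dir-backed
    lxc config set storage.lvm_vg_name lxd      # daemon now considers its storage LVM
    lxc launch ubuntu:16.04 c2                  # LVM-backed
    # c1 stays dir-backed, leaving a mixed-storage instance the upgrade has to cope with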
Actually, it might be possible to create these mixed-storage instances in any {LVM, <valid-storage-type>} combination. Switching from LVM back should not be possible, since LXD will refuse.
Yup, reproduced: creating a zfs pool first, creating a container, and then switching to lvm on a 2.8 instance.
I've got code coming that will handle this.
@brauner I am afraid that it is even stranger than that, as the non-LVM container was clearly created long after many of the LVM containers. My best guess is that I created it on my other LXD server, which has only dir storage, and migrated it to the LVM-backed server. Normally this resulted in an LVM-backed container on the receiving end, but in this one case I ended up with a dir-backed container.
In preparation for the multi-storage support, was there a release in which you moved the storage setting from a global variable to a per-container variable -- one which might have been picked up, and implemented, by my LVM-backed server, but which ALSO preceded the introduction of the new storage-pools area?
Since dir storage was the original, and simplest, storage format, my LVM-backed server, if it were at exactly the right version when receiving the "lxc move foo sys2:foo", could have implemented the container-specific storage type by simply writing directly into /var/lib/lxd/containers/foo/rootfs. Possible?
Strike my last comment. Looking inside the sqlite tables, we see that the creation date timestamp was 1480368405, which converts to 2016-11-28; so it is in fact one of the oldest containers and probably predates the LVM conversion. It is expendable, so I am going to move it aside, delete the entries from lxd.db, and try @stgraber's latest patched LXD.
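(For anyone following along, the conversion is just GNU date:)

    date -u -d @1480368405    # Mon Nov 28 21:26:45 UTC 2016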
well, I'm sending a patch that handles such upgrades
Note that moving a dir container to an LVM container will not set the LV Thin origin name, which you can see in lvdisplay output. This, however, shouldn't be a problem in terms of functionality.
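You can check this with, e.g.:

    lvs -o lv_name,origin lxd    # the Origin column stays empty for volumes created this way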
I just ran the last LXD from @stgraber. It migrated all of the containers and renamed the volumes to containers_{foo}. It eventually bombed out on migrating the images, but I think that may again be due to residue left behind by the pre-LVM storage.
WARN[03-07|09:58:29] Database already contains a valid entry for the storage pool: lxd.
WARN[03-07|09:58:29] Storage volumes database already contains an entry for the container.
WARN[03-07|09:58:29] Storage volumes database already contains an entry for the container.
WARN[03-07|09:58:29] Storage volumes database already contains an entry for the container.
WARN[03-07|09:58:29] Storage volumes database already contains an entry for the container.
WARN[03-07|09:58:30] Storage volumes database already contains an entry for the container.
WARN[03-07|09:58:30] Storage volumes database already contains an entry for the container.
error: Failed to run: lvrename lxd b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb images_b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb: Existing logical volume "b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb" not found in volume group "lxd"
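To confirm that the LV really is absent from the VG (a quick check; the grep pattern is just the prefix of the failing fingerprint):

    lvs --noheadings -o lv_name lxd | grep b5b03165    # no output: the LV is indeed missing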
These new image volumes were created (names truncated for presentation):
LV VG Attr LSize Pool Origin Data% Meta%
images_11fc...d172 lxd Vwi-a-tz-- 300.00g LXDPool 1.85
images_18e7...5533 lxd Vwi-a-tz-- 10.00g LXDPool 8.38
images_457a...6595 lxd Vwi-a-tz-- 10.00g LXDPool 5.65
images_543e...3d15 lxd Vwi-a-tz-- 10.00g LXDPool 8.59
images_a570...e869 lxd Vwi-a-tz-- 10.00g LXDPool 13.85
and reflected in /var/lib/lxd/storage-pools/lxd/images as a bunch of empty directories:
root@sys2:/var/lib/lxd/storage-pools/lxd# ls -l images
total 24
drwx------ 2 root root 4096 Mar 7 09:58 11fc...d172
drwx------ 2 root root 4096 Mar 7 09:58 18e7...5533
drwx------ 2 root root 4096 Mar 7 09:58 457a...6595
drwx------ 2 root root 4096 Mar 7 09:58 543e...3d15
drwx------ 2 root root 4096 Mar 7 09:58 a570...3e869
drwx------ 2 root root 4096 Mar 7 09:58 b5b0...52abb
My /var/lib/lxd/images looks like it is full of cruft (again, filenames truncated for presentation):
root@sys2:/var/lib/lxd/images# ls -l
total 1448392
-rw-r--r-- 1 root root 844 Mar 3 21:37 11fc...d172
-rw-r--r-- 1 root root 153494344 Mar 3 21:40 11fc...d172.rootfs
-rw-r--r-- 1 root root 199032954 Jan 10 10:45 18e7...5533
-rw-r--r-- 1 root root 600 Jan 9 13:47 457a...6595
-rw-r--r-- 1 root root 98614308 Jan 9 13:47 457a...6595.rootfs
-rw-r--r-- 1 root root 596 Nov 24 23:36 543e...3d15
-rw-r--r-- 1 root root 130996352 Nov 24 23:36 543e...3d15.rootfs
-rw-r--r-- 1 root root 418665102 Nov 30 15:05 a570...e869
-rw-r--r-- 1 root root 816 Nov 24 15:42 b5b0...2abb
-rw-r--r-- 1 root root 125798024 Nov 24 15:42 b5b0...2abb.rootfs
-rw-r--r-- 1 root root 600 Nov 24 23:40 bfd1...f9f8
-rw-r--r-- 1 root root 2487084 Nov 24 23:40 bfd1...f9f8.rootfs
-rw-r--r-- 1 root root 596 Jan 7 17:51 d7c1...3cc6
lrwxrwxrwx 1 root root 73 Jan 7 17:51 d7c1...3cc6.lv -> /dev/lxd/d7c16c4fedd3308b5bffdb91f491b8458610c6115d37ace8ba4bcf5c29b23cc6
-rw-r--r-- 1 root root 2756944 Jan 7 17:51 d7c1...3cc6.rootfs
-rw-r--r-- 1 root root 285455980 Feb 14 13:28 e12c...eeed
lrwxrwxrwx 1 root root 73 Feb 14 13:28 e12c...eeed.lv -> /dev/lxd/e12c3c1aed259ce62b4a5e8dc5fe8b92d14d36e611b3beae3f55c94df069eeed
-rw-r--r-- 1 root root 596 Nov 24 23:39 ff52...3ac28
-rw-r--r-- 1 root root 65777664 Nov 24 23:39 ff52...3ac28.rootfs
I'm thinking that I may just need to move aside all of the images that have no reference to a logical volume and then run LXD again. Does that seem plausible to you guys?
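What I have in mind is roughly this (a sketch, untested; /root/images-aside is just a scratch location of my own invention):

    cd /var/lib/lxd/images
    mkdir -p /root/images-aside
    for f in *; do
        fp=${f%%.*}    # bare fingerprint, stripping any .rootfs or .lv suffix
        # keep the file if an LV exists under either the old or the new images_-prefixed name
        if ! lvs "lxd/$fp" >/dev/null 2>&1 && ! lvs "lxd/images_$fp" >/dev/null 2>&1; then
            mv "$f" /root/images-aside/
        fi
    done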
Here is some formatted data from the lxd.db images table:
sqlite> select id,cached,fingerprint,filename,creation_date,last_use_date from images;
| id | cached | fingerprint | filename | creation_date | last_use_date |
| 2 | 0 | b5b0...2abb | ubuntu-14.04-server-cloudimg-amd64-lxd.tar.xz | 2016-11-09 00:00:00+00:00 | |
| 3 | 0 | 543e...3d15 | lxd.tar.xz | 2016-11-25 00:00:00+00:00 | 2016-11-30 21:37:31.630646328+00:00 |
| 4 | 0 | ff52...ac28 | lxd.tar.xz | 2016-11-25 00:00:00+00:00 | |
| 8 | 0 | a570...e869 | | 0001-01-01 00:00:00+00:00 | |
| 50 | 0 | d7c1...3cc6 | lxd.tar.xz | 2017-01-07 00:00:00+00:00 | 2017-01-13 19:34:34.399231656+00:00 |
| 51 | 0 | 457a...6595 | lxd.tar.xz | 2017-01-08 00:00:00+00:00 | 2017-01-09 18:48:43.133706794+00:00 |
| 52 | 1 | 18e7...5533 | | 0001-01-01 00:00:00+00:00 | 2017-01-10 15:45:22.877654385+00:00 |
| 55 | 0 | e12c...eeed | | 0001-01-01 00:00:00+00:00 | 2017-02-14 18:33:03.41803276+00:00 |
| 60 | 0 | 11fc...d172 | ubuntu-16.04-server-cloudimg-amd64-lxd.tar.xz | 2017-03-03 00:00:00+00:00 | 2017-03-04 03:47:50.051114525+00:00 |
Finally, the contents of the lxd.db table storage_volumes. Note that the images all come at the end, after the partially obfuscated container names shown above. I also see references to containers that were deleted (with foreign_keys=ON); a query sketch for spotting those follows the table:
id|name|storage_pool_id|type
1|a.....3|1|0
2|a...4|1|0
3|a.......d|1|0
4|a.....w|1|0
5|a......t|1|0
6|b.....r|1|0
7|c.............e|1|0
8|d...|1|0
9|e......t|1|0
10|f......5|1|0
11|foo|1|0
12|h........w|1|0
13|h........w|1|0
14|j....2|1|0
15|l.........e|1|0
16|l....2|1|0
17|..0|1|0
18|..1|1|0
19|.......2|1|0
20|.......4|1|0
21|p......p|1|0
22|p........l|1|0
23|p.....t|1|0
24|p...........2|1|0
25|p..|1|0
26|s.............4|1|0
27|s...........e|1|0
28|test1|1|0
29|test2|1|0
30|u.....s|1|0
31|v...t|1|0
32|v....1|1|0
33|v...|1|0
34|w.....k|1|0
35|11fc1b1d39b9f9cd7e9491871f1421ac4278e1d599ecf5d180f2a6e2483bd172|1|1
36|18e7ed74d0d653894f65343afbc35b92c6781933c273943d882c36a5c5535533|1|1
37|457a80ea4720900b69e5542cea5351f58021331bc96e773e4855a3e2ce1e6595|1|1
38|543e662b70958f5b87f68b20eb0a205d8c4b14c41f80699e9a98b3b851883d15|1|1
39|a570ce23e1dae791e7b8b2f2bcb98c1404273e97c7a1fb972bf0f5835ac3e869|1|1
40|b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb|1|1
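For reference, the orphaned rows could be listed with a join against the containers table (a sketch; it assumes volumes and containers are matched by name, which seems to be the case here):

    sqlite3 /var/lib/lxd/lxd.db "SELECT sv.id, sv.name FROM storage_volumes sv \
        LEFT JOIN containers c ON c.name = sv.name \
        WHERE sv.type = 0 AND c.name IS NULL;"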
OK. Enough for now.
Do
/dev/lxd/d7c16c4fedd3308b5bffdb91f491b8458610c6115d37ace8ba4bcf5c29b23cc6
/dev/lxd/e12c3c1aed259ce62b4a5e8dc5fe8b92d14d36e611b3beae3f55c94df069eeed
exist? If not, you can delete them and re-run the upgrade. The image directory itself should only contain plain files of the format <sha256> and <sha256>.rootfs. These should be kept around!

Ah, I see: the upgrade failure may well be due, as you correctly observed, to the fact that the image used to create the dir container is still present as a plain file but does not exist as an LVM logical volume. The new upgrade code for mixed-storage LXD instances should handle this.
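A quick check (sketch):

    for lv in d7c16c4fedd3308b5bffdb91f491b8458610c6115d37ace8ba4bcf5c29b23cc6 \
              e12c3c1aed259ce62b4a5e8dc5fe8b92d14d36e611b3beae3f55c94df069eeed; do
        test -e "/dev/lxd/$lv" && echo "$lv: present" || echo "$lv: missing"
    done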
Yes, those images exist and are dependencies of some containers. I was just putting together an orderly table view of the output of lvs:
| LV                          | VG  | Attr       | LSize   | Pool    | Origin             | Data% | Meta% |
|-----------------------------+-----+------------+---------+---------+--------------------+-------+-------|
| LXDPool                     | lxd | twi-aotz-- | 2.56t   |         |                    | 3.92  | 2.13  |
| containers_......3          | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 21.34 |       |
| containers_....4            | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 53.18 |       |
| containers_.......w         | lxd | Vwi-aotz-- | 10.00g  | LXDPool | images_18e7...5533 | 42.01 |       |
| containers_.......t         | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 10.28 |       |
| containers_......r          | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 13.78 |       |
| containers_...............e | lxd | Vwi-aotz-- | 10.00g  | LXDPool | images_543e...3d15 | 8.77  |       |
| containers_.......t         | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 16.87 |       |
| containers_.......5         | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 31.73 |       |
| containers_...              | lxd | Vwi-aotz-- | 300.00g | LXDPool | images_11fc...d172 | 1.85  |       |
| containers_..........w      | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 19.22 |       |
| containers_..........w      | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 11.45 |       |
| containers_.....2           | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 9.85  |       |
| containers_...........e     | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 40.40 |       |
| containers_.....2           | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 98.77 |       |
| containers_..0              | lxd | Vwi-aotz-- | 300.00g | LXDPool |                    | 1.94  |       |
| containers_..1              | lxd | Vwi-aotz-- | 300.00g | LXDPool |                    | 1.93  |       |
| containers_.......2         | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 11.84 |       |
| containers_.......4         | lxd | Vwi-aotz-- | 300.00g | LXDPool |                    | 1.88  |       |
| containers_........p        | lxd | Vwi-aotz-- | 300.00g | LXDPool |                    | 1.85  |       |
| containers_..........l      | lxd | Vwi-aotz-- | 300.00g | LXDPool |                    | 1.90  |       |
| containers_......t          | lxd | Vwi-aotz-- | 10.00g  | LXDPool | images_543e...3d15 | 8.59  |       |
| containers_.............2   | lxd | Vwi-aotz-- | 300.00g | LXDPool |                    | 1.95  |       |
| containers_..b              | lxd | Vwi-aotz-- | 300.00g | LXDPool |                    | 2.11  |       |
| containers_...............4 | lxd | Vwi-aotz-- | 10.00g  | LXDPool | d7c1...3cc6        | 3.42  |       |
| containers_.............e   | lxd | Vwi-aotz-- | 10.00g  | LXDPool | images_457a...6595 | 5.71  |       |
| containers_....1            | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 12.15 |       |
| containers_....2            | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 9.62  |       |
| containers_......s          | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 69.05 |       |
| containers_....t            | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 12.34 |       |
| containers_.....1           | lxd | Vwi-aotz-- | 300.00g | LXDPool |                    | 1.85  |       |
| containers_...1             | lxd | Vwi-aotz-- | 300.00g | LXDPool |                    | 1.88  |       |
| containers_......k          | lxd | Vwi-aotz-- | 10.00g  | LXDPool |                    | 42.80 |       |
| d7c1...3cc6                 | lxd | Vwi-a-tz-- | 10.00g  | LXDPool |                    | 2.95  |       |
| e12c...eeed                 | lxd | Vwi-a-tz-- | 300.00g | LXDPool |                    | 1.89  |       |
| f..........1                | lxd | Vwi-aotz-- | 40.00g  | LXDPool |                    | 34.53 |       |
| images_11fc...d172          | lxd | Vwi-a-tz-- | 300.00g | LXDPool |                    | 1.85  |       |
| images_18e7...5533          | lxd | Vwi-a-tz-- | 10.00g  | LXDPool |                    | 8.38  |       |
| images_457a...6595          | lxd | Vwi-a-tz-- | 10.00g  | LXDPool |                    | 5.65  |       |
| images_543e...3d15          | lxd | Vwi-a-tz-- | 10.00g  | LXDPool |                    | 8.59  |       |
| images_a570...e869          | lxd | Vwi-a-tz-- | 10.00g  | LXDPool |                    | 13.85 |       |
| lphys-home                  | lxd | Vwi-aotz-- | 400.00g | LXDPool |                    | 2.82  |       |
Cool, yeah those other image failures are caused by them not being present as LVM logical volumes since they were used to create dir containers. Thanks for your help!
I'm glad that my pain could be of value. ;-)
If I see:
-rw-r--r-- 1 root root 596 Jan 7 17:51 d7c16c4fedd3308b5bffdb91f491b8458610c6115d37ace8ba4bcf5c29b23cc6
lrwxrwxrwx 1 root root 73 Jan 7 17:51 d7c16c4fedd3308b5bffdb91f491b8458610c6115d37ace8ba4bcf5c29b23cc6.lv -> /dev/lxd/d7c16c4fedd3308b5bffdb91f491b8458610c6115d37ace8ba4bcf5c29b23cc6
-rw-r--r-- 1 root root 2756944 Jan 7 17:51 d7c16c4fedd3308b5bffdb91f491b8458610c6115d37ace8ba4bcf5c29b23cc6.rootfs
and absent any dir-backed containers, are the plain-file versions just leftovers that we can safely clear away? (Perhaps LXD should do the clearing to ensure database consistency, but I am thinking more narrowly of not missing any container dependencies.)
That depends on whether the corresponding image has an entry in the images database. If that is the case, then simply deleting the image from the folder will leave you with a stale entry for this image in the db. As soon as @stgraber is around he can provide you with a pre-built binary based on my patches that will likely take care of this issue. If you can't wait that long, then delete /var/lib/lxd/images/<sha256>{.rootfs} and the corresponding entries in the db.
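Something along these lines (a sketch only; b5b0... is the fingerprint from your lvrename failure, so double-check before deleting anything):

    FP=b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb
    rm /var/lib/lxd/images/$FP /var/lib/lxd/images/$FP.rootfs
    # assumes the schema's ON DELETE CASCADE cleans up any alias/property rows
    sqlite3 /var/lib/lxd/lxd.db "PRAGMA foreign_keys=ON; DELETE FROM images WHERE fingerprint='$FP';"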
I am content to wait for now for the @stgraber updated LXD. I simply want to understand the dependencies as well as possible for future reference. I'm going to create a new issue with a suggestion for tweaking your handling of image/container updates with safety in mind. (You guys, of course, get to decide whether it makes sense.)
Sorry about that, worked till 3am so woke up pretty late today :) Will build you a new binary now.
@sbworth
Binary updated: https://dl.stgraber.org/lxd-3026 sha256: ad96813bb5ecc29dde483b48ee284682df3f128d7b1006f2c313300585970bdf
@stgraber We have success.
Should I continue to run this lxd, or should the 2.10.1 version now suffice until the next upgrade?
Since the changes we made were limited to the upgrade code and you've now gotten past that upgrade, you can resume using 2.10.1. We'll have LXD 2.11 out later today with the fix.
And merged that last batch of fixes from @brauner so all the fixes needed to sort out this issue are now in the master branch, closing this issue.
Now running successfully with 2.10.1 lxd in post-migration state.
Thanks again @stgraber @brauner .
Hello,
I believe that I have a variant of the problem seen in issue #3024, which I have been following with interest. After upgrading from 2.8.x to 2.10.1, LXD cannot start up.
Required information
Issue description
I have two systems, sys1 and sys2. Sys1 is using dir storage, while sys2 is using LVM.
With sys1, I migrated from 2.8.x to 2.9.x and then to 2.10.x. After resolving an issue with a change in profile inheritance of the disk device after the 2.9.x upgrade, sys1 seems to have upgraded to 2.10.x ok.
With sys2, I migrated directly from 2.8.x to 2.10.x. This was inadvertent, as I had just sorted out the 2.9.x issue on sys1 and intended to move sys2 to 2.9.x. When lxd attempted to restart, the lxc command line client stopped responding.
Checking /var/log/lxd/lxd.log and journalctl -u lxd, we see: (log output elided)

Sample of /var/lib/lxd/containers: (listing elided)

File tree listing of /var/lib/lxd/storage-pools: (listing elided)

That is, the storage-pools area is empty. (Were the container rootfs links supposed to be migrated to storage-pools?)
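My (possibly mistaken) expectation of the new scheme was that each container would end up as a symlink into the pool, e.g.:

    ls -l /var/lib/lxd/containers/foo
    # expected something like (illustrative, not what I actually see):
    # /var/lib/lxd/containers/foo -> /var/lib/lxd/storage-pools/lxd/containers/foo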
The images area seems untouched: (listing elided)

Output from pvs and vgs, and -- highly edited for readability -- output from lvs: (output elided)

Data from lxd.db: (data elided)
It looks somewhat odd to me that the container astro3 has an entry in the storage_volumes table when nothing else does. It does differ in being a privileged container.
Any help you can provide to get regular access restored will be greatly appreciated. For the moment, the containers continue to provide their services. Let me know if I can provide any other useful data or perform any non-destructive tests.