canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

[Upgrade from mixed-storage LXD] LXD won't restart after upgrade to 2.10.1 #3026

Closed. sbworth closed this issue 7 years ago.

sbworth commented 7 years ago

Hello,

I believe that I have a variant of the problem seen in issue #3024, which I have been following with interest. After upgrading to 2.10.1 from 2.8.x, LXD cannot start up.

Required information

Issue description

I have two systems, sys1 and sys2. Sys1 is using dir storage, while sys2 is using LVM.

With sys1, I migrated from 2.8.x to 2.9.x and then to 2.10.x. After resolving an issue with a change in profile inheritance of the disk device after the 2.9.x upgrade, sys1 seems to have upgraded to 2.10.x ok.

With sys2, I migrated directly from 2.8.x to 2.10.x. This was inadvertent, as I had just sorted out the 2.9.x issue on sys1 and intended to move sys2 to 2.9.x. When lxd attempted to restart, the lxc command line client stopped responding.

Checking /var/log/lxd/lxd.log, we see:

lvl=info msg="LXD 2.10.1 is starting in normal mode" path=/var/lib/lxd t=2017-03-06T14:34:02-0500
lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2017-03-06T14:34:02-0500
lvl=info msg="Kernel uid/gid map:" t=2017-03-06T14:34:02-0500
lvl=info msg=" - u 0 0 4294967295" t=2017-03-06T14:34:02-0500
lvl=info msg=" - g 0 0 4294967295" t=2017-03-06T14:34:02-0500
lvl=info msg="Configured LXD uid/gid map:" t=2017-03-06T14:34:02-0500
lvl=info msg=" - u 0 100000 65536" t=2017-03-06T14:34:02-0500
lvl=info msg=" - g 0 100000 65536" t=2017-03-06T14:34:02-0500
lvl=warn msg="Database already contains a valid entry for the storage pool: lxd." t=2017-03-06T14:34:03-0500
lvl=warn msg="Storage volumes database already contains an entry for the container." t=2017-03-06T14:34:03-0500
lvl=info msg="LXD 2.10.1 is starting in normal mode" path=/var/lib/lxd t=2017-03-06T14:44:02-0500
lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2017-03-06T14:44:02-0500
lvl=info msg="Kernel uid/gid map:" t=2017-03-06T14:44:02-0500
lvl=info msg=" - u 0 0 4294967295" t=2017-03-06T14:44:02-0500
lvl=info msg=" - g 0 0 4294967295" t=2017-03-06T14:44:02-0500
lvl=info msg="Configured LXD uid/gid map:" t=2017-03-06T14:44:02-0500
lvl=info msg=" - u 0 100000 65536" t=2017-03-06T14:44:02-0500
lvl=info msg=" - g 0 100000 65536" t=2017-03-06T14:44:02-0500
lvl=warn msg="Database already contains a valid entry for the storage pool: lxd." t=2017-03-06T14:44:03-0500
lvl=warn msg="Storage volumes database already contains an entry for the container." t=2017-03-06T14:44:03-0500

journalctl -u lxd

Mar 06 14:34:02 sys2 systemd[1]: Starting LXD - main daemon...
Mar 06 14:34:02 sys2 lxd[4416]: lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2017-03-06T14:34:02-0500
Mar 06 14:34:03 sys2 lxd[4416]: lvl=warn msg="Database already contains a valid entry for the storage pool: lxd." t=2017-03-06T14:34:03-0500
Mar 06 14:34:03 sys2 lxd[4416]: lvl=warn msg="Storage volumes database already contains an entry for the container." t=2017-03-06T14:34:03-0500
Mar 06 14:34:13 sys2 lxd[4416]: error: device or resource busy
Mar 06 14:34:13 sys2 systemd[1]: lxd.service: Main process exited, code=exited, status=1/FAILURE
Mar 06 14:44:02 sys2 lxd[4417]: error: LXD still not running after 600s timeout.
Mar 06 14:44:02 sys2 systemd[1]: lxd.service: Control process exited, code=exited status=1
Mar 06 14:44:02 sys2 systemd[1]: Failed to start LXD - main daemon.
Mar 06 14:44:02 sys2 systemd[1]: lxd.service: Unit entered failed state.
Mar 06 14:44:02 sys2 systemd[1]: lxd.service: Failed with result 'exit-code'.
Mar 06 14:44:02 sys2 systemd[1]: lxd.service: Service hold-off time over, scheduling restart.
Mar 06 14:44:02 sys2 systemd[1]: Stopped LXD - main daemon.
Mar 06 14:44:02 sys2 systemd[1]: Starting LXD - main daemon...
Mar 06 14:44:02 sys2 lxd[8637]: lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2017-03-06T14:44:02-0500
Mar 06 14:44:03 sys2 lxd[8637]: lvl=warn msg="Database already contains a valid entry for the storage pool: lxd." t=2017-03-06T14:44:03-0500
Mar 06 14:44:03 sys2 lxd[8637]: lvl=warn msg="Storage volumes database already contains an entry for the container." t=2017-03-06T14:44:03-0500
Mar 06 14:44:13 sys2 lxd[8637]: error: device or resource busy
Mar 06 14:44:13 sys2 systemd[1]: lxd.service: Main process exited, code=exited, status=1/FAILURE

Sample of /var/lib/lxd/containers:

drwx------+ 5 root   root    4096 Jan  2 15:59 astro3
lrwxrwxrwx  1 root   root      16 Jan  2 11:35 astro3.lv -> /dev/lxd/astro3
drwxr-xr-x+ 5 root   root    4096 Jan  9 12:13 vault
lrwxrwxrwx  1 root   root      14 Jan 11 15:01 vault.lv -> /dev/lxd/vault
drwx------  2 root   root    4096 Feb 15 13:48 vgate1
lrwxrwxrwx  1 root   root      15 Feb 15 13:48 vgate1.lv -> /dev/lxd/vgate1
drwxr-xr-x+ 5 root   root    4096 Jan 18 13:52 vpn1
lrwxrwxrwx  1 root   root      13 Jan 25 16:22 vpn1.lv -> /dev/lxd/vpn1

File tree listing of /var/lib/lxd/storage-pools:

.
└── lxd
    └── containers

That is, the storage-pools area is empty. (Were the container rootfs links supposed to be migrated to storage-pools?)

The images area seems untouched:

root@sol2:/var/lib/lxd/containers# ls /var/lib/lxd/images/
11fc1b1d39b9f9cd7e9491871f1421ac4278e1d599ecf5d180f2a6e2483bd172
11fc1b1d39b9f9cd7e9491871f1421ac4278e1d599ecf5d180f2a6e2483bd172.lv
11fc1b1d39b9f9cd7e9491871f1421ac4278e1d599ecf5d180f2a6e2483bd172.rootfs
18e7ed74d0d653894f65343afbc35b92c6781933c273943d882c36a5c5535533
18e7ed74d0d653894f65343afbc35b92c6781933c273943d882c36a5c5535533.lv
457a80ea4720900b69e5542cea5351f58021331bc96e773e4855a3e2ce1e6595
457a80ea4720900b69e5542cea5351f58021331bc96e773e4855a3e2ce1e6595.lv
457a80ea4720900b69e5542cea5351f58021331bc96e773e4855a3e2ce1e6595.rootfs
543e662b70958f5b87f68b20eb0a205d8c4b14c41f80699e9a98b3b851883d15
543e662b70958f5b87f68b20eb0a205d8c4b14c41f80699e9a98b3b851883d15.lv
543e662b70958f5b87f68b20eb0a205d8c4b14c41f80699e9a98b3b851883d15.rootfs
a570ce23e1dae791e7b8b2f2bcb98c1404273e97c7a1fb972bf0f5835ac3e869
a570ce23e1dae791e7b8b2f2bcb98c1404273e97c7a1fb972bf0f5835ac3e869.lv
b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb
b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb.rootfs
bfd17410a8c7fe6397dba3e353a23001243bc43af87acf25544d6b0ab624f9f8
bfd17410a8c7fe6397dba3e353a23001243bc43af87acf25544d6b0ab624f9f8.rootfs
d7c16c4fedd3308b5bffdb91f491b8458610c6115d37ace8ba4bcf5c29b23cc6
d7c16c4fedd3308b5bffdb91f491b8458610c6115d37ace8ba4bcf5c29b23cc6.lv
d7c16c4fedd3308b5bffdb91f491b8458610c6115d37ace8ba4bcf5c29b23cc6.rootfs
e12c3c1aed259ce62b4a5e8dc5fe8b92d14d36e611b3beae3f55c94df069eeed
e12c3c1aed259ce62b4a5e8dc5fe8b92d14d36e611b3beae3f55c94df069eeed.lv
ff52f536d2896f358bc913d592828ecf1b39fae45e4ee4825930091e8793ac28
ff52f536d2896f358bc913d592828ecf1b39fae45e4ee4825930091e8793ac28.rootfs

Output from pvs and vgs, and (heavily edited for readability) output from lvs:

  PV         VG   Fmt  Attr PSize   PFree  
  /dev/sda5  dat1 lvm2 a--  931.13g 181.13g
  /dev/sda6  lxd  lvm2 a--    2.56t      0 

  VG   #PV #LV #SN Attr   VSize   VFree  
  dat1   1   1   0 wz--n- 931.13g 181.13g
  lxd    1  42   0 wz--n-   2.56t      0 

 LV       VG   Attr       LSize   Pool     Origin  Data%  Meta%
 LXDPool  lxd  twi-aotz--   2.56t                   3.91   2.12
 astro3   lxd  Vwi-aotz--  10.00g  LXDPool         20.69
 vault    lxd  Vwi-aotz--  10.00g  LXDPool         12.34
 vgate1   lxd  Vwi-a-tz-- 300.00g  LXDPool          1.85
 vpn1     lxd  Vwi-aotz-- 300.00g  LXDPool          1.88

Data from lxd.db:

sqlite> select * from storage_pools;
1|lxd|lvm
sqlite> select * from storage_pools_config;
166|1|volume.size|300GB
167|1|size|21GB
168|1|source|lxd
169|1|lvm.thinpool_name|LXDPool
170|1|lvm.vg_name|lxd
sqlite> select * from storage_volumes;
1|astro3|1|0
sqlite> select * from storage_volumes_config;
67|1|block.filesystem|ext4
68|1|size|300GB

It looks somewhat odd to me that container astro3 has an entry in the storage_volumes tables when none of the others do. It does differ in being a privileged container.

Any help you can provide to get regular access restored will be greatly appreciated. For the moment, the containers continue to provide their services. Let me know if I can provide any other useful data or perform any non-destructive tests.
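(A quick cross-check, as a sketch: assuming the schema shown above, where storage_volumes.name matches the container name, the containers lacking a storage_volumes row can be listed with a LEFT JOIN.)

sqlite3 /var/lib/lxd/lxd.db "SELECT c.name FROM containers c
  LEFT JOIN storage_volumes v ON v.name = c.name
  WHERE v.id IS NULL;"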

stgraber commented 7 years ago

The error in your case appears to be "error: device or resource busy". I don't know if @brauner has an idea of what may be causing this on LVM and if we can work around that issue too.

The reason you have that container listed but not the others is simply that it's the first in alphabetical order. It may be that container which failed to get converted, or the next one; I'm not sure.

@brauner did you do LVM upgrade tests from 2.8 to 2.10.1 with running containers? I wonder if it's just an issue of a mountpoint needing some convincing.

brauner commented 7 years ago

I'm fairly sure this can again be solved by using lazy umounts.
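(For reference, a lazy umount detaches the mountpoint from the namespace immediately and defers the final teardown until nothing holds it busy; a minimal sketch with an illustrative path:)

# detach now, clean up once the last user is gone
umount --lazy /var/lib/lxd/containers/astro3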

stgraber commented 7 years ago

There is a very good chance that rebooting your system will fix the issue: all containers would then be stopped and all mounts undone, giving LXD a clean environment to perform the upgrade on boot. But if you're willing to wait a bit, we can try to fix this for running containers and give you an updated lxd binary to try.

brauner commented 7 years ago

And yes, I've run upgrade tests from 2.8 to 2.10.1 with LVM: https://asciinema.org/a/104652

stgraber commented 7 years ago

@sbworth

https://dl.stgraber.org/lxd-3026 sha256: 036428ed214b786c771cfa64dabf888be492a44ec3763607c9656bdc46d850f0

To run:

systemctl stop lxd.service lxd.socket
./lxd --debug --group lxd
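(Presumably, once testing is done, the stock daemon can be brought back by reversing the stop command above:)

systemctl start lxd.socket lxd.service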
sbworth commented 7 years ago

Will try as soon as I can get to a console.

sbworth commented 7 years ago

Is lxd-3026 intended to fix the problem or just provide diagnostics?

stgraber commented 7 years ago

It should fix the problem, or at least get past the error we saw earlier.

What's the output?

brauner commented 7 years ago

I suspect that the point of failure is that the umount didn't succeed because something is holding the mountpoint busy. If that's the case @stgraber's binary should fix this problem.
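(To see what is holding a mountpoint busy, something like the following can help; the path is illustrative:)

# list processes with open files or working directories on the mounted filesystem
fuser -vm /var/lib/lxd/containers/astro3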

sbworth commented 7 years ago

Sorry, but I will have to put off testing until I have the chance to warn some users, which will likely mean putting off the actual test until tomorrow morning (EST). Thank you for your very rapid response.

stgraber commented 7 years ago

@sbworth Note that the upgrade will not stop any running containers, so assuming that your LXD daemon is currently offline, it really can't make things any worse.

sbworth commented 7 years ago

OK. Will try momentarily.

sbworth commented 7 years ago

# ./lxd --debug --group lxd
INFO[03-06|16:48:08] LXD 2.10.1 is starting in normal mode    path=/var/lib/lxd
WARN[03-06|16:48:08] CGroup memory swap accounting is disabled, swap limits will be ignored. 
INFO[03-06|16:48:08] Kernel uid/gid map: 
INFO[03-06|16:48:08]  - u 0 0 4294967295 
INFO[03-06|16:48:08]  - g 0 0 4294967295 
INFO[03-06|16:48:08] Configured LXD uid/gid map: 
INFO[03-06|16:48:08]  - u 0 100000 65536 
INFO[03-06|16:48:08]  - g 0 100000 65536 
DBUG[03-06|16:48:08] Initializing and checking storage pool "lxd". 
DBUG[03-06|16:48:08] Initializing an LVM driver. 
DBUG[03-06|16:48:08] Checking LVM storage pool "lxd". 
DBUG[03-06|16:48:08] Checked LVM storage pool "lxd". 
DBUG[03-06|16:48:08] Applying patch: storage_api 
WARN[03-06|16:48:08] Database already contains a valid entry for the storage pool: lxd. 
WARN[03-06|16:48:08] Storage volumes database already contains an entry for the container. 
error: exit status 5

brauner commented 7 years ago

Exit status 5 means that either the physical volume, volume group or logical volume is not found.
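(A minimal sketch to check which LVM lookup fails, using the LV names from the lvs output earlier in this thread:)

for lv in astro3 vault vgate1 vpn1; do
    lvs lxd/$lv >/dev/null 2>&1 || echo "lookup failed: lxd/$lv"
done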

sbworth commented 7 years ago

/var/lib/lxd/storage-pools/lxd/containers now contains:

drwx------+ 5 root   root   4096 Jan  2 15:59 apollo3
drwxr-xr-x+ 5 100000 100000 4096 Jan  3 13:28 apps4
drwxr-xr-x+ 4 100000 100000 4096 Nov 24 11:33 apps4-old

sbworth commented 7 years ago

I have no container apps4-old, so I am rather confused. Investigating contents.

.
├── apollo3
│   ├── backup.yaml
│   ├── lost+found
│   ├── metadata.yaml
│   ├── rootfs
│   └── templates
├── apps4
│   ├── backup.yaml
│   ├── lost+found
│   ├── metadata.yaml
│   ├── rootfs
│   └── templates
└── apps4-old
    ├── metadata.yaml
    ├── rootfs
    └── templates

Could it be picking up old data from a previous migration to LVM from dir storage?

stgraber commented 7 years ago

Hmm, that's confusing; I wonder if you have that "apps4-old" container in the database but it wasn't showing up due to broken storage.

What does this give you:

sqlite3 /var/lib/lxd/lxd.db "SELECT * FROM containers;"

(You'll need to install the "sqlite3" package if you don't already have it installed)
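(On Ubuntu, for example:)

apt install sqlite3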

sbworth commented 7 years ago

First line in the table:

3|apps4-old|2|0|0|0|1480005211|2016-11-30 14:52:15.442158778+00:00

No evidence of it in /var/lib/lxd/containers.

Looking for other strays.

stgraber commented 7 years ago

Ok, so that explains the migration problem. Are you seeing any other such containers in the database list?

stgraber commented 7 years ago

If not, the following should do the trick to get rid of it from the database, unblocking the migration:

sqlite3 /var/lib/lxd/lxd.db
  PRAGMA foreign_keys=ON;
  DELETE FROM containers WHERE name="apps4-old";

  <ctrl+d> to exit the interactive DB session
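(A hypothetical follow-up to hunt for other strays: compare the container names in the database against the LVs actually present in the "lxd" VG. The snapshot filter and the containers_ prefix are assumptions based on the schema and naming seen in this thread.)

for name in $(sqlite3 /var/lib/lxd/lxd.db "SELECT name FROM containers WHERE name NOT LIKE '%/%';"); do
    lvs --noheadings -o lv_name lxd 2>/dev/null | grep -qw -e "$name" -e "containers_$name" \
        || echo "no LV found for: $name"
done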
sbworth commented 7 years ago

Removed apps4-old and another stray, both dating back to the days after the original installation and probably before the migration to the LVM backing store.

Ran your lxd again and it died in the same way.

DBUG[03-06|17:12:39] Checking LVM storage pool "lxd". 
DBUG[03-06|17:12:39] Checked LVM storage pool "lxd". 
DBUG[03-06|17:12:39] Applying patch: storage_api 
WARN[03-06|17:12:39] Database already contains a valid entry for the storage pool: lxd. 
WARN[03-06|17:12:39] Storage volumes database already contains an entry for the container. 
error: exit status 5

Looking for more strays.

Some of my containers mount data from other LVM volumes not managed by LXD. Could that be a problem?

stgraber commented 7 years ago

No, container devices don't impact the migration process.

sbworth commented 7 years ago

Does the migration occur in alphabetical order?

stgraber commented 7 years ago

Yeah, I believe the containers list we fetch from the database is sorted in alphabetical order.

stgraber commented 7 years ago

I'm looking at the code now to see where else we're spawning a subprocess without properly logging things...

brauner commented 7 years ago

This could be lvrename failing because the container has already been successfully moved. If that's the case I can send a patch.
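(For context, the migration's per-container rename presumably amounts to something like this, using this system's VG and the containers_ prefix visible in the lvs output further down:)

lvrename lxd apollo3 containers_apollo3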

stgraber commented 7 years ago

@sbworth What does "lvs" show you? I'd expect the containers that show up in /var/lib/lxd/storage-pools/containers to have a "containers" prefix to their name while the others don't. Is your first container renamed and not the others?

sbworth commented 7 years ago

What are we lvrename-ing? The two successfully moved containers are now:

lrwxrwxrwx 1 root root 8 Mar  6 16:48 containers_apollo3 -> ../dm-18
lrwxrwxrwx 1 root root 7 Mar  6 16:48 containers_apps4 -> ../dm-8

The others are all untouched.

brauner commented 7 years ago

@sbworth, @stgraber means the output of lvs on your system, so for your LXD vg: sudo lvs lxd.

stgraber commented 7 years ago

OK, so two of them did get renamed, which is causing the issue; that makes sense. @brauner is working on a fix for that, and I'll give you a new binary build once I have it.

sbworth commented 7 years ago

OK. That is just another way of viewing what I showed you:

 LV                  VG   Attr        LSize   Pool     Origin  Data%  Meta%
 containers_apollo3  lxd  Vwi-aotz--  10.00g  LXDPool          20.69
 containers_apps4    lxd  Vwi-aotz--  10.00g  LXDPool          53.16

All the others still have no containers_ prefix.

brauner commented 7 years ago

I just wanted to make sure that we're seeing the right volume group.

sbworth commented 7 years ago

I have to catch a train soon; so I will be out of contact for at least 80 minutes. (What TZ are you guys in?) Thanks again for your efforts.

stgraber commented 7 years ago

@brauner is in Germany, I'm in Canada (east coast).

brauner commented 7 years ago

https://github.com/lxc/lxd/pull/3027

stgraber commented 7 years ago

https://dl.stgraber.org/lxd-3026 sha256: ab1874f7f33761196969a80c398c2ce542b271499179973770624e038738e37f

stgraber commented 7 years ago

Updated version for that last patch.

https://dl.stgraber.org/lxd-3026 sha256: e22e69ea1dae2f113677b89e13d8f571367ec5fde90675e3b3e00c239d7ec44b

stgraber commented 7 years ago

And another one to include a few extra checks that @brauner added.

https://dl.stgraber.org/lxd-3026 sha256: 41c7c751ce952989144224ffcc855b0f1883ef123727f35518c3620c02fd04af

@sbworth let us know when you're back. We're hoping we can include this fix in LXD 2.11 (to be released tomorrow).

stgraber commented 7 years ago

And again with improved cleanup on failure.

https://dl.stgraber.org/lxd-3026 sha256: d8b098bc0177a8ab10b6c4e86b1fbbec93b43983267f9f314f3153838f7edab7

sbworth commented 7 years ago

Latest trial gets further, but still dies with exit 5. Migrated 4 more containers before dying, but still many more to go. Will continue looking for obvious problem sources.

(Sorry for getting back so late. I have a one-year-old at home; plans evolve rapidly.)

brauner commented 7 years ago

@sbworth, if possible it would be great if you could provide the new output after having run the latest binary that @stgraber uploaded. This would involve:

What we'd need now is to find out why there is still an exit status 5.
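(One way to capture the full debug output for sharing, reusing the command from earlier:)

./lxd --debug --group lxd 2>&1 | tee lxd-debug.log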

stgraber commented 7 years ago

I merged @brauner's fixes since those are good to have anyway. I'm preparing a commit which will convert our codebase to actually log the command on failure; a binary with that applied should make failures a bit more understandable.

sbworth commented 7 years ago

Sorry if I was not clear, but I ran the latest stgraber patched lxd. That is what gave me 4 more containers before bailing. I will send along the requested data as soon as I pull it together.

sbworth commented 7 years ago

This is going to take some time. I am fine with posting the bits that seem relevant, but I am very reluctant to post the entire dataset to a public forum without anonymization, and I don't have a script ready to do that. I am writing such a script now; it will also let me quickly generate summary data for any follow-up tests.
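(A minimal anonymizer sketch, with hypothetical name mappings; a real script would map every host, container, and volume name consistently:)

sed -e 's/\bsys2\b/hostA/g' \
    -e 's/\bastro3\b/ct1/g' lxd-debug.log > lxd-debug.anon.log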

brauner commented 7 years ago

@sbworth, oh sure. Don't feel pressured to provide this information directly here. If you'd rather provide it in private (anonymized or non-anonymized) to @stgraber and me you can just send a private mail.

stgraber commented 7 years ago

I have an LXD build which should add much of the command logging now; let's see if that tells us what's going on.

stgraber commented 7 years ago

@sbworth https://dl.stgraber.org/lxd-3026 sha256: 648d9999383cd87e5412204b3c0962ddec0984a752d2995f27db06ca4c215c5a

sbworth commented 7 years ago

Is it expected that LXD will migrate the images folder as well? If I am interpreting this correctly, I think I have old pre-LVM images living side-by-side with the symlinks to the logical volumes that should currently represent them.

-rw-r--r-- 1 root root       844 Mar  3 21:37 11fc1b1d39b9f9cd7e9491871f1421ac4278e1d599ecf5d180f2a6e2483bd172
lrwxrwxrwx 1 root root        73 Mar  3 21:41 11fc1b1d39b9f9cd7e9491871f1421ac4278e1d599ecf5d180f2a6e2483bd172.lv -> /dev/lxd/11fc1b1d39b9f9cd7e9491871f1421ac4278e1d599ecf5d180f2a6e2483bd172
-rw-r--r-- 1 root root 153494344 Mar  3 21:40 11fc1b1d39b9f9cd7e9491871f1421ac4278e1d599ecf5d180f2a6e2483bd172.rootfs
-rw-r--r-- 1 root root 199032954 Jan 10 10:45 18e7ed74d0d653894f65343afbc35b92c6781933c273943d882c36a5c5535533
lrwxrwxrwx 1 root root        73 Jan 10 10:45 18e7ed74d0d653894f65343afbc35b92c6781933c273943d882c36a5c5535533.lv -> /dev/lxd/18e7ed74d0d653894f65343afbc35b92c6781933c273943d882c36a5c5535533
-rw-r--r-- 1 root root       600 Jan  9 13:47 457a80ea4720900b69e5542cea5351f58021331bc96e773e4855a3e2ce1e6595
lrwxrwxrwx 1 root root        73 Jan  9 13:47 457a80ea4720900b69e5542cea5351f58021331bc96e773e4855a3e2ce1e6595.lv -> /dev/lxd/457a80ea4720900b69e5542cea5351f58021331bc96e773e4855a3e2ce1e6595
-rw-r--r-- 1 root root  98614308 Jan  9 13:47 457a80ea4720900b69e5542cea5351f58021331bc96e773e4855a3e2ce1e6595.rootfs

sbworth commented 7 years ago

Sorry, my interpretation must be wrong. The timestamps all indicate recent access. Apologies for publicly arguing with myself.

stgraber commented 7 years ago

Yeah, LXD will migrate images too, but it does that after it's done with containers and snapshots.

When it does, /var/lib/lxd should only contain the image files themselves; the mountpoints for the LVs will be created under /var/lib/lxd/storage-pools/images/, and those symlinks should then disappear.
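(So the expected post-migration layout would look something like this; the fingerprint is illustrative:)

/var/lib/lxd/images/<fingerprint>                  # image tarball, stays put
/var/lib/lxd/storage-pools/images/<fingerprint>    # LV mountpoint, per the comment above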