canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

lxc storage delete deleted wrong storage pool which was in use #5430

Closed infl00p closed 5 years ago

infl00p commented 5 years ago

Issue description

"lxc storage delete new_btrfs_storage_pool" deleted my dir storage pool (named default) and all my containers! After lengthy troubleshooting, in which I tried and failed to create a cluster and join nodes to it, I moved some temporary containers from my main pool, which is dir based, to a new btrfs pool. After an lxd restart the daemon could not start due to cluster issues (dqlite could not start?), so I decided to restore the database directory from a copy I took before the troubleshooting. I created the btrfs pool again with the same source, but could neither start the containers stored there nor import them. So I took a copy of those and deleted the btrfs storage pool with a simple "lxc storage delete lxdpool0", which then completely deleted my dir storage pool called "default" and all my containers.

Steps to reproduce

As per the issue description, this requires manually moving directories and copying the database:

  1. lxc storage create lxdpool0 btrfs source=/data/lxdpool0
  2. Stop lxd, manually copy the containers and restore the older database
  3. Restart lxd
  4. Recreate the btrfs pool lxdpool0
  5. lxc storage delete lxdpool0 (no force flag!)

stgraber commented 5 years ago

Ok, so the bug report is about LXD wiping the directory which was set as the source path for a btrfs storage pool when that storage pool was deleted?

It's normal for LXD to completely wipe whatever is in the storage pool's path on deletion, as LXD assumes it's in full control of the storage pool. We have no intention of changing that, as allowing non-LXD-managed data in there would be a massive pain when we have to relocate data during some upgrades.

The usual way we avoid such issues, though, is by having LXD refuse to create a storage pool when passed a non-empty path. It could be that this check isn't working on btrfs, or that the database restore somehow left LXD unaware of all the data on the storage pool.
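
To make the "refuse a non-empty path" idea concrete, here is a minimal Python sketch of such a guard. It is only an illustration of the check described above, not LXD's actual code; the function names `is_empty_dir` and `check_pool_source` are made up for this example.

```python
import os
import tempfile


def is_empty_dir(path: str) -> bool:
    """Return True if `path` is a directory that contains no entries."""
    with os.scandir(path) as entries:
        return next(entries, None) is None


def check_pool_source(path: str) -> None:
    """Hypothetical pre-creation guard: reject a missing or non-empty source path."""
    if not os.path.isdir(path):
        raise ValueError(f"{path!r} is not a directory")
    if not is_empty_dir(path):
        raise ValueError(f"{path!r} is not empty; refusing to use it as a pool source")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        check_pool_source(tmp)  # an empty directory is accepted
        open(os.path.join(tmp, "data"), "w").close()
        try:
            check_pool_source(tmp)  # now rejected because of the file
        except ValueError as err:
            print(err)
```

A guard of this kind only protects pool creation. It says nothing about what happens if the database is later replaced with an older copy, which is the other possibility raised above.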

infl00p commented 5 years ago

Yes, I agree that I made mistakes and that just a check should be added. But the source of the btrfs pool was different from that of the dir one. The default dir pool is on an ext4 fs. In lxd.log it's evident that lxd mistakes lxdpool0 for a dir pool during deletion. This could mean that lxd is not using the same variables for checking and deletion. I will try to provide something reproducible during the weekend.
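
One way to double-check which filesystem actually backs each pool path is to look the path up in the mount table. The following Python sketch assumes the standard `/proc/mounts` layout (device, mount point, filesystem type, options, ...); the helper name `fs_type_for` is hypothetical, and mount points containing escaped spaces are not handled.

```python
import os
from typing import Optional


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the most specific mount containing `path`."""
    target = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mountpoint, fstype = os.path.normpath(fields[1]), fields[2]
        prefix = mountpoint.rstrip("/") + "/"
        inside = target == mountpoint or target.startswith(prefix)
        # The longest matching mount point wins; on ties the later entry wins,
        # because a later mount shadows an earlier one at the same location.
        if inside and len(mountpoint) >= len(best_mount):
            best_mount, best_type = mountpoint, fstype
    return best_type


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as handle:
            table = handle.read()
        for pool_path in ("/var/lib/lxd/storage-pools/default",
                          "/var/lib/lxd/storage-pools/lxdpool0"):
            print(pool_path, fs_type_for(pool_path, table))
```

If the two pool paths report different filesystem types, that supports the claim that the pools do not share a source, and the mix-up would have to come from what LXD has recorded about them.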


stgraber commented 5 years ago

Hmm, yeah, that is pretty odd, it'd have been interesting to see the lxc storage list output prior to that.

The storage backend name is supposed to be unique and its type is stored in the same database row, so either there was something odd going on here, such as a duplicate record, or the database had a weird case of corruption that caused this.
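
The point about uniqueness can be illustrated with a small SQLite sketch in Python. The table layout below is an assumption made for this example and is not copied from LXD's database. It only shows that when the name is unique and the driver lives in the same row, a lookup by name cannot return another pool's driver unless the data itself is wrong.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with an illustrative pools table."""
    conn = sqlite3.connect(":memory:")
    # Illustrative schema only; not LXD's real one.
    conn.execute(
        "CREATE TABLE storage_pools ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE,"
        " driver TEXT NOT NULL)"
    )
    return conn


def add_pool(conn: sqlite3.Connection, name: str, driver: str) -> None:
    """Insert a pool; a duplicate name raises sqlite3.IntegrityError."""
    conn.execute(
        "INSERT INTO storage_pools (name, driver) VALUES (?, ?)", (name, driver)
    )


def driver_of(conn: sqlite3.Connection, name: str) -> Optional[str]:
    """Return the driver stored in the same row as `name`, or None."""
    row = conn.execute(
        "SELECT driver FROM storage_pools WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = make_db()
    add_pool(db, "default", "dir")
    add_pool(db, "lxdpool0", "btrfs")
    print(driver_of(db, "lxdpool0"))
    try:
        add_pool(db, "lxdpool0", "dir")
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```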

infl00p commented 5 years ago

I'm still trying to isolate the case in which my dir pool got deleted, but here is a case that needs no database modifications. Steps to reproduce:

  1. Create a regular dir pool called default, or run init to create one
  2. Create a btrfs pool with an existing btrfs source: lxc storage create lxdpool0 btrfs source=/data/lxdpool0
  3. Copy a container directory, let's call it webapp01, into the lxdpool0 storage pool directory, at /var/lib/lxd/storage-pools/lxdpool0/containers/webapp01
  4. Importing with "lxd import webapp01" fails because backup.yaml contains a different pool name, source and driver (default and dir). The error is super weird!
  5. Edit backup.yaml of webapp01 to use the proper pool name, source and driver (lxdpool0 and btrfs), but leave the pool names in the disk definitions unchanged.
  6. Import succeeds !
  7. Delete lxdpool0
  8. Delete succeeds although there are containers inside !
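
Step 5 leaves backup.yaml internally inconsistent: the top-level pool says lxdpool0 while the disk devices still name default. The following Python sketch shows how such a mismatch could be detected on an already-parsed backup.yaml (a plain dict). The `container.devices` and `pool.name` keys match the diff shown later in this thread; `expanded_devices` is an assumed key name, and `pool_mismatches` is a hypothetical helper, not part of LXD.

```python
from typing import Any, Dict, List


def pool_mismatches(backup: Dict[str, Any]) -> List[str]:
    """List disk devices whose `pool` differs from the top-level pool name."""
    pool_name = (backup.get("pool") or {}).get("name")
    container = backup.get("container") or {}
    problems = []
    # "expanded_devices" is an assumption; missing sections are skipped.
    for section in ("devices", "expanded_devices"):
        for dev_name, dev in (container.get(section) or {}).items():
            # Only disk devices that name a pool are compared.
            if dev.get("type") == "disk" and "pool" in dev and dev["pool"] != pool_name:
                problems.append(
                    f"{section}.{dev_name}: pool={dev['pool']!r}, expected {pool_name!r}"
                )
    return problems


if __name__ == "__main__":
    edited = {
        "container": {
            "name": "webapp01",
            "devices": {"root": {"path": "/", "pool": "default", "type": "disk"}},
        },
        "pool": {"name": "lxdpool0", "driver": "btrfs"},
    }
    for problem in pool_mismatches(edited):
        print(problem)
```

Running a check like this before "lxd import" would flag the half-edited file from step 5 instead of letting the import go through with two different pool names.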

CONSOLE LOG:

root@cloud-two:/var/lib/lxd/storage-pools# mount | grep lxd
/dev/mapper/cloud--two-lxdpool0 on /data/lxdpool0 type btrfs (rw,relatime,space_cache,subvolid=5,subvol=/)
tmpfs on /var/lib/lxd/shmounts type tmpfs (rw,relatime,size=100k,mode=711)
tmpfs on /var/lib/lxd/devlxd type tmpfs (rw,relatime,size=100k,mode=755)
/dev/mapper/cloud--two-lxdpool0 on /var/lib/lxd/storage-pools/lxdpool0 type btrfs (rw,relatime,space_cache,subvolid=5,subvol=/)
root@cloud-two:/var/lib/lxd/storage-pools# cd lxdpool0/containers
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers# cp -a /opt/storage/backup/lxd/data/backup/containers/webapp01 ./
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers# cd webapp01/
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# ls
backup.yaml  metadata.yaml  rootfs  rootfs.dev  templates
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# lxc list
+------+-------+------+------+------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+-------+------+------+------+-----------+
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# lxc storage list
+----------+-------------+--------+----------------+---------+
|   NAME   | DESCRIPTION | DRIVER |     SOURCE     | USED BY |
+----------+-------------+--------+----------------+---------+
| lxdpool0 |             | btrfs  | /data/lxdpool0 | 1       |
+----------+-------------+--------+----------------+---------+
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# lxd import webapp01
Error: The storage pool "default" the container was detected on does not match the storage pool "lxdpool0" specified in the backup file
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# #WHAT???
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# lxc storage create default dir
Storage pool default created
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# lxd import webapp01
Error: The storage pool "default" the container was detected on does not match the storage pool "lxdpool0" specified in the backup file
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# vi backup.yaml
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# lxd import webapp01
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# lxc list
+----------+---------+------+------+------------+-----------+
|   NAME   |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+----------+---------+------+------+------------+-----------+
| webapp01 | STOPPED |      |      | PERSISTENT |           |
+----------+---------+------+------+------------+-----------+
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# lxc storage list
+----------+-------------+--------+------------------------------------+---------+
|   NAME   | DESCRIPTION | DRIVER |               SOURCE               | USED BY |
+----------+-------------+--------+------------------------------------+---------+
| default  |             | dir    | /var/lib/lxd/storage-pools/default | 1       |
+----------+-------------+--------+------------------------------------+---------+
| lxdpool0 |             | btrfs  | /data/lxdpool0                     | 1       |
+----------+-------------+--------+------------------------------------+---------+
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# cd ..
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers# cd ..
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0# cd ..
root@cloud-two:/var/lib/lxd/storage-pools# lxc storage delete lxdpool0
Storage pool lxdpool0 deleted
root@cloud-two:/var/lib/lxd/storage-pools# lxc storage list
+---------+-------------+--------+------------------------------------+---------+
|  NAME   | DESCRIPTION | DRIVER |               SOURCE               | USED BY |
+---------+-------------+--------+------------------------------------+---------+
| default |             | dir    | /var/lib/lxd/storage-pools/default | 2       |
+---------+-------------+--------+------------------------------------+---------+
root@cloud-two:/var/lib/lxd/storage-pools# # WHAT ?
root@cloud-two:/var/lib/lxd/storage-pools# cd default/
root@cloud-two:/var/lib/lxd/storage-pools/default# ls
root@cloud-two:/var/lib/lxd/storage-pools/default# # EMPTY !
root@cloud-two:/var/lib/lxd/storage-pools/default# cd ..
root@cloud-two:/var/lib/lxd/storage-pools# cd ..
root@cloud-two:/var/lib/lxd# cd containers/
root@cloud-two:/var/lib/lxd/containers# ls
webapp01
root@cloud-two:/var/lib/lxd/containers# ls -l
total 0
lrwxrwxrwx 1 root root 55 Jan 24 23:32 webapp01 -> /var/lib/lxd/storage-pools/lxdpool0/containers/webapp01
root@cloud-two:/var/lib/lxd/containers# lxc list
+----------+---------+------+------+------------+-----------+
|   NAME   |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+----------+---------+------+------+------------+-----------+
| webapp01 | STOPPED |      |      | PERSISTENT |           |
+----------+---------+------+------+------------+-----------+
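
At the end of the log, /var/lib/lxd/containers/webapp01 is still a symlink into the lxdpool0 path even though that pool has been deleted. The log does not show whether the link target still exists. A quick way to check for leftover links of this kind is sketched below in Python; `dangling_links` is a hypothetical helper and the directory passed to it is up to the caller.

```python
import os
import sys
from typing import Dict


def dangling_links(directory: str) -> Dict[str, str]:
    """Map each symlink in `directory` whose target is missing to that target."""
    missing = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            # os.path.exists follows the link, so it is False for a broken one.
            if entry.is_symlink() and not os.path.exists(entry.path):
                missing[entry.name] = os.readlink(entry.path)
    return missing


if __name__ == "__main__":
    # Example: python3 check_links.py /var/lib/lxd/containers
    if len(sys.argv) > 1:
        for name, target in sorted(dangling_links(sys.argv[1]).items()):
            print(f"{name} -> {target} (target missing)")
```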

stgraber commented 5 years ago

I'm unable to reproduce your issue:

root@vm10:~# lxc storage create default dir
Storage pool default created
root@vm10:~# lxc profile device add default root disk path=/ pool=default
Device root added to default
root@vm10:~# lxc storage list
+---------+-------------+--------+------------------------------------------------+---------+
|  NAME   | DESCRIPTION | DRIVER |                     SOURCE                     | USED BY |
+---------+-------------+--------+------------------------------------------------+---------+
| default |             | dir    | /var/snap/lxd/common/lxd/storage-pools/default | 1       |
+---------+-------------+--------+------------------------------------------------+---------+

root@vm10:~# mkdir -p /data/lxdpool0
root@vm10:~# truncate -s 10GB /root/blah.img
root@vm10:~# mkfs.btrfs /root/blah.img
btrfs-progs v4.15.1
See http://btrfs.wiki.kernel.org for more information.

Label:              (null)
UUID:               c286731a-80fb-4b8e-9bff-6f2f7215cb1f
Node size:          16384
Sector size:        4096
Filesystem size:    9.31GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         DUP             476.81MiB
  System:           DUP               8.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1     9.31GiB  /root/blah.img

root@vm10:~# mount -o loop /root/blah.img /data/lxdpool0
root@vm10:~# lxc storage create lxdpool0 btrfs source=/data/lxdpool0
Storage pool lxdpool0 created

root@vm10:~# lxc init images:alpine/edge c1 -s default
Creating c1
root@vm10:~# cp -R /var/snap/lxd/common/lxd/storage-pools/default/containers/c1/ /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/lxdpool0/containers/
root@vm10:~# lxc delete -f c1
root@vm10:~# lxd import c1
Error: The storage pool "default" the container was detected on does not match the storage pool "lxdpool0" specified in the backup file

root@vm10:~# cp /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/lxdpool0/containers/c1/backup.yaml /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/lxdpool0/containers/c1/backup.yaml.orig

root@vm10:~# vim /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/lxdpool0/containers/c1/backup.yaml 
root@vm10:~# diff -Nrup /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/lxdpool0/containers/c1/backup.yaml.orig /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/lxdpool0/containers/c1/backup.yaml
--- /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/lxdpool0/containers/c1/backup.yaml.orig    2019-02-07 03:17:41.192341032 +0000
+++ /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/lxdpool0/containers/c1/backup.yaml 2019-02-07 03:18:21.476128484 +0000
@@ -15,7 +15,7 @@ container:
   devices:
     root:
       path: /
-      pool: default
+      pool: lxdpool0
       type: disk
   ephemeral: false
   profiles:
@@ -43,7 +43,7 @@ container:
       type: nic
     root:
       path: /
-      pool: default
+      pool: lxdpool0
       type: disk
   name: c1
   status: Stopped
@@ -53,10 +53,10 @@ container:
 snapshots: []
 pool:
   config:
-    source: /var/snap/lxd/common/lxd/storage-pools/default
+    source: /data/lxdpool0
   description: ""
-  name: default
-  driver: dir
+  name: lxdpool0
+  driver: btrfs
   used_by: []
   status: Created
   locations:

root@vm10:~# lxd import c1
root@vm10:~# lxc list
+------+---------+------+------+------------+-----------+
| NAME |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+------+---------+------+------+------------+-----------+
| c1   | STOPPED |      |      | PERSISTENT |           |
+------+---------+------+------+------------+-----------+
root@vm10:~# lxc storage volume list default
+------+------+-------------+---------+
| TYPE | NAME | DESCRIPTION | USED BY |
+------+------+-------------+---------+
root@vm10:~# lxc storage volume list lxdpool0
+-----------+------+-------------+---------+
|   TYPE    | NAME | DESCRIPTION | USED BY |
+-----------+------+-------------+---------+
| container | c1   |             | 1       |
+-----------+------+-------------+---------+

root@vm10:~# lxc storage delete lxdpool0
Error: storage pool "lxdpool0" has volumes attached to it

stgraber commented 5 years ago

@infl00p any idea what I'm missing above?

stgraber commented 5 years ago

@infl00p