Closed: infl00p closed this issue 5 years ago
Ok, so the bug report is about LXD wiping the directory which was set as the source path for a btrfs storage pool when that storage pool was deleted?
It's normal for LXD to completely wipe whatever is in the storage pool's path on deletion, as LXD assumes it's in full control of the storage pool. We have no intention of changing that, since allowing non-LXD-managed data in there would be a massive pain when we have to relocate data during some upgrades.
The usual way we avoid such issues is by having LXD refuse to create a storage pool when passed a non-empty path. It could be that this check isn't working on btrfs, or that the database restore somehow made it so that LXD didn't know about all the data on the storage pool.
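The safety check described above can be sketched in plain shell. This is only an illustration of the kind of check meant, not LXD's actual implementation (which is written in Go):

```shell
# Hypothetical sketch of a "refuse non-empty source" check, not LXD's code.
check_source() {
    local src="$1"
    # The source must exist and be a directory...
    [ -d "$src" ] || { echo "Error: $src is not a directory" >&2; return 1; }
    # ...and must be empty, so the tool can assume full control of it.
    if [ -n "$(ls -A "$src")" ]; then
        echo "Error: $src is not empty, refusing to use it as a pool source" >&2
        return 1
    fi
}

pool_src=$(mktemp -d)
check_source "$pool_src" && echo "empty source: accepted"
touch "$pool_src/leftover-data"
check_source "$pool_src" || echo "non-empty source: refused"
```

If the btrfs backend skipped such a check, or the restored database disagreed with what was actually on disk, a later pool deletion would wipe data the daemon never knew about.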
Yes, I agree that I made mistakes and that a check alone should be enough. But the source of the btrfs pool was different from that of the dir pool; the default dir pool is on an ext4 filesystem. From lxd.log it's evident that during deletion LXD mistakes lxdpool0 for a dir pool. This could mean that LXD is not using the same variables for the check and for the deletion. I will try to provide something reproducible during the weekend.
Hmm, yeah, that is pretty odd. It'd have been interesting to see the lxc storage list output prior to that.
The storage pool name is supposed to be unique, and its type is stored in the same database row, so either there was something odd going on here with a duplicate record or the like, or the database had a weird case of corruption that caused this.
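To illustrate the point, here is a hypothetical sketch (not LXD's real dqlite schema) of how a unique pool name tied to a driver column in the same row should behave, using the sqlite3 CLI:

```shell
# Hypothetical schema sketch; LXD's actual database schema may differ.
db=$(mktemp)
sqlite3 "$db" <<'EOF'
CREATE TABLE storage_pools (id INTEGER PRIMARY KEY, name TEXT UNIQUE, driver TEXT);
INSERT INTO storage_pools (name, driver) VALUES ('lxdpool0', 'btrfs');
EOF
# A second row reusing the name is rejected by the UNIQUE constraint,
# so one pool name should never resolve to two different drivers:
sqlite3 "$db" "INSERT INTO storage_pools (name, driver) VALUES ('lxdpool0', 'dir');" \
    || echo "duplicate pool name rejected"
sqlite3 "$db" "SELECT driver FROM storage_pools WHERE name = 'lxdpool0';"
```

Under that constraint, the deletion code looking up lxdpool0 should only ever see "btrfs", which is what makes the "Deleting DIR storage pool" log line below so strange.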
I'm still trying to isolate the case where I deleted my dir pool, but here is my case without any database modifications. Steps to reproduce:
CONSOLE LOG:
root@cloud-two:/var/lib/lxd/storage-pools# mount | grep lxd
/dev/mapper/cloud--two-lxdpool0 on /data/lxdpool0 type btrfs (rw,relatime,space_cache,subvolid=5,subvol=/)
tmpfs on /var/lib/lxd/shmounts type tmpfs (rw,relatime,size=100k,mode=711)
tmpfs on /var/lib/lxd/devlxd type tmpfs (rw,relatime,size=100k,mode=755)
/dev/mapper/cloud--two-lxdpool0 on /var/lib/lxd/storage-pools/lxdpool0 type btrfs (rw,relatime,space_cache,subvolid=5,subvol=/)
root@cloud-two:/var/lib/lxd/storage-pools# cd lxdpool0/containers
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers# cp -a /opt/storage/backup/lxd/data/backup/containers/webapp01 ./
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers# cd webapp01/
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# ls
backup.yaml metadata.yaml rootfs rootfs.dev templates
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# lxc list
+------+-------+------+------+------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+-------+------+------+------+-----------+
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# lxc storage list
+----------+-------------+--------+----------------+---------+
| NAME | DESCRIPTION | DRIVER | SOURCE | USED BY |
+----------+-------------+--------+----------------+---------+
| lxdpool0 | | btrfs | /data/lxdpool0 | 1 |
+----------+-------------+--------+----------------+---------+
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# lxd import webapp01
Error: The storage pool "default" the container was detected on does not match the storage pool "lxdpool0" specified in the backup file
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# #WHAT???
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# lxc storage create default dir
Storage pool default created
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# lxd import webapp01
Error: The storage pool "default" the container was detected on does not match the storage pool "lxdpool0" specified in the backup file
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# vi backup.yaml
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# lxd import webapp01
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# lxc list
+----------+---------+------+------+------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+----------+---------+------+------+------------+-----------+
| webapp01 | STOPPED | | | PERSISTENT | |
+----------+---------+------+------+------------+-----------+
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# lxc storage list
+----------+-------------+--------+------------------------------------+---------+
| NAME | DESCRIPTION | DRIVER | SOURCE | USED BY |
+----------+-------------+--------+------------------------------------+---------+
| default | | dir | /var/lib/lxd/storage-pools/default | 1 |
+----------+-------------+--------+------------------------------------+---------+
| lxdpool0 | | btrfs | /data/lxdpool0 | 1 |
+----------+-------------+--------+------------------------------------+---------+
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers/webapp01# cd ..
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0/containers# cd ..
root@cloud-two:/var/lib/lxd/storage-pools/lxdpool0# cd ..
root@cloud-two:/var/lib/lxd/storage-pools# lxc storage delete lxdpool0
Storage pool lxdpool0 deleted
root@cloud-two:/var/lib/lxd/storage-pools# lxc storage list
+---------+-------------+--------+------------------------------------+---------+
| NAME | DESCRIPTION | DRIVER | SOURCE | USED BY |
+---------+-------------+--------+------------------------------------+---------+
| default | | dir | /var/lib/lxd/storage-pools/default | 2 |
+---------+-------------+--------+------------------------------------+---------+
root@cloud-two:/var/lib/lxd/storage-pools# # WHAT ?
root@cloud-two:/var/lib/lxd/storage-pools# cd default/
root@cloud-two:/var/lib/lxd/storage-pools/default# ls
root@cloud-two:/var/lib/lxd/storage-pools/default# # EMPTY !
root@cloud-two:/var/lib/lxd/storage-pools/default# cd ..
root@cloud-two:/var/lib/lxd/storage-pools# cd ..
root@cloud-two:/var/lib/lxd# cd containers/
root@cloud-two:/var/lib/lxd/containers# ls
webapp01
root@cloud-two:/var/lib/lxd/containers# ls -l
total 0
lrwxrwxrwx 1 root root 55 Jan 24 23:32 webapp01 -> /var/lib/lxd/storage-pools/lxdpool0/containers/webapp01
root@cloud-two:/var/lib/lxd/containers# lxc list
+----------+---------+------+------+------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+----------+---------+------+------+------------+-----------+
| webapp01 | STOPPED | | | PERSISTENT | |
+----------+---------+------+------+------------+-----------+
I'm unable to reproduce your issue:
root@vm10:~# lxc storage create default dir
Storage pool default created
root@vm10:~# lxc profile device add default root disk path=/ pool=default
Device root added to default
root@vm10:~# lxc storage list
+---------+-------------+--------+------------------------------------------------+---------+
| NAME | DESCRIPTION | DRIVER | SOURCE | USED BY |
+---------+-------------+--------+------------------------------------------------+---------+
| default | | dir | /var/snap/lxd/common/lxd/storage-pools/default | 1 |
+---------+-------------+--------+------------------------------------------------+---------+
root@vm10:~# mkdir -p /data/lxdpool0
root@vm10:~# truncate -s 10GB /root/blah.img
root@vm10:~# mkfs.btrfs /root/blah.img
btrfs-progs v4.15.1
See http://btrfs.wiki.kernel.org for more information.
Label: (null)
UUID: c286731a-80fb-4b8e-9bff-6f2f7215cb1f
Node size: 16384
Sector size: 4096
Filesystem size: 9.31GiB
Block group profiles:
Data: single 8.00MiB
Metadata: DUP 476.81MiB
System: DUP 8.00MiB
SSD detected: no
Incompat features: extref, skinny-metadata
Number of devices: 1
Devices:
ID SIZE PATH
1 9.31GiB /root/blah.img
root@vm10:~# mount -o loop /root/blah.img /data/lxdpool0
root@vm10:~# lxc storage create lxdpool0 btrfs source=/data/lxdpool0
Storage pool lxdpool0 created
root@vm10:~# lxc init images:alpine/edge c1 -s default
Creating c1
root@vm10:~# cp -R /var/snap/lxd/common/lxd/storage-pools/default/containers/c1/ /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/lxdpool0/containers/
root@vm10:~# lxc delete -f c1
root@vm10:~# lxd import c1
Error: The storage pool "default" the container was detected on does not match the storage pool "lxdpool0" specified in the backup file
root@vm10:~# cp /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/lxdpool0/containers/c1/backup.yaml /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/lxdpool0/containers/c1/backup.yaml.orig
root@vm10:~# vim /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/lxdpool0/containers/c1/backup.yaml
root@vm10:~# diff -Nrup /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/lxdpool0/containers/c1/backup.yaml.orig /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/lxdpool0/containers/c1/backup.yaml
--- /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/lxdpool0/containers/c1/backup.yaml.orig 2019-02-07 03:17:41.192341032 +0000
+++ /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/lxdpool0/containers/c1/backup.yaml 2019-02-07 03:18:21.476128484 +0000
@@ -15,7 +15,7 @@ container:
devices:
root:
path: /
- pool: default
+ pool: lxdpool0
type: disk
ephemeral: false
profiles:
@@ -43,7 +43,7 @@ container:
type: nic
root:
path: /
- pool: default
+ pool: lxdpool0
type: disk
name: c1
status: Stopped
@@ -53,10 +53,10 @@ container:
snapshots: []
pool:
config:
- source: /var/snap/lxd/common/lxd/storage-pools/default
+ source: /data/lxdpool0
description: ""
- name: default
- driver: dir
+ name: lxdpool0
+ driver: btrfs
used_by: []
status: Created
locations:
root@vm10:~# lxd import c1
root@vm10:~# lxc list
+------+---------+------+------+------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+---------+------+------+------------+-----------+
| c1 | STOPPED | | | PERSISTENT | |
+------+---------+------+------+------------+-----------+
root@vm10:~# lxc storage volume list default
+------+------+-------------+---------+
| TYPE | NAME | DESCRIPTION | USED BY |
+------+------+-------------+---------+
root@vm10:~# lxc storage volume list lxdpool0
+-----------+------+-------------+---------+
| TYPE | NAME | DESCRIPTION | USED BY |
+-----------+------+-------------+---------+
| container | c1 | | 1 |
+-----------+------+-------------+---------+
root@vm10:~# lxc storage delete lxdpool0
Error: storage pool "lxdpool0" has volumes attached to it
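The manual backup.yaml edit from the diff above could also be scripted. Here is a rough sed sketch over a minimal sample of the pool section (GNU sed assumed; note that a blanket s/name: default/.../ on a full backup.yaml would also rename the "default" profile, so the targeted vim edit shown above is safer):

```shell
# Minimal sample of the pool section of a backup.yaml (a real file also has
# container/profile sections, where 'default' must NOT be replaced).
cat > /tmp/pool-sample.yaml <<'EOF'
pool:
  config:
    source: /var/snap/lxd/common/lxd/storage-pools/default
  description: ""
  name: default
  driver: dir
EOF
# Apply the same substitutions as the diff above:
sed -e 's|source: /var/snap/lxd/common/lxd/storage-pools/default|source: /data/lxdpool0|' \
    -e 's/name: default/name: lxdpool0/' \
    -e 's/driver: dir/driver: btrfs/' \
    /tmp/pool-sample.yaml > /tmp/pool-fixed.yaml
cat /tmp/pool-fixed.yaml
```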
@infl00p any idea what I'm missing above?
@infl00p
Required information
The output of "lxc info" or if that fails:
config:
  cluster.https_address: cloud-three.zen:8443
  core.https_address: cloud-three.zen:8443
  core.trust_password: true
api_extensions:
storage_zfs_remove_snapshots
container_host_shutdown_timeout
container_stop_priority
container_syscall_filtering
auth_pki
container_last_used_at
etag
patch
usb_devices
https_allowed_credentials
image_compression_algorithm
directory_manipulation
container_cpu_time
storage_zfs_use_refquota
storage_lvm_mount_options
network
profile_usedby
container_push
container_exec_recording
certificate_update
container_exec_signal_handling
gpu_devices
container_image_properties
migration_progress
id_map
network_firewall_filtering
network_routes
storage
file_delete
file_append
network_dhcp_expiry
storage_lvm_vg_rename
storage_lvm_thinpool_rename
network_vlan
image_create_aliases
container_stateless_copy
container_only_migration
storage_zfs_clone_copy
unix_device_rename
storage_lvm_use_thinpool
storage_rsync_bwlimit
network_vxlan_interface
storage_btrfs_mount_options
entity_description
image_force_refresh
storage_lvm_lv_resizing
id_map_base
file_symlinks
container_push_target
network_vlan_physical
storage_images_delete
container_edit_metadata
container_snapshot_stateful_migration
storage_driver_ceph
storage_ceph_user_name
resource_limits
storage_volatile_initial_source
storage_ceph_force_osd_reuse
storage_block_filesystem_btrfs
resources
kernel_limits
storage_api_volume_rename
macaroon_authentication
network_sriov
console
restrict_devlxd
migration_pre_copy
infiniband
maas_network
devlxd_events
proxy
network_dhcp_gateway
file_get_symlink
network_leases
unix_device_hotplug
storage_api_local_volume_handling
operation_description
clustering
event_lifecycle
storage_api_remote_volume_handling
nvidia_runtime
container_mount_propagation
container_backup
devlxd_images
container_local_cross_pool_handling
proxy_unix
proxy_udp
clustering_join
proxy_tcp_udp_multi_port_handling
network_state
proxy_unix_dac_properties
container_protection_delete
unix_priv_drop
pprof_http
proxy_haproxy_protocol
network_hwaddr
proxy_nat
network_nat_order
container_full
candid_authentication
backup_compression
candid_config
nvidia_runtime_config
storage_api_volume_snapshots
storage_unmapped
projects
candid_config_key
network_vxlan_ttl
container_incremental_copy
usb_optional_vendorid
snapshot_scheduling
container_copy_project
clustering_server_address
clustering_image_replication
container_protection_shift
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
tls
environment:
addresses:
cloud-three.zen:8443
architectures:
x86_64
i686
certificate: | REDACTED
certificate_fingerprint: c368081ba000e22dcc1f7834eebd0b7974abd2f4e4af7af220525b626ca06e04
driver: lxc
driver_version: 3.1.0 (devel)
kernel: Linux
kernel_architecture: x86_64
kernel_version: 4.19.0-0.bpo.1-amd64
server: lxd
server_pid: 15231
server_version: "3.9"
storage: dir | btrfs
storage_version: 1 | 4.7.3
server_clustered: true
server_name: cloud-three
project: default
Kernel version: 4.19.0-0.bpo.1-amd64
LXC version: 3.9
LXD version: 3.9
Storage backend in use: dir, btrfs
Issue description
"lxc storage delete new_btrfs_storage_pool" deleted my dir storage pool (named "default") and all my containers! After lengthy troubleshooting of trying, and failing, to create and join nodes to a cluster, I moved some temporary containers from my main pool, which is dir-based, to a new btrfs pool. After an LXD restart the daemon could not start due to cluster issues (dqlite could not start?), so I decided to restore the database directory from a copy I had taken before the troubleshooting. I created the btrfs pool again with the same source, but could not start the containers stored there and could not import them either. So I took a copy of those and deleted the btrfs storage pool with a simple "lxc storage delete lxdpool0", which then completely deleted my default dir storage pool called "default" and all my containers.
Steps to reproduce
As per the issue description, this requires manual directory moving and database copying.
Information to attach
[ ] Any relevant kernel output (dmesg)
[ ] Container log (lxc info NAME --show-log)
[ ] Container configuration (lxc config show NAME --expanded)
[x] Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
t=2019-01-18T00:21:34+0200 lvl=info msg="Received 'power failure signal', shutting down containers"
t=2019-01-18T00:21:35+0200 lvl=warn msg="Unable to update backup.yaml at this time" name=webapp01 rootfs=/var/lib/lxd/containers/webapp01/rootfs
t=2019-01-18T00:21:35+0200 lvl=warn msg="Unable to update backup.yaml at this time" name=webapp02 rootfs=/var/lib/lxd/containers/webapp02/rootfs
t=2019-01-18T00:21:35+0200 lvl=info msg="Starting shutdown sequence"
t=2019-01-18T00:21:35+0200 lvl=info msg="Stopping REST API handler:"
t=2019-01-18T00:21:35+0200 lvl=info msg=" - closing socket" socket=192.168.10.21:8443
t=2019-01-18T00:21:35+0200 lvl=info msg=" - closing socket" socket=/var/lib/lxd/unix.socket
t=2019-01-18T00:21:35+0200 lvl=info msg="Stopping /dev/lxd handler:"
t=2019-01-18T00:21:35+0200 lvl=info msg=" - closing socket" socket=/var/lib/lxd/devlxd/sock
t=2019-01-18T00:21:35+0200 lvl=info msg="Closing the database"
t=2019-01-18T00:21:35+0200 lvl=eror msg="Dqlite: aborting (fd=36 state=header msg=(null))"
t=2019-01-18T00:21:35+0200 lvl=info msg="Unmounting temporary filesystems"
t=2019-01-18T00:21:35+0200 lvl=info msg="Done unmounting temporary filesystems"
t=2019-01-18T00:21:35+0200 lvl=info msg="Saving simplestreams cache"
t=2019-01-18T00:21:35+0200 lvl=info msg="Saved simplestreams cache"
t=2019-01-18T00:21:41+0200 lvl=info msg="LXD 3.9 is starting in normal mode" path=/var/lib/lxd
t=2019-01-18T00:21:42+0200 lvl=info msg="Kernel uid/gid map:"
t=2019-01-18T00:21:42+0200 lvl=info msg=" - u 0 0 4294967295"
t=2019-01-18T00:21:42+0200 lvl=info msg=" - g 0 0 4294967295"
t=2019-01-18T00:21:42+0200 lvl=info msg="Configured LXD uid/gid map:"
t=2019-01-18T00:21:42+0200 lvl=info msg=" - u 0 100000 65536"
t=2019-01-18T00:21:42+0200 lvl=info msg=" - g 0 100000 65536"
t=2019-01-18T00:21:42+0200 lvl=info msg="Kernel features:"
t=2019-01-18T00:21:42+0200 lvl=info msg=" - netnsid-based network retrieval: no"
t=2019-01-18T00:21:42+0200 lvl=info msg=" - uevent injection: yes"
t=2019-01-18T00:21:42+0200 lvl=info msg=" - unprivileged file capabilities: yes"
t=2019-01-18T00:21:42+0200 lvl=info msg="Initializing local database"
t=2019-01-18T00:21:42+0200 lvl=info msg="Starting /dev/lxd handler:"
t=2019-01-18T00:21:42+0200 lvl=info msg=" - binding devlxd socket" socket=/var/lib/lxd/devlxd/sock
t=2019-01-18T00:21:42+0200 lvl=info msg="REST API daemon:"
t=2019-01-18T00:21:42+0200 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/lib/lxd/unix.socket
t=2019-01-18T00:21:42+0200 lvl=info msg=" - binding TCP socket" socket=192.168.10.21:8443
t=2019-01-18T00:21:42+0200 lvl=info msg="Initializing global database"
t=2019-01-18T00:21:47+0200 lvl=warn msg="Raft: Heartbeat timeout from \"\" reached, starting election"
t=2019-01-18T00:21:48+0200 lvl=info msg="Initializing storage pools"
t=2019-01-18T00:21:48+0200 lvl=info msg="Initializing networks"
t=2019-01-18T00:21:48+0200 lvl=info msg="Pruning leftover image files"
t=2019-01-18T00:21:48+0200 lvl=info msg="Done pruning leftover image files"
t=2019-01-18T00:21:48+0200 lvl=info msg="Loading daemon configuration"
t=2019-01-18T00:21:48+0200 lvl=info msg="Pruning expired images"
t=2019-01-18T00:21:48+0200 lvl=info msg="Done pruning expired images"
t=2019-01-18T00:21:48+0200 lvl=info msg="Pruning expired container backups"
t=2019-01-18T00:21:48+0200 lvl=info msg="Done pruning expired container backups"
t=2019-01-18T00:21:49+0200 lvl=info msg="Expiring log files"
t=2019-01-18T00:21:49+0200 lvl=info msg="Done expiring log files"
t=2019-01-18T00:21:49+0200 lvl=info msg="Updating images"
t=2019-01-18T00:21:49+0200 lvl=info msg="Done updating images"
t=2019-01-18T00:21:49+0200 lvl=info msg="Updating instance types"
t=2019-01-18T00:21:49+0200 lvl=info msg="Done updating instance types"
t=2019-01-18T00:22:25+0200 lvl=info msg="Deleting BTRFS storage pool \"lxdpool0\""
t=2019-01-18T00:22:25+0200 lvl=info msg="Deleted BTRFS storage pool \"lxdpool0\""
t=2019-01-18T00:23:05+0200 lvl=info msg="Creating BTRFS storage pool \"lxdpool0\""
t=2019-01-18T00:26:03+0200 lvl=info msg="Creating BTRFS storage pool \"lxdpool0\""
t=2019-01-18T00:33:35+0200 lvl=info msg="Deleting container" created=1970-01-01T02:00:00+0200 ephemeral=false name=pqdb02 project=default used=2018-11-16T14:17:11+0200
t=2019-01-18T00:33:35+0200 lvl=info msg="Deleted container" created=1970-01-01T02:00:00+0200 ephemeral=false name=pqdb02 project=default used=2018-11-16T14:17:11+0200
t=2019-01-18T00:34:19+0200 lvl=info msg="Deleting DIR storage pool \"lxdpool0\""
[ ] Output of the client with --debug
[ ] Output of the daemon with --debug (alternatively output of lxc monitor while reproducing the issue)