canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

`lxd recover` loses the expiration dates of snapshots. #13464

Closed · simondeziel closed this 2 months ago

simondeziel commented 5 months ago

I just went through a successful `lxd recover`, which reimported my container and its snapshots from the intact zpool. Snapshots are taken on a schedule:

# lxc config show -e ganymede | grep snapshot
  snapshots.expiry: 3d
  snapshots.schedule: '@daily, @startup'
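
For reference, that schedule and expiry were configured with something along these lines (illustrative command, not copied from my shell history):

# lxc config set ganymede snapshots.expiry=3d snapshots.schedule='@daily, @startup'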

However, after `lxd recover` brought those snapshots back, they lost their expiry (the EXPIRES AT column is empty):

# lxc info ganymede | sed -n '/^Snapshots:$/,$ p'
Snapshots:
+---------+----------------------+----------------------+----------+
|  NAME   |       TAKEN AT       |      EXPIRES AT      | STATEFUL |
+---------+----------------------+----------------------+----------+
| snap222 | 2024/05/07 05:28 UTC |                      | NO       |
+---------+----------------------+----------------------+----------+
| snap223 | 2024/05/08 05:28 UTC |                      | NO       |
+---------+----------------------+----------------------+----------+
| snap224 | 2024/05/09 05:28 UTC |                      | NO       |
+---------+----------------------+----------------------+----------+
| snap225 | 2024/05/09 22:21 UTC | 2024/05/12 22:21 UTC | NO       |
+---------+----------------------+----------------------+----------+

In the above, snap225 was taken after the lxd recover.

The instance's backup.yaml should have had this information, as that is presumably where the recovery learned the TAKEN AT values. That said, it seems the recovery has now overwritten the backup.yaml with bogus values:

# LD_LIBRARY_PATH=/snap/lxd/current/lib/:/snap/lxd/current/lib/x86_64-linux-gnu/ nsenter --mount=/run/snapd/ns/lxd.mnt sed -n '/^volume_snapshots:$/,$ p' /var/snap/lxd/common/lxd/storage-pools/default/containers/ganymede/backup.yaml
volume_snapshots:
- name: snap222
  description: ""
  content_type: filesystem
  created_at: 0001-01-01T00:00:00Z
  expires_at: 0001-01-01T00:00:00Z
  config:
    volatile.uuid: a951c197-c11f-4fcc-a76e-ec575b99e305
- name: snap223
  description: ""
  content_type: filesystem
  created_at: 0001-01-01T00:00:00Z
  expires_at: 0001-01-01T00:00:00Z
  config:
    volatile.uuid: 2b8f7c0c-92da-4170-af35-c5033ec6b89c
- name: snap224
  description: ""
  content_type: filesystem
  created_at: 0001-01-01T00:00:00Z
  expires_at: 0001-01-01T00:00:00Z
  config:
    volatile.uuid: dba273a7-2980-46e4-afc3-ddb7ec617171
- name: snap225
  description: ""
  content_type: filesystem
  created_at: 2024-05-09T22:21:56.762923236Z
  expires_at: 0001-01-01T00:00:00Z
  config:
    volatile.uuid: 93c48d05-4de3-492b-8e6c-f3ce2e3e9c63

Additional information:

# snap list lxd
Name  Version         Rev    Tracking     Publisher   Notes
lxd   5.21.1-d46c406  28460  5.21/stable  canonical✓  -
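
Until this is fixed, a possible manual workaround (untested sketch) would be to re-apply an expiry to each recovered snapshot by hand, editing its expires_at field with:

# lxc config edit ganymede/snap222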
simondeziel commented 5 months ago

In fact, looking at the snap225 section of the backup.yaml (added post-recovery), it seems that even during normal operations LXD doesn't save the right expires_at value in the backup.yaml file.

simondeziel commented 2 months ago

Here's an easy reproducer for the previous comment, where I said the backup.yaml doesn't record the expires_at properly:

$ lxc launch images:alpine/edge c1 -c snapshots.expiry=1d
$ lxc snapshot c1
$ sudo LD_LIBRARY_PATH=/snap/lxd/current/lib/:/snap/lxd/current/lib/x86_64-linux-gnu/ nsenter --mount=/run/snapd/ns/lxd.mnt sed -n '/^volume_snapshots:$/,$ p' /var/snap/lxd/common/lxd/storage-pools/default/containers/c1/backup.yaml
volume_snapshots:
- name: snap0
  description: ""
  content_type: filesystem
  created_at: 2024-08-23T20:38:04.166424212Z
  expires_at: 0001-01-01T00:00:00Z
  config:
    volatile.uuid: 80b83e4b-482b-49a4-b83e-4e965ce51265

Meanwhile, LXD itself is clearly aware of the snapshot expiry:

$ lxc info c1 | sed -n '/^Snapshots:/,$ p'
Snapshots:
+-------+----------------------+----------------------+----------+
| NAME  |       TAKEN AT       |      EXPIRES AT      | STATEFUL |
+-------+----------------------+----------------------+----------+
| snap0 | 2024/08/23 16:38 EDT | 2024/08/24 16:38 EDT | NO       |
+-------+----------------------+----------------------+----------+
kadinsayani commented 2 months ago

It appears that the instance snapshot's expiry date is correct, which is why LXD is aware of the expiry (see the snippet below). However, `lxd recover` uses the `volume_snapshots` expiry, which is zeroed. Furthermore, volume snapshot expiry is set differently from instance snapshot expiry (`lxc storage volume set default container/c1 snapshots.expiry=1d`). I'm not sure what the intended behaviour is for volume snapshot expiry dates, i.e. should they match the instance snapshot expiry dates? Or should `lxd recover` look at the instance snapshot expiry date rather than the volume snapshot expiry date? cc @tomponline

$ sudo LD_LIBRARY_PATH=/snap/lxd/current/lib/:/snap/lxd/current/lib/x86_64-linux-gnu/ nsenter --mount=/run/snapd/ns/lxd.mnt sed -n '/^snapshots:$/,$ p' /var/snap/lxd/common/lxd/storage-pools/default/containers/c1/backup.yaml
snapshots:
- architecture: x86_64
  config:
    image.architecture: amd64
    image.description: Alpine edge amd64 (20240823_0018)
    image.os: Alpine
    image.release: edge
    image.requirements.secureboot: "false"
    image.serial: "20240823_0018"
    image.type: squashfs
    image.variant: default
    snapshots.expiry: 1d
    volatile.base_image: 3aab2d4b12a5bf88b798fe02cf361349cb9cd5648c89789bfac96f1cdce1d32c
    volatile.cloud-init.instance-id: 5dd2a543-b181-4dce-8e0c-4202173421b7
    volatile.eth0.host_name: vethdc246f41
    volatile.eth0.hwaddr: 00:16:3e:9f:79:4b
    volatile.idmap.base: "0"
    volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
    volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
    volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
    volatile.last_state.power: RUNNING
    volatile.uuid: ad4f73ac-50e7-4e92-87a1-82a05e928157
    volatile.uuid.generation: ad4f73ac-50e7-4e92-87a1-82a05e928157
  created_at: 2024-08-23T22:01:05.356755362Z
  expires_at: 2024-08-24T22:01:05.353609739Z
  devices: {}
  ephemeral: false
  expanded_config:
    image.architecture: amd64
    image.description: Alpine edge amd64 (20240823_0018)
    image.os: Alpine
    image.release: edge
    image.requirements.secureboot: "false"
    image.serial: "20240823_0018"
    image.type: squashfs
    image.variant: default
    snapshots.expiry: 1d
    volatile.base_image: 3aab2d4b12a5bf88b798fe02cf361349cb9cd5648c89789bfac96f1cdce1d32c
    volatile.cloud-init.instance-id: 5dd2a543-b181-4dce-8e0c-4202173421b7
    volatile.eth0.host_name: vethdc246f41
    volatile.eth0.hwaddr: 00:16:3e:9f:79:4b
    volatile.idmap.base: "0"
    volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
    volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
    volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
    volatile.last_state.power: RUNNING
    volatile.uuid: ad4f73ac-50e7-4e92-87a1-82a05e928157
    volatile.uuid.generation: ad4f73ac-50e7-4e92-87a1-82a05e928157
  expanded_devices:
    eth0:
      name: eth0
      network: lxdbr1
      type: nic
    root:
      path: /
      pool: default
      type: disk
  last_used_at: 0001-01-01T00:00:00Z
  name: snap0
  profiles:
  - default
  stateful: false
  size: -1
pool:
  name: default
  description: ""
  driver: zfs
  status: Created
  config:
    size: 9GiB
    source: /var/snap/lxd/common/lxd/disks/default.img
    zfs.pool_name: default
  used_by: []
  locations:
  - none
profiles:
- name: default
  description: Default LXD profile
  config: {}
  devices:
    eth0:
      name: eth0
      network: lxdbr1
      type: nic
    root:
      path: /
      pool: default
      type: disk
  used_by: []
volume:
  name: c1
  description: ""
  type: container
  pool: default
  content_type: filesystem
  project: default
  location: none
  created_at: 2024-08-23T22:01:04.354122974Z
  config:
    volatile.uuid: 50aa66e6-dca7-4676-94e5-ee292e098d7f
  used_by: []
volume_snapshots:
- name: snap0
  description: ""
  content_type: filesystem
  created_at: 2024-08-23T22:01:05.356755362Z
  expires_at: 0001-01-01T00:00:00Z
  config:
    volatile.uuid: 69e6db4c-f32c-49ff-b79b-08f388a8fe9c
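
The same mismatch is visible straight from the API (a quick sketch; jq, assumed installed, is only used for filtering, and the values shown are the ones from the backup.yaml above):

$ lxc query /1.0/instances/c1/snapshots/snap0 | jq -r .expires_at
2024-08-24T22:01:05.353609739Z
$ lxc query /1.0/storage-pools/default/volumes/container/c1/snapshots/snap0 | jq -r .expires_at
0001-01-01T00:00:00Z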
simondeziel commented 2 months ago

As found by @kadinsayani, the volume snapshot associated with snapshotting an instance has no expiry set:

$ lxc launch images:alpine/edge c1 -c snapshots.expiry=1d
$ lxc snapshot c1
$ lxc storage volume show default container/c1/snap0
name: snap0
description: ""
content_type: filesystem
created_at: 2024-08-26T13:36:28.408175831Z
expires_at: 0001-01-01T00:00:00Z
config:
  volatile.uuid: 705af216-a8cb-4494-b8ca-dda67a8d1dd2

# or more simply
$ lxc storage volume get default container/c1/snap0 --property expires_at
0001-01-01 00:00:00 +0000 UTC

However, the instance's snapshot has one:

$ lxc info c1 | sed -n '/^Snapshots:/,$ p'
Snapshots:
+-------+----------------------+----------------------+----------+
| NAME  |       TAKEN AT       |      EXPIRES AT      | STATEFUL |
+-------+----------------------+----------------------+----------+
| snap0 | 2024/08/26 09:36 EDT | 2024/08/27 09:36 EDT | NO       |
+-------+----------------------+----------------------+----------+

Could it be due to how snapshots are cleaned up? Maybe instance snapshots are cleaned in a different pass than volume ones?
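
If it helps, the two database records could also be compared directly (rough sketch, assuming the global schema still exposes an expiry_date column on both tables):

$ sudo lxd sql global "SELECT name, expiry_date FROM instances_snapshots WHERE name = 'snap0'"
$ sudo lxd sql global "SELECT name, expiry_date FROM storage_volumes_snapshots WHERE name = 'snap0'"

If the first returns a real date and the second the zero time, the divergence would seem to happen at snapshot creation rather than during cleanup.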