canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Error: Create instance snapshot: Invalid option for volume "<VM>/<snapshot>" option "block.filesystem" LXD 5.18 #12325

Closed tomposmiko closed 11 months ago

tomposmiko commented 12 months ago

Required information

Issue description

This change might be related, it's suspicious: https://github.com/canonical/lxd/pull/12203

If I want to create a snapshot of a VM backed by ZFS, I get the following error message:

# lxc snapshot narwhalci man_2023-09-26_before-update
Error: Create instance snapshot: Invalid option for volume "narwhalci/man_2023-09-26_before-update" option "block.filesystem"

Snapshots of containers can be created, but snapshot creation fails for other VMs too.

Steps to reproduce

Create a snapshot of a ZFS-backed VM.

tomponline commented 12 months ago

Mmm yeah looks like a regression.

Please can you show "lxc storage show (pool)" and "lxc config show (instance)"

monstermunchkin commented 11 months ago

I can confirm this. If the volume has the block.filesystem config key set, snapshots aren't possible.

Here's a reproducer:

$ lxc init ubuntu:focal v1 --vm -s zfs
Creating v1
$ lxd sql global 'INSERT INTO storage_volumes_config (storage_volume_id, key, value) VALUES (<volume-id>, "block.filesystem", "ext4")'
Rows affected: 1
$ lxc snapshot v1
Error: Create instance snapshot: Invalid option for volume "v1/snap0" option "block.filesystem"
monstermunchkin commented 11 months ago

This is also a bug for storage volume snapshots.

tomponline commented 11 months ago

@monstermunchkin looks like we are missing tests for snapshots of block backed filesystem volumes?

monstermunchkin commented 11 months ago

@monstermunchkin looks like we are missing tests for snapshots of block backed filesystem volumes?

I don't think so. The issue is that old VMs (or custom volumes) still have block.filesystem set. New VMs (or custom volumes) wouldn't have that config key. We can fix that by setting removeUnknownKeys to true when creating database entries for snapshots.
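The behaviour described above can be illustrated with a small shell sketch. This is not LXD's actual code: `validate_config`, the `key=value` input format, and the allowed-key list are all hypothetical, chosen only to show the difference between rejecting an unknown key and silently dropping it when a remove-unknown-keys flag is set.

```shell
#!/usr/bin/env bash
# Hypothetical illustration only; this is not LXD's implementation.
# Usage: validate_config "<space-separated allowed keys>" <true|false>
# Reads "key=value" lines on stdin and prints the lines that are kept.
validate_config() {
    local allowed="$1" remove_unknown="$2" line key
    while IFS= read -r line; do
        key="${line%%=*}"
        case " $allowed " in
            *" $key "*)
                # Known key: keep it.
                printf '%s\n' "$line"
                ;;
            *)
                if [ "$remove_unknown" = "true" ]; then
                    # Unknown key: drop it instead of failing.
                    continue
                fi
                printf 'Invalid option "%s"\n' "$key" >&2
                return 1
                ;;
        esac
    done
}
```

With the flag set to `false`, a stale `block.filesystem=ext4` line makes the function fail, which mirrors the error seen in this issue. With `true`, the stale key is simply left out of the snapshot's config.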

tomponline commented 11 months ago

@monstermunchkin I think we need a patch to fix the incorrect keys in the existing parent volumes, as if I'm understanding correctly, you're saying those volumes have incorrect keys present?

tomponline commented 11 months ago

Is this only for ZFS with block mode enabled btw?

monstermunchkin commented 11 months ago

you're saying those volumes have incorrect keys present?

That's what I believe is the issue as that's how I was able to reproduce it. Waiting for OP to confirm.

Is this only for ZFS with block mode enabled btw?

I can reproduce the issue with block mode disabled.

tomponline commented 11 months ago

OK thanks, if you can reproduce the issue please can you go ahead and create the patch.

tomposmiko commented 11 months ago

hi,

I cannot reproduce it with a normal zfs dataset (container), only with a block device (VM). E.g.:

tank/lxd/virtual-machines/narwhalci                                                             7.53M  92.5M     7.50M  legacy
tank/lxd/virtual-machines/narwhalci.block                                                       8.44G   210G     7.47G  -
tomponline commented 11 months ago

@tomposmiko thanks, and does it work OK for newly created VMs?

tomposmiko commented 11 months ago

The issue is that old VMs

What do you mean by old and new VMs?

tomponline commented 11 months ago

If you create a new VM on ZFS pool (since upgrading to LXD 5.18) and take a snapshot does it work?

tomposmiko commented 11 months ago
# lxc storage show default
config: {}
description: ""
name: default
driver: zfs
used_by:
- /1.0/instances/db1
- /1.0/instances/db2
- /1.0/instances/gw1
- /1.0/instances/gw1-vm
- /1.0/instances/ig11
- /1.0/instances/ig11/snapshots/test
- /1.0/instances/jenkins
- /1.0/instances/jenkins/snapshots/man_2023-09-13_before-update
- /1.0/instances/jenkins/snapshots/man_2023-09_06_before_upgrade
- /1.0/instances/jumper
- /1.0/instances/jumper/snapshots/man-2023-05-11-teleport_ok
- /1.0/instances/narwhalci
- /1.0/instances/narwhalci/snapshots/man_2023-08-29_jenkins-v2.401.3
- /1.0/instances/observer
- /1.0/instances/observer/snapshots/before_teleport
- /1.0/instances/puppet
- /1.0/instances/search1
- /1.0/instances/search2
- /1.0/instances/search3
- /1.0/instances/web1
- /1.0/instances/web2
- /1.0/profiles/default
status: Created
locations:
- bob7
# lxc config show narwhalci 
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu jammy amd64 (20230622_07:42)
  image.os: Ubuntu
  image.release: jammy
  image.serial: "20230622_07:42"
  image.type: disk-kvm.img
  image.variant: default
  limits.cpu: "4"
  limits.memory: 4GiB
  volatile.base_image: ed5a5f5014caa1e9e8436b506c21e7ea8d4dd9c4ae30ea56cea2547e5607d274
  volatile.cloud-init.instance-id: cf0f1605-e15b-4633-bff6-65ae8ccecf5a
  volatile.eth0.host_name: tap7b7ddca0
  volatile.eth0.hwaddr: 00:16:3e:e6:ca:e3
  volatile.last_state.power: RUNNING
  volatile.uuid: 9dd80737-a82e-43c3-8eb1-96aedb11cd24
  volatile.uuid.generation: 9dd80737-a82e-43c3-8eb1-96aedb11cd24
  volatile.vsock_id: "30"
devices:
  root:
    path: /
    pool: default
    size: 40GiB
    type: disk
ephemeral: false
profiles:
- default
- eth0_intra
- limits
stateful: false
description: ""
tomponline commented 11 months ago

Please could you show lxc storage volume show default virtual-machine/narwhalci

tomposmiko commented 11 months ago

I cannot reproduce it with a newly created VM.

tomposmiko commented 11 months ago

Thanks for the prompt response. Amazing 😎!

tomponline commented 11 months ago

We will get this into latest/stable in the next day or so.

ckruijntjens commented 11 months ago

Hi, I have the same issue.

Is this already pushed? If I do a snap refresh, it tells me it's up to date, but the issue is still there for me.

tomponline commented 11 months ago

It's in the latest/candidate channel, which you can try.

I'll push it to latest/stable early next week.

ckruijntjens commented 11 months ago

How do I change to the latest/candidate channel?

ckruijntjens commented 11 months ago

Ah, otherwise I will wait a week, it's no problem.

tomponline commented 11 months ago

sudo snap refresh lxd --channel=latest/candidate

ckruijntjens commented 11 months ago

Hi,

I just did a snap refresh.

server_version: "5.18"

But I still get the same error when taking a snapshot.

tomponline commented 11 months ago

Please can you show output of:

lxc storage volume show <pool> virtual-machine/<instance_name>
ckruijntjens commented 11 months ago

When I do this:

root@esx:~# lxc storage volume show default collabora

I get this: Error: Storage pool volume not found

I have no snapshots for this VM because I cannot create them.

tomponline commented 11 months ago

Please run:

lxc storage volume show default virtual-machine/collabora
ckruijntjens commented 11 months ago

root@esx:~# lxc storage volume show default virtual-machine/collabora
config:
  block.filesystem: ext4
  block.mount_options: discard
description: ""
name: collabora
type: virtual-machine
used_by:
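A quick way to check whether a volume still carries the key discussed in this thread is to scan the `lxc storage volume show` output for it. The helper below is only a sketch: the function name `has_stale_block_key` is made up for illustration, and it does a plain text match rather than parsing the YAML.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the volume config read from
# stdin contains a block.filesystem key, fails (exit 1) otherwise.
has_stale_block_key() {
    grep -Eq '^[[:space:]]*block\.filesystem:'
}

# Example use against a real volume (pool and instance names taken from
# the command shown above):
#   lxc storage volume show default virtual-machine/collabora \
#       | has_stale_block_key && echo "block.filesystem is set"
```

Because it reads from stdin, the same function can be run over saved command output when the original instance is no longer available.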

tomponline commented 11 months ago

Please show sudo snap info lxd

ckruijntjens commented 11 months ago

root@esx:~# sudo snap info lxd
name:      lxd
summary:   LXD - container and VM manager
publisher: Canonical✓
store-url: https://snapcraft.io/lxd
contact:   https://github.com/canonical/lxd/issues
license:   unset
description: |
  LXD is a system container and virtual machine manager.

  It offers a simple CLI and REST API to manage local or remote instances,
  uses an image based workflow and support for a variety of advanced features.

  Images are available for all Ubuntu releases and architectures as well
  as for a wide number of other Linux distributions. Existing
  integrations with many deployment and operation tools, makes it work
  just like a public cloud, except everything is under your control.

  LXD containers are lightweight, secure by default and a great
  alternative to virtual machines when running Linux on Linux.

  LXD virtual machines are modern and secure, using UEFI and secure-boot
  by default and a great choice when a different kernel or operating
  system is needed.

  With clustering, up to 50 LXD servers can be easily joined and managed
  together with the same tools and APIs and without needing any external
  dependencies.

  Supported configuration options for the snap (snap set lxd [<key>=<value>...]):

    - ceph.builtin: Use snap-specific Ceph configuration [default=false]
    - ceph.external: Use the system's ceph tools (ignores ceph.builtin) [default=false]
    - criu.enable: Enable experimental live-migration support [default=false]
    - daemon.debug: Increase logging to debug level [default=false]
    - daemon.group: Set group of users that have full control over LXD [default=lxd]
    - daemon.user.group: Set group of users that have restricted LXD access [default=lxd]
    - daemon.preseed: Pass a YAML configuration to `lxd init` on initial start
    - daemon.syslog: Send LXD log events to syslog [default=false]
    - daemon.verbose: Increase logging to verbose level [default=false]
    - lvm.external: Use the system's LVM tools [default=false]
    - lxcfs.pidfd: Start per-container process tracking [default=false]
    - lxcfs.loadavg: Start tracking per-container load average [default=false]
    - lxcfs.cfs: Consider CPU shares for CPU usage [default=false]
    - lxcfs.debug: Increase logging to debug level [default=false]
    - openvswitch.builtin: Run a snap-specific OVS daemon [default=false]
    - openvswitch.external: Use the system's OVS tools (ignores openvswitch.builtin) [default=false]
    - ovn.builtin: Use snap-specific OVN configuration [default=false]
    - shiftfs.enable: Enable shiftfs support [default=auto]

  For system-wide configuration of the CLI, place your configuration in
  /var/snap/lxd/common/global-conf/ (config.yml and servercerts)
commands:

tomponline commented 11 months ago

@monstermunchkin please can you investigate whether the patch you wrote was applied to this user and, if so, why it didn't take effect. Thanks

ckruijntjens commented 11 months ago

Hmm,

I exported the VM, deleted it, and reimported it. Now I can snapshot it.

tomponline commented 11 months ago

Yep, that would likely fix the issue (as new volumes are not affected), but now we don't have a reproducer to ensure the patch works. :)

tomposmiko commented 11 months ago

Here is mine:

bob7 ~ # lxc version
Client version: 5.18
Server version: 5.18

bob7 ~ # lxc snapshot narwhalci man-2023-10-10_before_plugin_update
Error: Create instance snapshot: Invalid option for volume "narwhalci/man-2023-10-10_before_plugin_update" option "block.filesystem"

bob7 ~ # sudo snap info lxd
name:      lxd
summary:   LXD - container and VM manager
publisher: Canonical✓
store-url: https://snapcraft.io/lxd
contact:   https://github.com/canonical/lxd/issues
license:   unset
description: |
  LXD is a system container and virtual machine manager.

  It offers a simple CLI and REST API to manage local or remote instances,
  uses an image based workflow and support for a variety of advanced features.

  Images are available for all Ubuntu releases and architectures as well
  as for a wide number of other Linux distributions. Existing
  integrations with many deployment and operation tools, makes it work
  just like a public cloud, except everything is under your control.

  LXD containers are lightweight, secure by default and a great
  alternative to virtual machines when running Linux on Linux.

  LXD virtual machines are modern and secure, using UEFI and secure-boot
  by default and a great choice when a different kernel or operating
  system is needed.

  With clustering, up to 50 LXD servers can be easily joined and managed
  together with the same tools and APIs and without needing any external
  dependencies.

  Supported configuration options for the snap (snap set lxd [<key>=<value>...]):

    - ceph.builtin: Use snap-specific Ceph configuration [default=false]
    - ceph.external: Use the system's ceph tools (ignores ceph.builtin) [default=false]
    - criu.enable: Enable experimental live-migration support [default=false]
    - daemon.debug: Increase logging to debug level [default=false]
    - daemon.group: Set group of users that have full control over LXD [default=lxd]
    - daemon.user.group: Set group of users that have restricted LXD access [default=lxd]
    - daemon.preseed: Pass a YAML configuration to `lxd init` on initial start
    - daemon.syslog: Send LXD log events to syslog [default=false]
    - daemon.verbose: Increase logging to verbose level [default=false]
    - lvm.external: Use the system's LVM tools [default=false]
    - lxcfs.pidfd: Start per-container process tracking [default=false]
    - lxcfs.loadavg: Start tracking per-container load average [default=false]
    - lxcfs.cfs: Consider CPU shares for CPU usage [default=false]
    - lxcfs.debug: Increase logging to debug level [default=false]
    - openvswitch.builtin: Run a snap-specific OVS daemon [default=false]
    - openvswitch.external: Use the system's OVS tools (ignores openvswitch.builtin) [default=false]
    - ovn.builtin: Use snap-specific OVN configuration [default=false]
    - shiftfs.enable: Enable shiftfs support [default=auto]

  For system-wide configuration of the CLI, place your configuration in
  /var/snap/lxd/common/global-conf/ (config.yml and servercerts)
commands:
  - lxd.benchmark
  - lxd.buginfo
  - lxd.check-kernel
  - lxd.lxc
  - lxd.lxc-to-lxd
  - lxd
  - lxd.migrate
services:
  lxd.activate:    oneshot, enabled, inactive
  lxd.daemon:      simple, enabled, active
  lxd.user-daemon: simple, enabled, inactive
snap-id:      J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking:     latest/stable
refresh-date: 6 days ago, at 03:25 CEST
channels:
  latest/stable:    5.18-db8c6f9  2023-10-04 (25846) 189MB -
  latest/candidate: 5.18-8fb88e3  2023-10-10 (25963) 189MB -
  latest/beta:      ↑                                      
  latest/edge:      git-0ccc9bf   2023-10-10 (25973) 158MB -
  5.18/stable:      5.18-762f582  2023-09-26 (25748) 189MB -
  5.18/candidate:   ↑                                      
  5.18/beta:        ↑                                      
  5.18/edge:        ↑                                      
  5.17/stable:      5.17-e5ead86  2023-08-29 (25505) 184MB -
  5.17/candidate:   ↑                                      
  5.17/beta:        ↑                                      
  5.17/edge:        ↑                                      
  5.16/stable:      5.16-f2b0200  2023-07-26 (25353) 183MB -
  5.16/candidate:   ↑                                      
  5.16/beta:        ↑                                      
  5.16/edge:        ↑                                      
  5.15/stable:      5.15-3fe7435  2023-06-28 (25086) 181MB -
  5.15/candidate:   ↑                                      
  5.15/beta:        ↑                                      
  5.15/edge:        ↑                                      
  5.0/stable:       5.0.2-838e1b2 2023-01-25 (24322) 117MB -
  5.0/candidate:    5.0.2-838e1b2 2023-01-18 (24322) 117MB -
  5.0/beta:         ↑                                      
  5.0/edge:         git-7f8a581   2023-10-10 (25964) 125MB -
  4.0/stable:       4.0.9-a29c6f1 2022-12-04 (24061)  96MB -
  4.0/candidate:    4.0.9-a29c6f1 2022-12-02 (24061)  96MB -
  4.0/beta:         ↑                                      
  4.0/edge:         git-407205d   2022-11-22 (23988)  96MB -
  3.0/stable:       3.0.4         2019-10-10 (11348)  55MB -
  3.0/candidate:    3.0.4         2019-10-10 (11348)  55MB -
  3.0/beta:         ↑                                      
  3.0/edge:         git-81b81b9   2019-10-10 (11362)  55MB -
installed:          5.18-db8c6f9             (25846) 189MB -
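When checking which build is actually running, it can help to pull the revision numbers out of the `snap info` listing rather than comparing the version strings, since several channels above all report 5.18. The following is a minimal sketch; `snap_revision` is a hypothetical helper name, and it assumes the revision appears in parentheses on the line, as it does in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the revision number (the value in parentheses)
# for a given label in `snap info` output read from stdin.
# $1 is the label without the trailing colon, e.g. "installed" or
# "latest/candidate". Prints nothing if the line has no revision.
snap_revision() {
    awk -v label="$1:" '
        $1 == label {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^\([0-9]+\)$/) {
                    gsub(/[()]/, "", $i)
                    print $i
                    exit
                }
            }
        }'
}

# Example use:
#   snap info lxd | snap_revision installed
#   snap info lxd | snap_revision latest/candidate
```

Applied to the listing above, the two calls return different revisions for `installed` and `latest/candidate`, which is the kind of mismatch this check is meant to surface.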
tomponline commented 11 months ago

Thanks

@monstermunchkin looks like the patch hasn't quite done the trick.

tomposmiko commented 11 months ago

@tomponline Is there any update? Or is there any workaround other than doing an export/import of the instance?

tomponline commented 11 months ago

@monstermunchkin is working on a new patch for this for LXD 5.19

monstermunchkin commented 11 months ago

Here's the updated patch: https://github.com/canonical/lxd/pull/12390