canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

"rbd: error opening image" when copying VM instance #12631

Closed: slapcat closed this issue 11 months ago

slapcat commented 11 months ago

Summary

On a fresh deployment of MicroCloud, copying a VM fails with the following error:

$ lxc cp testvm bkup
Error: Failed to run: rbd --id admin --cluster ceph --pool lxd_remote map virtual-machine_bkup: exit status 2 (rbd: error opening image virtual-machine_bkup: (2) No such file or directory)

Containers can be copied successfully.

Workaround

Copying from a snapshot of the VM works:

$ lxc cp testvm/snap0 bkup
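
The workaround can be wrapped in a small shell function. This is only a sketch: `copy_vm_via_snapshot` is a hypothetical helper name, not an LXD command, and it assumes the snapshot name passed in is not already in use on the source VM.

```shell
# Hypothetical helper (not part of LXD): copy a VM by way of a snapshot.
# Usage: copy_vm_via_snapshot <source-vm> <snapshot-name> <target-name>
copy_vm_via_snapshot() {
    if [ "$#" -ne 3 ]; then
        echo "usage: copy_vm_via_snapshot <source-vm> <snapshot-name> <target-name>" >&2
        return 2
    fi
    # Take a snapshot of the source VM; stop here if that fails
    # (for example because the snapshot name already exists).
    lxc snapshot "$1" "$2" || return 1
    # Copy from the snapshot instead of from the VM itself.
    lxc cp "$1/$2" "$3"
}
```

For the instance in this report, `copy_vm_via_snapshot testvm snap2 bkup` would snapshot `testvm` as `snap2` and then run `lxc cp testvm/snap2 bkup`.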

Additional Info

$ sudo ceph -s
  cluster:
    id:     dc0f415d-7b21-4071-9685-bea9363fe743
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum mc1-3,mc1-1,mc1-2 (age 4h)
    mgr: mc1-3(active, since 4h), standbys: mc1-1, mc1-2
    osd: 6 osds: 6 up (since 4h), 6 in (since 4h)

  data:
    pools:   2 pools, 33 pgs
    objects: 290 objects, 806 MiB
    usage:   4.2 GiB used, 116 GiB / 120 GiB avail
    pgs:     33 active+clean
# microceph.rbd ls lxd_remote
lxd_lxd_remote
virtual-machine_testvm
virtual-machine_testvm.block
zombie_image_a5251f08a6849bf5be9069871bfb99bbbe3aae5f8a1b436df26fcc5716a14c86_ext4
zombie_image_a5251f08a6849bf5be9069871bfb99bbbe3aae5f8a1b436df26fcc5716a14c86_ext4.block
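
To check whether the copy target ever showed up in the pool, the listing above can be filtered for the expected image names. This is a rough sketch: `vm_rbd_images_present` is a hypothetical helper, and it assumes each VM is backed by two images named `virtual-machine_<name>` and `virtual-machine_<name>.block`, as the `testvm` entries in the listing suggest.

```shell
# Hypothetical helper: reads an RBD image listing on stdin (one name per line)
# and reports whether both images for the VM named "$1" are present.
# Assumption: the virtual-machine_<name> / virtual-machine_<name>.block
# naming seen in the listing above.
vm_rbd_images_present() {
    listing="$(cat)"
    for image in "virtual-machine_$1" "virtual-machine_$1.block"; do
        # -F: fixed string, -x: match the whole line, -q: no output
        if ! printf '%s\n' "$listing" | grep -Fxq -- "$image"; then
            echo "missing: $image"
            return 1
        fi
    done
    echo "present: virtual-machine_$1 and virtual-machine_$1.block"
}
```

It could be fed directly from the command used above, for example `microceph.rbd ls lxd_remote | vm_rbd_images_present bkup`.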
$ lxc info testvm
Name: testvm
Status: RUNNING
Type: virtual-machine
Architecture: x86_64
Location: mc1-3
PID: 14139
Created: 2023/12/06 15:55 UTC
Last Used: 2023/12/06 15:56 UTC

Resources:
  Processes: 11
  CPU usage:
    CPU usage (in seconds): 0
  Memory usage:
    Memory (current): 132.43MiB
  Network usage:
    eth0:
      Type: broadcast
      State: UP
      Host interface: tap611f9c64
      MAC address: 00:16:3e:32:5a:a3
      MTU: 1500
      Bytes received: 2.91kB
      Bytes sent: 4.14kB
      Packets received: 18
      Packets sent: 35
      IP addresses:
        inet:  240.3.115.207/8 (global)
        inet6: fe80::216:3eff:fe32:5aa3/64 (link)
    lo:
      Type: loopback
      State: UP
      MTU: 65536
      Bytes received: 0B
      Bytes sent: 0B
      Packets received: 0
      Packets sent: 0
      IP addresses:
        inet:  127.0.0.1/8 (local)
        inet6: ::1/128 (local)

Snapshots:
+-------+----------------------+------------+----------+
| NAME  |       TAKEN AT       | EXPIRES AT | STATEFUL |
+-------+----------------------+------------+----------+
| snap0 | 2023/12/06 15:56 UTC |            | NO       |
+-------+----------------------+------------+----------+
| snap1 | 2023/12/06 16:12 UTC |            | NO       |
+-------+----------------------+------------+----------+
Output of "lxc cp --debug"

ubuntu@mc1-1:~$ lxc cp testvm bkup --debug
DEBUG [2023-12-06T18:19:57Z] Connecting to a local LXD over a Unix socket
DEBUG [2023-12-06T18:19:57Z] Sending request to LXD etag= method=GET url="http://unix.socket/1.0"
DEBUG [2023-12-06T18:19:57Z] Got response struct from LXD
DEBUG [2023-12-06T18:19:57Z] { "config": { "cluster.https_address": "10.5.2.228:8443", "core.https_address": "10.5.2.228:8443", "network.ovn.northbound_connection": "ssl:10.5.3.115:6641,ssl:10.5.2.228:6641,ssl:10.5.2.193:6641", "storage.backups_volume": "local/backups", "storage.images_volume": "local/images" }, "api_extensions": [ "storage_zfs_remove_snapshots", "container_host_shutdown_timeout", "container_stop_priority", "container_syscall_filtering", "auth_pki", "container_last_used_at", "etag", "patch", "usb_devices", "https_allowed_credentials", "image_compression_algorithm", "directory_manipulation", "container_cpu_time", "storage_zfs_use_refquota", "storage_lvm_mount_options", "network", "profile_usedby", "container_push", "container_exec_recording", "certificate_update", "container_exec_signal_handling", "gpu_devices", "container_image_properties", "migration_progress", "id_map", "network_firewall_filtering", "network_routes", "storage", "file_delete", "file_append", "network_dhcp_expiry", "storage_lvm_vg_rename", "storage_lvm_thinpool_rename", "network_vlan", "image_create_aliases", "container_stateless_copy", "container_only_migration", "storage_zfs_clone_copy", "unix_device_rename", "storage_lvm_use_thinpool", "storage_rsync_bwlimit", "network_vxlan_interface", "storage_btrfs_mount_options", "entity_description", "image_force_refresh", "storage_lvm_lv_resizing", "id_map_base", "file_symlinks", "container_push_target", "network_vlan_physical", "storage_images_delete", "container_edit_metadata", "container_snapshot_stateful_migration", "storage_driver_ceph", "storage_ceph_user_name", "resource_limits", "storage_volatile_initial_source", 
"storage_ceph_force_osd_reuse", "storage_block_filesystem_btrfs", "resources", "kernel_limits", "storage_api_volume_rename", "macaroon_authentication", "network_sriov", "console", "restrict_devlxd", "migration_pre_copy", "infiniband", "maas_network", "devlxd_events", "proxy", "network_dhcp_gateway", "file_get_symlink", "network_leases", "unix_device_hotplug", "storage_api_local_volume_handling", "operation_description", "clustering", "event_lifecycle", "storage_api_remote_volume_handling", "nvidia_runtime", "container_mount_propagation", "container_backup", "devlxd_images", "container_local_cross_pool_handling", "proxy_unix", "proxy_udp", "clustering_join", "proxy_tcp_udp_multi_port_handling", "network_state", "proxy_unix_dac_properties", "container_protection_delete", "unix_priv_drop", "pprof_http", "proxy_haproxy_protocol", "network_hwaddr", "proxy_nat", "network_nat_order", "container_full", "candid_authentication", "backup_compression", "candid_config", "nvidia_runtime_config", "storage_api_volume_snapshots", "storage_unmapped", "projects", "candid_config_key", "network_vxlan_ttl", "container_incremental_copy", "usb_optional_vendorid", "snapshot_scheduling", "snapshot_schedule_aliases", "container_copy_project", "clustering_server_address", "clustering_image_replication", "container_protection_shift", "snapshot_expiry", "container_backup_override_pool", "snapshot_expiry_creation", "network_leases_location", "resources_cpu_socket", "resources_gpu", "resources_numa", "kernel_features", "id_map_current", "event_location", "storage_api_remote_volume_snapshots", "network_nat_address", "container_nic_routes", "rbac", "cluster_internal_copy", "seccomp_notify", "lxc_features", "container_nic_ipvlan", "network_vlan_sriov", "storage_cephfs", "container_nic_ipfilter", "resources_v2", "container_exec_user_group_cwd", "container_syscall_intercept", "container_disk_shift", "storage_shifted", "resources_infiniband", "daemon_storage", "instances", "image_types", 
"resources_disk_sata", "clustering_roles", "images_expiry", "resources_network_firmware", "backup_compression_algorithm", "ceph_data_pool_name", "container_syscall_intercept_mount", "compression_squashfs", "container_raw_mount", "container_nic_routed", "container_syscall_intercept_mount_fuse", "container_disk_ceph", "virtual-machines", "image_profiles", "clustering_architecture", "resources_disk_id", "storage_lvm_stripes", "vm_boot_priority", "unix_hotplug_devices", "api_filtering", "instance_nic_network", "clustering_sizing", "firewall_driver", "projects_limits", "container_syscall_intercept_hugetlbfs", "limits_hugepages", "container_nic_routed_gateway", "projects_restrictions", "custom_volume_snapshot_expiry", "volume_snapshot_scheduling", "trust_ca_certificates", "snapshot_disk_usage", "clustering_edit_roles", "container_nic_routed_host_address", "container_nic_ipvlan_gateway", "resources_usb_pci", "resources_cpu_threads_numa", "resources_cpu_core_die", "api_os", "container_nic_routed_host_table", "container_nic_ipvlan_host_table", "container_nic_ipvlan_mode", "resources_system", "images_push_relay", "network_dns_search", "container_nic_routed_limits", "instance_nic_bridged_vlan", "network_state_bond_bridge", "usedby_consistency", "custom_block_volumes", "clustering_failure_domains", "resources_gpu_mdev", "console_vga_type", "projects_limits_disk", "network_type_macvlan", "network_type_sriov", "container_syscall_intercept_bpf_devices", "network_type_ovn", "projects_networks", "projects_networks_restricted_uplinks", "custom_volume_backup", "backup_override_name", "storage_rsync_compression", "network_type_physical", "network_ovn_external_subnets", "network_ovn_nat", "network_ovn_external_routes_remove", "tpm_device_type", "storage_zfs_clone_copy_rebase", "gpu_mdev", "resources_pci_iommu", "resources_network_usb", "resources_disk_address", "network_physical_ovn_ingress_mode", "network_ovn_dhcp", "network_physical_routes_anycast", "projects_limits_instances", 
"network_state_vlan", "instance_nic_bridged_port_isolation", "instance_bulk_state_change", "network_gvrp", "instance_pool_move", "gpu_sriov", "pci_device_type", "storage_volume_state", "network_acl", "migration_stateful", "disk_state_quota", "storage_ceph_features", "projects_compression", "projects_images_remote_cache_expiry", "certificate_project", "network_ovn_acl", "projects_images_auto_update", "projects_restricted_cluster_target", "images_default_architecture", "network_ovn_acl_defaults", "gpu_mig", "project_usage", "network_bridge_acl", "warnings", "projects_restricted_backups_and_snapshots", "clustering_join_token", "clustering_description", "server_trusted_proxy", "clustering_update_cert", "storage_api_project", "server_instance_driver_operational", "server_supported_storage_drivers", "event_lifecycle_requestor_address", "resources_gpu_usb", "clustering_evacuation", "network_ovn_nat_address", "network_bgp", "network_forward", "custom_volume_refresh", "network_counters_errors_dropped", "metrics", "image_source_project", "clustering_config", "network_peer", "linux_sysctl", "network_dns", "ovn_nic_acceleration", "certificate_self_renewal", "instance_project_move", "storage_volume_project_move", "cloud_init", "network_dns_nat", "database_leader", "instance_all_projects", "clustering_groups", "ceph_rbd_du", "instance_get_full", "qemu_metrics", "gpu_mig_uuid", "event_project", "clustering_evacuation_live", "instance_allow_inconsistent_copy", "network_state_ovn", "storage_volume_api_filtering", "image_restrictions", "storage_zfs_export", "network_dns_records", "storage_zfs_reserve_space", "network_dns_records", "storage_zfs_reserve_space", "network_acl_log", "storage_zfs_blocksize", "metrics_cpu_seconds", "instance_snapshot_never", "certificate_token", "instance_nic_routed_neighbor_probe", "event_hub", "agent_nic_config", "projects_restricted_intercept", "metrics_authentication", "images_target_project", "cluster_migration_inconsistent_copy", 
"cluster_ovn_chassis", "container_syscall_intercept_sched_setscheduler", "storage_lvm_thinpool_metadata_size", "storage_volume_state_total", "instance_file_head", "instances_nic_host_name", "image_copy_profile", "container_syscall_intercept_sysinfo", "clustering_evacuation_mode", "resources_pci_vpd", "qemu_raw_conf", "storage_cephfs_fscache", "network_load_balancer", "vsock_api", "instance_ready_state", "network_bgp_holdtime", "storage_volumes_all_projects", "metrics_memory_oom_total", "storage_buckets", "storage_buckets_create_credentials", "metrics_cpu_effective_total", "projects_networks_restricted_access", "storage_buckets_local", "loki", "acme", "internal_metrics", "cluster_join_token_expiry", "remote_token_expiry", "init_preseed", "storage_volumes_created_at", "cpu_hotplug", "projects_networks_zones", "network_txqueuelen", "cluster_member_state", "instances_placement_scriptlet", "storage_pool_source_wipe", "zfs_block_mode", "instance_generation_id", "disk_io_cache", "amd_sev", "storage_pool_loop_resize", "migration_vm_live", "ovn_nic_nesting", "oidc", "network_ovn_l3only", "ovn_nic_acceleration_vdpa", "cluster_healing", "instances_state_total", "auth_user", "security_csm", "instances_rebuild", "numa_cpu_placement", "custom_volume_iso", "network_allocations", "storage_api_remote_volume_snapshot_copy", "zfs_delegate", "operations_get_query_all_projects", "metadata_configuration", "syslog_socket", "event_lifecycle_name_and_project", "instances_nic_limits_priority", "disk_initial_volume_configuration", "operation_wait" ], "api_status": "stable", "api_version": "1.0", "auth": "trusted", "public": false, "auth_methods": [ "tls" ], "auth_user_name": "ubuntu", "auth_user_method": "unix", "environment": { "addresses": [ "10.5.2.228:8443" ], "architectures": [ "x86_64", "i686" ], "certificate": "-----BEGIN CERTIFICATE-----\nMIIB4TCCAWegAwIBAgIRANL7zuDMWJSCL4u2qETNiuwwCgYIKoZIzj0EAwMwIzEM\n 
MAoGA1UEChMDTFhEMRMwEQYDVQQDDApyb290QG1jMS0zMB4XDTIzMTIwNjE0NDA1\nOVoXDTMzMTIwMzE0NDA1OVowIzEMMAoGA1UEChMDTFhEMRMwEQYDVQQDDApyb290\nQG1jMS0zMHYwEAYHKoZIzj0CAQYFK4EEACIDYgAEqIcSEfIP3w+U1LpThlnKf+ue\nacWfVK/mQArWuPRm9O0Ln59K/9y4Ygw6L7aJizfi9GwjZR70jWZViiuWXJj27dCL\nnuXi0gh60UIBI0TGmqL5vYBu4UrlzGKfI3lYochvo18wXTAOBgNVHQ8BAf8EBAMC\nBaAwEwYDVR0lBAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAoBgNVHREEITAf\nggVtYzEtM4cEfwAAAYcQAAAAAAAAAAAAAAAAAAAAATAKBggqhkjOPQQDAwNoADBl\nAjBZlQhzaYei6QdZkHD+aQUlLc9xdCuFrfx3c4oUKz02u0em7fa4CA9yCNXngr++\n15MCMQDSY7FUHT30c2smgUwHvJBqSW1P2N0GFPuUQ3dNu248xlkBG8LnJq8cvXgw\nx/re1IU=\n-----END CERTIFICATE-----\n", "certificate_fingerprint": "87c7faf90d5b2d28dbd973949653f860b7bb25b6570c7e6be13985df6a1d81ed", "driver": "lxc | qemu", "driver_version": "5.0.3 | 8.1.1", "firewall": "nftables", "kernel": "Linux", "kernel_architecture": "x86_64", "kernel_features": { "idmapped_mounts": "true", "netnsid_getifaddrs": "true", "seccomp_listener": "true", "seccomp_listener_continue": "true", "shiftfs": "false", "uevent_injection": "true", "unpriv_fscaps": "true" }, "kernel_version": "5.15.0-89-generic", "lxc_features": { "cgroup2": "true", "core_scheduling": "true", "devpts_fd": "true", "idmapped_mounts_v2": "true", "mount_injection_file": "true", "network_gateway_device_route": "true", "network_ipvlan": "true", "network_l2proxy": "true", "network_phys_macvlan_mtu": "true", "network_veth_router": "true", "pidfd": "true", "seccomp_allow_deny_syntax": "true", "seccomp_notify": "true", "seccomp_proxy_send_notify_fd": "true" }, "os_name": "Ubuntu", "os_version": "22.04", "project": "default", "server": "lxd", "server_clustered": true, "server_event_mode": "full-mesh", "server_name": "mc1-1", "server_pid": 12236, "server_version": "5.19", "storage": "", "storage_version": "", "storage_supported_drivers": [ { "Name": "dir", "Version": "1", "Remote": false }, { "Name": "lvm", "Version": "2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.45.0", "Remote": false }, { "Name": "zfs", "Version": "2.1.5-1ubuntu6~22.04.1", "Remote": false }, { "Name": "btrfs", "Version": "5.16.2", "Remote": false }, { "Name": "ceph", "Version": "17.2.6", "Remote": true }, { "Name": "cephfs", "Version": "17.2.6", "Remote": true }, { "Name": "cephobject", "Version": "17.2.6", "Remote": true } ] } }
DEBUG [2023-12-06T18:19:57Z] Sending request to LXD etag= method=GET url="http://unix.socket/1.0/instances/testvm"
DEBUG [2023-12-06T18:19:58Z] Got response struct from LXD
DEBUG [2023-12-06T18:19:58Z] { "architecture": "x86_64", "config": { "image.architecture": "amd64", "image.description": "Alpine edge amd64 (20231204_13:00)", "image.os": "Alpine", "image.release": "edge", "image.requirements.secureboot": "false", "image.serial": "20231204_13:00", "image.type": "disk-kvm.img", "image.variant": "default", "security.secureboot": "false", "volatile.base_image": "a5251f08a6849bf5be9069871bfb99bbbe3aae5f8a1b436df26fcc5716a14c86", "volatile.cloud-init.instance-id": "479fe5b5-d76f-429b-a611-03c729c873dd", "volatile.eth0.host_name": "tap611f9c64", "volatile.eth0.hwaddr": "00:16:3e:32:5a:a3", "volatile.last_state.power": "RUNNING", "volatile.uuid": "ba7cec49-1419-458a-889e-3577324f5c4d", "volatile.uuid.generation": "ba7cec49-1419-458a-889e-3577324f5c4d", "volatile.vsock_id": "3741582119" }, "devices": {}, "ephemeral": false, "profiles": [ "default" ], "stateful": false, "description": "", "created_at": "2023-12-06T15:55:04.027356842Z", "expanded_config": { "image.architecture": "amd64", "image.description": "Alpine edge amd64 (20231204_13:00)", "image.os": "Alpine", "image.release": "edge", "image.requirements.secureboot": "false", "image.serial": "20231204_13:00", "image.type": "disk-kvm.img", "image.variant": "default", "security.secureboot": "false", "volatile.base_image": "a5251f08a6849bf5be9069871bfb99bbbe3aae5f8a1b436df26fcc5716a14c86", "volatile.cloud-init.instance-id": "479fe5b5-d76f-429b-a611-03c729c873dd", "volatile.eth0.host_name": "tap611f9c64", "volatile.eth0.hwaddr": "00:16:3e:32:5a:a3", "volatile.last_state.power": "RUNNING", "volatile.uuid": "ba7cec49-1419-458a-889e-3577324f5c4d", "volatile.uuid.generation": "ba7cec49-1419-458a-889e-3577324f5c4d", "volatile.vsock_id": "3741582119" }, "expanded_devices": { "eth0": { "name": "eth0", "network": "lxdfan0", "type": "nic" }, "root": { "path": "/", "pool": "remote", "type": "disk" } }, "name": "testvm", "status": "Running", "status_code": 103, "last_used_at": "2023-12-06T15:56:10.45365861Z", "location": "mc1-3", "type": "virtual-machine", "project": "default" }
DEBUG [2023-12-06T18:19:58Z] Connected to the websocket: ws://unix.socket/1.0/events
DEBUG [2023-12-06T18:19:58Z] Sending request to LXD etag= method=POST url="http://unix.socket/1.0/instances"
DEBUG [2023-12-06T18:19:58Z] { "architecture": "x86_64", "config": { "image.architecture": "amd64", "image.description": "Alpine edge amd64 (20231204_13:00)", "image.os": "Alpine", "image.release": "edge", "image.requirements.secureboot": "false", "image.serial": "20231204_13:00", "image.type": "disk-kvm.img", "image.variant": "default", "security.secureboot": "false", "volatile.base_image": "a5251f08a6849bf5be9069871bfb99bbbe3aae5f8a1b436df26fcc5716a14c86" }, "devices": {}, "ephemeral": false, "profiles": [ "default" ], "stateful": false, "description": "", "name": "bkup", "source": { "type": "copy", "certificate": "", "base-image": "a5251f08a6849bf5be9069871bfb99bbbe3aae5f8a1b436df26fcc5716a14c86", "source": "testvm", "live": true, "allow_inconsistent": false }, "instance_type": "", "type": "virtual-machine" }
DEBUG [2023-12-06T18:19:58Z] Got operation from LXD
DEBUG [2023-12-06T18:19:58Z] { "id": "377f9f2f-3403-44f0-9851-0eda6e73e1e8", "class": "task", "description": "Creating instance", "created_at": "2023-12-06T18:19:58.039793967Z", "updated_at": "2023-12-06T18:19:58.039793967Z", "status": "Running", "status_code": 103, "resources": { "instances": [ "/1.0/instances/bkup", "/1.0/instances/testvm" ] }, "metadata": null, "may_cancel": false, "err": "", "location": "mc1-1" }
DEBUG [2023-12-06T18:19:58Z] Sending request to LXD etag= method=GET url="http://unix.socket/1.0/operations/377f9f2f-3403-44f0-9851-0eda6e73e1e8"
DEBUG [2023-12-06T18:19:58Z] Got response struct from LXD
DEBUG [2023-12-06T18:19:58Z] { "id": "377f9f2f-3403-44f0-9851-0eda6e73e1e8", "class": "task", "description": "Creating instance", "created_at": "2023-12-06T18:19:58.039793967Z", "updated_at": "2023-12-06T18:19:58.039793967Z", "status": "Running", "status_code": 103, "resources": { "instances": [ "/1.0/instances/bkup", "/1.0/instances/testvm" ] }, "metadata": null, "may_cancel": false, "err": "", "location": "mc1-1" }
Error: Failed to run: rbd --id admin --cluster ceph --pool lxd_remote map virtual-machine_bkup: exit status 2 (rbd: error opening image virtual-machine_bkup: (2) No such file or directory)

tomponline commented 11 months ago

@roosterfish please can you look into this as a priority.

roosterfish commented 11 months ago

I am now able to reproduce this bug. It affects latest/edge when the VM has snapshots:

tomponline commented 11 months ago

Thanks @roosterfish, do you know what change caused this regression?

roosterfish commented 11 months ago

Unfortunately latest/stable seems to be affected too.

tomponline commented 11 months ago

Yeah, I figured as much, because the user reported it on 5.19.