canonical / lxd


Stateful snapshot fails on Ubuntu 18.04 #6909

Closed adamryczkowski closed 4 years ago

adamryczkowski commented 4 years ago

Required information

# Issue description

$ lxc snapshot nester live_snap --stateful
Error: snapshot dump failed
(00.000055) Warn (criu/log.c:203): The early log isn't empty
(00.022092) Error (criu/mount.c:1087): mnt: The file system 0x1b 0x1b (0x4f) btrfs ./run/systemd/unit-root is inaccessible
(00.022095) Error (criu/fsnotify.c:212): fsnotify: Can't open mount for s_dev 1b, continue
(00.022101) Error (criu/mount.c:1087): mnt: The file system 0x1b 0x1b (0x4f) btrfs ./ is inaccessible
(00.022105) Error (criu/fsnotify.c:212): fsnotify: Can't open mount for s_dev 1b, continue
(00.022119) Error (criu/mount.c:1087): mnt: The file system 0x1b 0x1b (0x4f) btrfs ./ is inaccessible
(00.022122) Error (criu/fsnotify.c:212): fsnotify: Can't open mount for s_dev 1b, continue
(00.022142) Error (criu/mount.c:1087): mnt: The file system 0x1b 0x1b (0x4f) btrfs ./ is inaccessible
(00.022145) Error (criu/fsnotify.c:212): fsnotify: Can't open mount for s_dev 1b, continue
(00.022157) Error (criu/mount.c:1087): mnt: The file system 0x1b 0x1b (0x4f) btrfs ./ is inaccessible
(00.022160) Error (criu/fsnotify.c:212): fsnotify: Can't open mount for s_dev 1b, continue
(00.022162) Warn (criu/fsnotify.c:288): fsnotify: Handle 0x1b:0x108 cannot be opened
(00.030997) Error (criu/irmap.c:86): irmap: Can't stat /no-such-path: No such file or directory
(00.031000) Error (criu/fsnotify.c:291): fsnotify: Can't dump that handle
(00.031043) Error (criu/cr-dump.c:1345): Dump files (pid: 5074) failed with -1
(00.034484) Error (criu/cr-dump.c:1743): Dumping FAILED.


# Steps to reproduce

$ sudo snap install lxd
$ sudo snap set lxd criu.enable=true
$ sudo systemctl reload snap.lxd.daemon
$ sudo lxd init # accept the defaults.
$ lxc launch ubuntu:bionic nester
$ lxc snapshot nester live_snap --stateful
Error: snapshot dump failed
(00.000189) Warn (criu/log.c:203): The early log isn't empty
(00.238875) Error (criu/mount.c:1087): mnt: The file system 0x1b 0x1b (0x3f) btrfs ./ is inaccessible
(00.238891) Error (criu/fsnotify.c:212): fsnotify: Can't open mount for s_dev 1b, continue
(00.238948) Error (criu/mount.c:1087): mnt: The file system 0x1b 0x1b (0x3f) btrfs ./run/systemd/unit-root is inaccessible
(00.238965) Error (criu/fsnotify.c:212): fsnotify: Can't open mount for s_dev 1b, continue
(00.238987) Error (criu/mount.c:1087): mnt: The file system 0x1b 0x1b (0x3f) btrfs ./ is inaccessible
(00.239001) Error (criu/fsnotify.c:212): fsnotify: Can't open mount for s_dev 1b, continue
(00.239050) Error (criu/mount.c:1087): mnt: The file system 0x1b 0x1b (0x3f) btrfs ./ is inaccessible
(00.239066) Error (criu/fsnotify.c:212): fsnotify: Can't open mount for s_dev 1b, continue
(00.239114) Error (criu/mount.c:1087): mnt: The file system 0x1b 0x1b (0x3f) btrfs ./ is inaccessible
(00.239129) Error (criu/fsnotify.c:212): fsnotify: Can't open mount for s_dev 1b, continue
(00.239173) Error (criu/mount.c:1087): mnt: The file system 0x1b 0x1b (0x3f) btrfs ./ is inaccessible
(00.239188) Error (criu/fsnotify.c:212): fsnotify: Can't open mount for s_dev 1b, continue
(00.239197) Warn (criu/fsnotify.c:288): fsnotify: Handle 0x1b:0x108 cannot be opened
(00.275087) Error (criu/irmap.c:86): irmap: Can't stat /no-such-path: No such file or directory
(00.275097) Error (criu/fsnotify.c:291): fsnotify: Can't dump that handle
(00.275192) Error (criu/cr-dump.c:1345): Dump files (pid: 32646) failed with -1
(00.283240) Error (criu/cr-dump.c:1743): Dumping FAILED.


# Information to attach

$ dmesg
(...)
[ 3091.351457] audit: type=1400 audit(1582213464.747:132): apparmor="STATUS" operation="profileload" label="lxd-nester</var/snap/lxd/common/lxd>//&:lxd-nester_:unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=425 comm="apparmor_parser"

$ lxc info nester --show-log
Name: nester
Location: none
Remote: unix://
Architecture: x86_64
Created: 2020/02/20 15:44 UTC
Status: Running
Type: container
Profiles: default
Pid: 32646
Ips:
  eth0: inet 10.165.25.169 veth516bd843
  eth0: inet6 fd42:4f88:bb4e:f118:216:3eff:feaf:9a42 veth516bd843
  eth0: inet6 fe80::216:3eff:feaf:9a42 veth516bd843
  lo: inet 127.0.0.1
  lo: inet6 ::1
Resources:
  Processes: 24
  Disk usage:
    root: -1B
  CPU usage:
    CPU usage (in seconds): 11
  Memory usage:
    Memory (current): 221.03MB
  Network usage:
    eth0:
      Bytes received: 9.79kB
      Bytes sent: 4.72kB
      Packets received: 77
      Packets sent: 49
    lo:
      Bytes received: 2.76kB
      Bytes sent: 2.76kB
      Packets received: 40
      Packets sent: 40

Log:

lxc nester 20200220154423.466 WARN cgfsng - cgroups/cgfsng.c:chowmod:1525 - No such file or directory - Failed to chown(/sys/fs/cgroup/unified//lxc.payload/nester/memory.oom.group, 1000000000, 0)
lxc 20200220154437.209 ERROR criu - criu.c:do_dump:1325 - dump failed with 1
lxc 20200220154437.209 ERROR criu - criu.c:do_dump:1339 - criu output:

$ lxc config show nester --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (release) (20200218)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20200218"
  image.type: squashfs
  image.version: "18.04"
  volatile.base_image: 8c4e87e53c024e0449003350f0b0626b124b68060b73c0a7ad9547670e00d4b3
  volatile.eth0.host_name: veth516bd843
  volatile.eth0.hwaddr: 00:16:3e:af:9a:42
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:

$ lxc --debug snapshot nester live_snap --stateful
DBUG[02-20|17:01:00] Connecting to a local LXD over a Unix socket 
DBUG[02-20|17:01:00] Sending request to LXD                   method=GET url=http://unix.socket/1.0 etag=
DBUG[02-20|17:01:00] Got response struct from LXD 
DBUG[02-20|17:01:00] 
    {
        "config": {
            "core.https_address": "[::]:8443",
            "core.trust_password": true
        },
        "api_extensions": [
            "storage_zfs_remove_snapshots",
            "container_host_shutdown_timeout",
            "container_stop_priority",
            "container_syscall_filtering",
            "auth_pki",
            "container_last_used_at",
            "etag",
            "patch",
            "usb_devices",
            "https_allowed_credentials",
            "image_compression_algorithm",
            "directory_manipulation",
            "container_cpu_time",
            "storage_zfs_use_refquota",
            "storage_lvm_mount_options",
            "network",
            "profile_usedby",
            "container_push",
            "container_exec_recording",
            "certificate_update",
            "container_exec_signal_handling",
            "gpu_devices",
            "container_image_properties",
            "migration_progress",
            "id_map",
            "network_firewall_filtering",
            "network_routes",
            "storage",
            "file_delete",
            "file_append",
            "network_dhcp_expiry",
            "storage_lvm_vg_rename",
            "storage_lvm_thinpool_rename",
            "network_vlan",
            "image_create_aliases",
            "container_stateless_copy",
            "container_only_migration",
            "storage_zfs_clone_copy",
            "unix_device_rename",
            "storage_lvm_use_thinpool",
            "storage_rsync_bwlimit",
            "network_vxlan_interface",
            "storage_btrfs_mount_options",
            "entity_description",
            "image_force_refresh",
            "storage_lvm_lv_resizing",
            "id_map_base",
            "file_symlinks",
            "container_push_target",
            "network_vlan_physical",
            "storage_images_delete",
            "container_edit_metadata",
            "container_snapshot_stateful_migration",
            "storage_driver_ceph",
            "storage_ceph_user_name",
            "resource_limits",
            "storage_volatile_initial_source",
            "storage_ceph_force_osd_reuse",
            "storage_block_filesystem_btrfs",
            "resources",
            "kernel_limits",
            "storage_api_volume_rename",
            "macaroon_authentication",
            "network_sriov",
            "console",
            "restrict_devlxd",
            "migration_pre_copy",
            "infiniband",
            "maas_network",
            "devlxd_events",
            "proxy",
            "network_dhcp_gateway",
            "file_get_symlink",
            "network_leases",
            "unix_device_hotplug",
            "storage_api_local_volume_handling",
            "operation_description",
            "clustering",
            "event_lifecycle",
            "storage_api_remote_volume_handling",
            "nvidia_runtime",
            "container_mount_propagation",
            "container_backup",
            "devlxd_images",
            "container_local_cross_pool_handling",
            "proxy_unix",
            "proxy_udp",
            "clustering_join",
            "proxy_tcp_udp_multi_port_handling",
            "network_state",
            "proxy_unix_dac_properties",
            "container_protection_delete",
            "unix_priv_drop",
            "pprof_http",
            "proxy_haproxy_protocol",
            "network_hwaddr",
            "proxy_nat",
            "network_nat_order",
            "container_full",
            "candid_authentication",
            "backup_compression",
            "candid_config",
            "nvidia_runtime_config",
            "storage_api_volume_snapshots",
            "storage_unmapped",
            "projects",
            "candid_config_key",
            "network_vxlan_ttl",
            "container_incremental_copy",
            "usb_optional_vendorid",
            "snapshot_scheduling",
            "container_copy_project",
            "clustering_server_address",
            "clustering_image_replication",
            "container_protection_shift",
            "snapshot_expiry",
            "container_backup_override_pool",
            "snapshot_expiry_creation",
            "network_leases_location",
            "resources_cpu_socket",
            "resources_gpu",
            "resources_numa",
            "kernel_features",
            "id_map_current",
            "event_location",
            "storage_api_remote_volume_snapshots",
            "network_nat_address",
            "container_nic_routes",
            "rbac",
            "cluster_internal_copy",
            "seccomp_notify",
            "lxc_features",
            "container_nic_ipvlan",
            "network_vlan_sriov",
            "storage_cephfs",
            "container_nic_ipfilter",
            "resources_v2",
            "container_exec_user_group_cwd",
            "container_syscall_intercept",
            "container_disk_shift",
            "storage_shifted",
            "resources_infiniband",
            "daemon_storage",
            "instances",
            "image_types",
            "resources_disk_sata",
            "clustering_roles",
            "images_expiry",
            "resources_network_firmware",
            "backup_compression_algorithm",
            "ceph_data_pool_name",
            "container_syscall_intercept_mount",
            "compression_squashfs",
            "container_raw_mount",
            "container_nic_routed",
            "container_syscall_intercept_mount_fuse",
            "container_disk_ceph",
            "virtual-machines",
            "image_profiles",
            "clustering_architecture",
            "resources_disk_id",
            "storage_lvm_stripes",
            "vm_boot_priority",
            "unix_hotplug_devices",
            "api_filtering"
        ],
        "api_status": "stable",
        "api_version": "1.0",
        "auth": "trusted",
        "public": false,
        "auth_methods": [
            "tls"
        ],
        "environment": {
            "addresses": [
                "192.168.0.37:8443",
                "10.165.25.1:8443",
                "[fd42:4f88:bb4e:f118::1]:8443",
                "172.17.0.1:8443"
            ],
            "architectures": [
                "x86_64",
                "i686"
            ],
            "certificate": "-----BEGIN CERTIFICATE-----\nMIIB+zCCAYCgAwIBAgIQD7eMhNuyUgejc0Sv49QAwzAKBggqhkjOPQQDAzAzMRww\nGgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMRMwEQYDVQQDDApyb290QHQ0OTBz\nMB4XDTIwMDEyMzE0MjA1NVoXDTMwMDEyMDE0MjA1NVowMzEcMBoGA1UEChMTbGlu\ndXhjb250YWluZXJzLm9yZzETMBEGA1UEAwwKcm9vdEB0NDkwczB2MBAGByqGSM49\nAgEGBSuBBAAiA2IABPn6Ixz3E4TUze1Rbf/S4+D+Rc7GAZmAqkn4GrG/CTWRNLLJ\nj87Bvx/sFkvoAQHnOibs38/m8iY6mImkrPZMunMGSWTGLQMicpIQhUo5PztsS9jd\n5bsPCo75h9dLhAg4saNZMFcwDgYDVR0PAQH/BAQDAgWgMBMGA1UdJQQMMAoGCCsG\nAQUFBwMBMAwGA1UdEwEB/wQCMAAwIgYDVR0RBBswGYIFdDQ5MHOHBMCoACWHBMCo\nKzOHBKwRAAEwCgYIKoZIzj0EAwMDaQAwZgIxAJ81qVlcVknQEhBQ4fMzuBjyfyp4\neEm99sudawrPxQ/ZX8jVz2wBrndiprS9JXYTPQIxAMHKBuauN+cdT3qIDCrILivl\nEkBr9SandY5acrR3fHc0mxsNyyoxUPzdk3Q4SjCvHw==\n-----END CERTIFICATE-----\n",
            "certificate_fingerprint": "892b23d20ba0f82970504b6f9b99bdc60513d62bea299711e1b93cdda6c59ae4",
            "driver": "lxc",
            "driver_version": "3.2.1",
            "kernel": "Linux",
            "kernel_architecture": "x86_64",
            "kernel_features": {
                "netnsid_getifaddrs": "true",
                "seccomp_listener": "true",
                "seccomp_listener_continue": "true",
                "shiftfs": "false",
                "uevent_injection": "true",
                "unpriv_fscaps": "true"
            },
            "kernel_version": "5.3.0-40-generic",
            "lxc_features": {
                "cgroup2": "false",
                "mount_injection_file": "true",
                "network_gateway_device_route": "true",
                "network_ipvlan": "true",
                "network_l2proxy": "true",
                "network_phys_macvlan_mtu": "true",
                "network_veth_router": "true",
                "seccomp_notify": "true"
            },
            "project": "default",
            "server": "lxd",
            "server_clustered": false,
            "server_name": "t490s",
            "server_pid": 29730,
            "server_version": "3.20",
            "storage": "btrfs",
            "storage_version": "4.4"
        }
    } 
DBUG[02-20|17:01:00] Connected to the websocket: ws://unix.socket/1.0/events 
DBUG[02-20|17:01:00] Sending request to LXD                   method=POST url=http://unix.socket/1.0/instances/nester/snapshots etag=
DBUG[02-20|17:01:00] 
    {
        "name": "live_snap",
        "stateful": true,
        "expires_at": null
    } 
DBUG[02-20|17:01:00] Got operation from LXD 
DBUG[02-20|17:01:00] 
    {
        "id": "572ea1d9-acbc-42b5-a643-d0516d4fae25",
        "class": "task",
        "description": "Snapshotting container",
        "created_at": "2020-02-20T17:01:00.774051278+01:00",
        "updated_at": "2020-02-20T17:01:00.774051278+01:00",
        "status": "Running",
        "status_code": 103,
        "resources": {
            "containers": [
                "/1.0/containers/nester"
            ],
            "instances": [
                "/1.0/instances/nester"
            ]
        },
        "metadata": null,
        "may_cancel": false,
        "err": "",
        "location": "none"
    } 
DBUG[02-20|17:01:00] Sending request to LXD                   method=GET url=http://unix.socket/1.0/operations/572ea1d9-acbc-42b5-a643-d0516d4fae25 etag=
DBUG[02-20|17:01:00] Got response struct from LXD 
DBUG[02-20|17:01:00] 
    {
        "id": "572ea1d9-acbc-42b5-a643-d0516d4fae25",
        "class": "task",
        "description": "Snapshotting container",
        "created_at": "2020-02-20T17:01:00.774051278+01:00",
        "updated_at": "2020-02-20T17:01:00.774051278+01:00",
        "status": "Running",
        "status_code": 103,
        "resources": {
            "containers": [
                "/1.0/containers/nester"
            ],
            "instances": [
                "/1.0/instances/nester"
            ]
        },
        "metadata": null,
        "may_cancel": false,
        "err": "",
        "location": "none"
    } 
Error: snapshot dump failed
(00.000186) Warn  (criu/log.c:203): The early log isn't empty
(00.182724) Error (criu/mount.c:1087): mnt: The file system 0x1b 0x1b (0x3f) btrfs ./run/systemd/unit-root is inaccessible
(00.182740) Error (criu/fsnotify.c:212): fsnotify: Can't open mount for s_dev 1b, continue
(00.182763) Error (criu/mount.c:1087): mnt: The file system 0x1b 0x1b (0x3f) btrfs ./ is inaccessible
(00.182777) Error (criu/fsnotify.c:212): fsnotify: Can't open mount for s_dev 1b, continue
(00.182825) Error (criu/mount.c:1087): mnt: The file system 0x1b 0x1b (0x3f) btrfs ./ is inaccessible
(00.182840) Error (criu/fsnotify.c:212): fsnotify: Can't open mount for s_dev 1b, continue
(00.182885) Error (criu/mount.c:1087): mnt: The file system 0x1b 0x1b (0x3f) btrfs ./ is inaccessible
(00.182899) Error (criu/fsnotify.c:212): fsnotify: Can't open mount for s_dev 1b, continue
(00.182942) Error (criu/mount.c:1087): mnt: The file system 0x1b 0x1b (0x3f) btrfs ./ is inaccessible
(00.182956) Error (criu/fsnotify.c:212): fsnotify: Can't open mount for s_dev 1b, continue
(00.182990) Warn  (criu/fsnotify.c:288): fsnotify:  Handle 0x1b:0x108 cannot be opened
(00.217443) Error (criu/irmap.c:86): irmap: Can't stat /no-such-path: No such file or directory
(00.217457) Error (criu/fsnotify.c:291): fsnotify:  Can't dump that handle
(00.217584) Error (criu/cr-dump.c:1345): Dump files (pid: 32646) failed with -1
(00.229662) Error (criu/cr-dump.c:1743): Dumping FAILED.

stgraber commented 4 years ago

Those are failures coming from CRIU when trying to checkpoint your container. It looks like it's getting confused by btrfs in this instance.
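
To see where the CRIU messages cluster, the Warn/Error lines can be tallied by the CRIU source file that emitted them. The following is a minimal sketch, not part of LXD or CRIU: `CriuEntry`, `parse_criu_log` and `summarize` are hypothetical names, and the regular expression only assumes the `(timestamp) Level (file:line): message` layout visible in the output pasted in this issue.

```python
import re
from collections import Counter
from typing import NamedTuple


class CriuEntry(NamedTuple):
    """One Warn/Error line from CRIU output (hypothetical helper type)."""
    timestamp: float
    level: str
    source: str
    line: int
    message: str


# Assumed layout, taken from the log in this issue:
#   (00.182724) Error (criu/mount.c:1087): mnt: The file system ...
_LINE_RE = re.compile(
    r"^\((?P<ts>\d+\.\d+)\)\s+(?P<level>Warn|Error)\s+"
    r"\((?P<source>[^:()]+):(?P<line>\d+)\):\s*(?P<msg>.*)$"
)


def parse_criu_log(text):
    """Return a CriuEntry for every line that matches the assumed layout."""
    entries = []
    for raw in text.splitlines():
        match = _LINE_RE.match(raw.strip())
        if match is None:
            # Skip anything else, e.g. the "Error: snapshot dump failed" line.
            continue
        entries.append(
            CriuEntry(
                timestamp=float(match["ts"]),
                level=match["level"],
                source=match["source"],
                line=int(match["line"]),
                message=match["msg"].strip(),
            )
        )
    return entries


def summarize(entries):
    """Count entries per (level, source file) pair."""
    return Counter((entry.level, entry.source) for entry in entries)


if __name__ == "__main__":
    import sys

    # Usage: lxc snapshot nester live_snap --stateful 2>&1 | python3 this_script.py
    for (level, source), count in summarize(parse_criu_log(sys.stdin.read())).most_common():
        print(f"{count:3d}  {level:5s}  {source}")
```

Fed the output from this report, the tally should show most entries coming from `criu/mount.c` and `criu/fsnotify.c`, with the fatal ones at the end from `criu/cr-dump.c`.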

Providing those logs at https://github.com/checkpoint-restore/criu may be useful to that project.
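
When passing the logs along, it probably helps to include the host details alongside them. As a rough sketch, the fields below can be pulled out of the `GET /1.0` response struct that `lxc --debug` prints. `REPORT_KEYS` and `environment_summary` are hypothetical names chosen for illustration, and the selection of fields is only a guess at what would be relevant; the key names themselves are the ones shown in the debug output in this issue.

```python
import json

# Fields from the "environment" object that seem relevant to a CRIU report.
# This selection is an assumption, not a list requested by either project.
REPORT_KEYS = (
    "server_version",
    "driver",
    "driver_version",
    "kernel_version",
    "storage",
    "storage_version",
)


def environment_summary(response_text):
    """Extract the selected environment fields from a GET /1.0 JSON body.

    Keys that are absent from the response map to None.
    """
    data = json.loads(response_text)
    environment = data.get("environment", {})
    return {key: environment.get(key) for key in REPORT_KEYS}


if __name__ == "__main__":
    import sys

    # Usage: save the JSON struct from the debug output to a file, then
    #   python3 this_script.py < response.json
    for key, value in environment_summary(sys.stdin.read()).items():
        print(f"{key}: {value}")
```

For the response in this report, that would give LXD 3.20 with the lxc driver 3.2.1, kernel 5.3.0-40-generic and btrfs 4.4 as the storage backend.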

On the LXD side specifically, all we do is spawn CRIU and let it dump/restore; judging from the log above, that part appears to be working.