canonical / lxd

Powerful system container and virtual machine manager
GNU Affero General Public License v3.0

Live migration between hosts fails with lxc move #7509

Closed YuichiroMaeyama closed 4 years ago

YuichiroMaeyama commented 4 years ago

Required information

Issue description

Live migration fails

Steps to reproduce

  1. Step one

     nodeA:$ lxc list
     +--------------+---------+---------------------+------+-----------+-----------+----------+
     |     NAME     |  STATE  |        IPV4         | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
     +--------------+---------+---------------------+------+-----------+-----------+----------+
     | bionic-nodeA | RUNNING | (eth0)              |      | CONTAINER | 0         | nodeA    |
     +--------------+---------+---------------------+------+-----------+-----------+----------+

  2. Step two

     nodeB:$ lxc list
     +------+-------+------+------+------+-----------+----------+
     | NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS | LOCATION |
     +------+-------+------+------+------+-----------+----------+

  3. Step three

     nodeB:$ lxc move nodeA:bionic nodeA-bionic
     Error: Failed instance creation: Error transferring instance data: Failed to run: /snap/lxd/current/bin/lxd forkmigrate bionic-nodeA /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/bionic-nodeA/lxc.conf /tmp/lxd_restore_363268097/final true:

     Failures are recorded in the following logs on host B

     /var/snap/lxd/common/lxd/logs/bionic-nodeA/lxc.log
     lxc bionic-nodeA 20200610033632.903 ERROR criu-criu.c:criu_ok:872-Found un-dumpable network: phys (eth0)

Information to attach

---- NodeA ----

nodeA@node:~$ lxc list
|     NAME     |  STATE  |        IPV4         | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
| bionic-nodeA | RUNNING | (eth0) |      | CONTAINER | 0         | nodeA    |

nodeA@node:~$ lxc info
config:
  core.trust_password: true
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  architectures:
  - x86_64
  - i686
  certificate: |
    -----END CERTIFICATE-----
  certificate_fingerprint: 3831b57db5126a1631457f3759625c2adec19bb88ffd199bc3a2fbfb458412d0
  driver: lxc
  driver_version: 4.0.2
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.4.0-33-generic
  lxc_features:
    cgroup2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_notify: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: true
  server_name: nodeA
  server_pid: 14775
  server_version: "4.2"
  storage: zfs
  storage_version: 0.8.3-1ubuntu12

nodeA@node:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:5c:96:75 brd ff:ff:ff:ff:ff:ff
    inet brd scope global dynamic noprefixroute enp0s3
       valid_lft 81468sec preferred_lft 81468sec
    inet6 fe80::186a:b205:635e:1e6d/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:80:79:4f brd ff:ff:ff:ff:ff:ff
    inet brd scope global noprefixroute enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::f59f:3cef:6a37:77a2/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
4: enp0s9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:4e:88:a5 brd ff:ff:ff:ff:ff:ff
    inet brd scope global noprefixroute enp0s9
       valid_lft forever preferred_lft forever
    inet6 fe80::faa8:1a6d:d55:1873/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
11: lxdfan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether 16:93:90:9d:df:94 brd ff:ff:ff:ff:ff:ff
    inet scope global lxdfan0
       valid_lft forever preferred_lft forever
    inet6 fe80::1493:90ff:fe9d:df94/64 scope link 
       valid_lft forever preferred_lft forever
12: lxdfan0-mtu: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1450 qdisc noqueue master lxdfan0 state UNKNOWN group default qlen 1000
    link/ether d2:07:2f:b8:52:0b brd ff:ff:ff:ff:ff:ff
    inet6 fe80::d007:2fff:feb8:520b/64 scope link 
       valid_lft forever preferred_lft forever
13: lxdfan0-fan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master lxdfan0 state UNKNOWN group default qlen 1000
    link/ether 16:93:90:9d:df:94 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1493:90ff:fe9d:df94/64 scope link 
       valid_lft forever preferred_lft forever
14: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 5e:f6:1f:cf:4f:2f brd ff:ff:ff:ff:ff:ff
    inet scope global lxdbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::6047:c4ff:fe27:2cb/64 scope link 
       valid_lft forever preferred_lft forever
16: vethdedbe146@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master lxdbr0 state UP group default qlen 1000
    link/ether 5e:f6:1f:cf:4f:2f brd ff:ff:ff:ff:ff:ff link-netnsid 0

---- NodeB ----

nodeB@node:~$ lxc list

nodeB@node:~$ lxc info
config:
  core.trust_password: true
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  architectures:
  - x86_64
  - i686
  certificate: |
    -----END CERTIFICATE-----
  certificate_fingerprint: c24061ffeebe89af6c3fcdacfdcbf48451b75a06ce78285ecbfd01499a94e969
  driver: lxc
  driver_version: 4.0.2
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.4.0-33-generic
  lxc_features:
    cgroup2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_notify: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: true
  server_name: nodeB
  server_pid: 81359
  server_version: "4.2"
  storage: zfs
  storage_version: 0.8.3-1ubuntu12

nodeB@node:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:d9:b8:d6 brd ff:ff:ff:ff:ff:ff
    inet brd scope global dynamic noprefixroute enp0s3
       valid_lft 72896sec preferred_lft 72896sec
    inet6 fe80::1864:a5c8:bc0d:3cdf/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:fd:df:1f brd ff:ff:ff:ff:ff:ff
    inet brd scope global noprefixroute enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::239b:c419:c51f:12d4/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
4: enp0s9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:72:f3:8e brd ff:ff:ff:ff:ff:ff
    inet brd scope global noprefixroute enp0s9
       valid_lft forever preferred_lft forever
    inet6 fe80::e291:2f66:569c:d988/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
58: vethb3ed9310@veth69f32cd3: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default qlen 1000
    link/ether 00:16:3e:c7:78:bf brd ff:ff:ff:ff:ff:ff
59: veth69f32cd3@vethb3ed9310: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1450 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
    link/ether a6:e7:e3:73:54:4a brd ff:ff:ff:ff:ff:ff
60: veth8f276b0c@veth552f22ab: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default qlen 1000
    link/ether 00:16:3e:02:e5:36 brd ff:ff:ff:ff:ff:ff
61: veth552f22ab@veth8f276b0c: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1450 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
    link/ether d6:7e:cb:22:53:cc brd ff:ff:ff:ff:ff:ff
62: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether ba:62:26:28:cc:92 brd ff:ff:ff:ff:ff:ff
    inet scope global lxdbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::b862:26ff:fe28:cc92/64 scope link 
       valid_lft forever preferred_lft forever
63: lxdfan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:c6:c9:2d:a4:48 brd ff:ff:ff:ff:ff:ff
    inet scope global lxdfan0
       valid_lft forever preferred_lft forever
    inet6 fe80::7421:7bff:fe3e:481/64 scope link 
       valid_lft forever preferred_lft forever
64: lxdfan0-mtu: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1450 qdisc noqueue master lxdfan0 state UNKNOWN group default qlen 1000
    link/ether fa:c3:03:92:57:7e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::f8c3:3ff:fe92:577e/64 scope link 
       valid_lft forever preferred_lft forever
65: lxdfan0-fan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master lxdfan0 state UNKNOWN group default qlen 1000
    link/ether 76:21:7b:3e:04:81 brd ff:ff:ff:ff:ff:ff
66: veth40e53951@veth1e25dec1: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default qlen 1000
    link/ether 00:16:3e:02:e5:36 brd ff:ff:ff:ff:ff:ff
67: veth1e25dec1@veth40e53951: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1450 qdisc noqueue master lxdfan0 state LOWERLAYERDOWN group default qlen 1000
    link/ether 6a:c6:c9:2d:a4:48 brd ff:ff:ff:ff:ff:ff

stgraber commented 4 years ago

Not a bug. LXD properly invoked CRIU which then failed to dump your container with a pretty clear error.

Note that even if you get past that network dumping error, there's no way you'll be able to dump a bionic container. Both systemd and AppArmor in those containers create structures and sockets that CRIU cannot handle at this time.

YuichiroMaeyama commented 4 years ago

Thank you for your quick response. I understand that this is a problem on the CRIU side, so I have two questions:

  1. Will live migration succeed if I use a distribution other than Bionic in the container? Which distribution should I use for live migration?

  2. What are your plans and progress in discussing this issue with the CRIU project, and do you plan to add live migration support to a future milestone? I would appreciate it if you could share your plans.

stgraber commented 4 years ago

We used to have funding to work on CRIU, mostly as a research-type project. That's what got us our current integration and a number of fixes for CRIU, but that is no longer the case and we don't have anyone on the team working on this at this time.

Busybox/Alpine are the best for checkpoint/restore at this time, so long as you don't run any recent services on them. On the network side, only containers without a network device currently work, as I believe our modern network device logic doesn't work with CRIU at this time.

stgraber commented 4 years ago

So it can be used for a demo of CRIU, but that's about the extent of its use at this time. There are active users of this, mind you, but they have extremely specific workloads which just happen to tick all the right boxes.

stgraber commented 3 years ago

@intrepidsilence until CRIU/liblxc get fixed to handle the current NIC devices, stateless migration is your best bet. You could do a first couple of passes with lxc copy SRC DEST --refresh to get you as close as possible, then stop the source, do another refresh and start the destination.
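
The sequence described above could be scripted roughly as follows. This is only a sketch, not an official LXD tool: the function name stateless_move, the overridable LXC variable, and the default of two warm-up passes are assumptions made for illustration. The only lxc subcommands it uses are the ones named in the comment (copy with --refresh, stop, start).

```shell
#!/bin/sh
# Sketch: stateless migration via repeated refresh copies, then a short cutover.
# LXC is overridable (hypothetical convenience, e.g. LXC=echo) so the
# command sequence can be previewed without touching any instance.
LXC="${LXC:-lxc}"

# stateless_move SRC DEST [PASSES]
# SRC and DEST use the usual [remote:]instance form; PASSES defaults to 2.
stateless_move() {
    src="$1"
    dest="$2"
    passes="${3:-2}"

    # Warm-up passes while the source keeps running, so the final
    # refresh only has to transfer recent changes.
    i=0
    while [ "$i" -lt "$passes" ]; do
        $LXC copy "$src" "$dest" --refresh || return 1
        i=$((i + 1))
    done

    # Cutover: stop the source, sync the remaining changes,
    # then start the destination.
    $LXC stop "$src" || return 1
    $LXC copy "$src" "$dest" --refresh || return 1
    $LXC start "$dest"
}
```

For example, after sourcing the file, running stateless_move nodeA:bionic-nodeA bionic-nodeA on nodeB would perform two warm-up copies and then the cutover. Downtime is limited to the stop, the final refresh, and the start, but running processes are not preserved, so this is a restart on the new host rather than a live migration.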

intrepidsilence commented 3 years ago

@stgraber Thank you so much for the quick reply.