canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Failing live migration #2110

Closed by fnzv 8 years ago

fnzv commented 8 years ago

Required information

I'm having an issue when I try to live-migrate LXD containers from one host to another. The migration fails while the containers are running, but when they are shut off I can copy/move them without problems. Both hosts have this config:

config:
  core.https_address: '[::]'
  core.trust_password: true

Steps to reproduce

1) Added LXC2 to the remote list and accepted the certificate.
2) Ran: lxc move lnmp LXC2:migrated

error: Error transferring container data: checkpoint failed:
(00.014228) Error (sockets.c:129): Diag module missing (-2)
(00.015240) Error (sockets.c:129): Diag module missing (-2)
(00.016208) Error (sockets.c:129): Diag module missing (-2)
(00.098007) Error (cr-dump.c:1600): Dumping FAILED.

I get the same error when I try to take a stateful snapshot:

lxc snapshot lnmp snap1 --stateful
error: checkpoint failed
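The "(-2)" in the CRIU log is the errno value -ENOENT. A plausible reading, not confirmed anywhere in this thread, is that CRIU asked the kernel's socket diagnostics interface about a socket family whose *_diag module was not loaded. The sketch below lists which of those modules are absent from /proc/modules. The module names in DIAG_MODULES are an assumption based on common kernel builds, and a module compiled directly into the kernel will not appear in /proc/modules even though it is available.

```python
# Assumed list of socket-diagnostics modules; adjust for your kernel build.
DIAG_MODULES = (
    "unix_diag",
    "inet_diag",
    "tcp_diag",
    "udp_diag",
    "af_packet_diag",
    "netlink_diag",
)


def missing_diag_modules(proc_modules_text, wanted=DIAG_MODULES):
    """Return the names in `wanted` that are not listed in /proc/modules text."""
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in wanted if name not in loaded]


if __name__ == "__main__":
    with open("/proc/modules") as handle:
        missing = missing_diag_modules(handle.read())
    if missing:
        print("not loaded (may still be built in):", ", ".join(missing))
    else:
        print("all listed diag modules are loaded")
```

If a module is reported as not loaded, trying `modprobe <name>` on the source host before retrying the checkpoint is a cheap experiment; whether it resolves this particular failure is not established here.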


simos commented 8 years ago

There are some extra requirements, listed at https://www.stgraber.org/2016/04/25/lxd-2-0-live-migration-912/. Can you verify that those requirements are fulfilled?

fnzv commented 8 years ago

Yes, I followed this guide.

simos commented 8 years ago

The guide says:

  1. A very recent Linux kernel, 4.4 or higher.
  2. CRIU 2.0, possibly with some cherry-picked commits depending on your exact kernel configuration.
  3. Run LXD directly on the host. It’s not possible to use those features with container nesting.
  4. For migration, the target machine must at least implement the instruction set of the source, the target kernel must at least offer the same syscalls as the source and any kernel filesystem which was mounted on the source must also be mountable on the target.

Can you elaborate on what you did for requirements 2 and 4?
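Requirements 1 and 2 from the list above can be checked mechanically on each host. The following is a minimal sketch, assuming that `criu --version` prints a line containing a dotted version number such as "Version: 2.0"; the helper name `version_at_least` is made up for this example. Requirements 3 and 4 depend on how the hosts are set up and are not covered.

```python
import platform
import re
import shutil
import subprocess


def version_at_least(text, minimum):
    """True if the first 'major.minor' number found in `text` is >= `minimum`."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    found = (int(match.group(1)), int(match.group(2)))
    return found >= minimum


if __name__ == "__main__":
    # Requirement 1: kernel 4.4 or higher.
    release = platform.release()
    print("kernel", release, "ok" if version_at_least(release, (4, 4)) else "too old")

    # Requirement 2: CRIU 2.0 or higher (cherry-picked commits cannot be detected this way).
    if shutil.which("criu") is None:
        print("criu is not installed")
    else:
        result = subprocess.run(
            ["criu", "--version"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
        )
        ok = version_at_least(result.stdout, (2, 0))
        print("criu", result.stdout.strip(), "ok" if ok else "too old")
```

Run it on both the source and the target host, since requirement 4 expects the target kernel to offer at least what the source does.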

stgraber commented 8 years ago

Sounds like it's just CRIU failing to serialize some weird socket type. The report is from Ubuntu 16.04 which does include the right CRIU version and kernel.

Please file a bug on Launchpad with the information requested on the page @simos linked to.

fnzv commented 8 years ago

@simos

  2. Because it's Ubuntu 16.04, I just installed CRIU with: apt-get install criu
  4. The machines are identical (two fresh Ubuntu 16.04 installs), so I don't see any problems there.

@stgraber OK, I'll file a bug on Launchpad.

stgraber commented 8 years ago

Cool, so yeah, probably just a CRIU limitation. We'll track this on Launchpad.