checkpoint-restore / criu

Checkpoint/Restore tool
criu.org

Checkpointing failed - (criu/tty.c:2306): tty: Unable to find a master for /3 #406

Closed: ofinto closed this issue 3 years ago

ofinto commented 7 years ago

Hello, I tested CRIU on an empty LXC container (no programs running) and everything worked fine, but when I tried a production container running MySQL, nginx and dnsmasq, checkpointing failed.

lxc-checkpoint -D /tmp/qa01-lxc -n qa01-lxc
lxc-checkpoint: criu.c: do_dump: 1124 dump failed with 1
lxc-checkpoint: criu.c: do_dump: 1138 criu output: Will skip in-flight TCP connections

Checkpointing qa01-lxc failed.
cat /tmp/qa01-lxc/dump.log 
iptables-restore: invalid option -- 'w'
ip6tables-restore: invalid option -- 'w'
Warn  (criu/autofs.c:79): Failed to find pipe_ino option (old kernel?)
Warn  (compel/arch/x86/src/lib/infect.c:249): Will restore 41162 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:249): Will restore 17197 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:249): Will restore 22251 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:249): Will restore 22253 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:249): Will restore 22254 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:249): Will restore 23859 with interrupted system call
tar: ./php5-fpm.sock: socket ignored
tar: ./mysqld/mysqld.sock: socket ignored
tar: ./dbus/system_bus_socket: socket ignored
Error (criu/tty.c:2306): tty: Unable to find a master for /3
iptables-restore: invalid option -- 'w'
ip6tables-restore: invalid option -- 'w'
Error (criu/cr-dump.c:1709): Dumping FAILED.
criu -V
Version: 3.6
GitID: v3.6
lxc-checkpoint --version
2.0.7
uname -r
4.4.0-72-generic
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.2 LTS
Release:    16.04
Codename:   xenial
cat /var/lib/lxc/qa01-lxc/config 
# Template used to create this container: /usr/share/lxc/templates/lxc-debian
# Parameters passed to the template: --release wheezy
# Template script checksum (SHA-1): fb78cf1a2191c82f7aef77993e5253784be6646f
# For additional config options, please look at lxc.container.conf(5)

# Uncomment the following line to support nesting containers:
#lxc.include = /usr/share/lxc/config/nesting.conf
# (Be aware this has security implications)

lxc.rootfs = /var/lib/lxc/qa01-lxc/rootfs
lxc.rootfs.backend = zfs

# Common configuration
lxc.include = /usr/share/lxc/config/debian.common.conf

# Container specific configuration
lxc.tty = 4
lxc.utsname = qa01-lxc
lxc.arch = amd64
lxc.cgroup.memory.limit_in_bytes = 2048M
# BEGIN ANSIBLE MANAGED BLOCK
# Network configuration
lxc.network.type = veth
lxc.network.name = eth1
lxc.network.link = "vmbr3006"
lxc.network.flags = up
# eth0
lxc.network.type = veth
lxc.network.name = eth0
lxc.network.link = vmbr0
# END ANSIBLE MANAGED BLOCK
# hax for criu
lxc.console = none
lxc.tty = 0
lxc.cgroup.devices.deny = c 5:1 rwm
avagin commented 7 years ago

Your config sets lxc.tty twice (lxc.tty = 4 and later lxc.tty = 0), so it is not clear which value is actually used.
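One quick way to catch this kind of mistake is to list every key that is set more than once. The function below is a minimal sketch, not part of LXC or CRIU; the name dup_keys is hypothetical, and it assumes simple one-line "key = value" entries with comments starting with #.

```shell
# Hypothetical helper: print every config key that is assigned more than once.
# Assumes one "key = value" per line; lines starting with # are skipped.
dup_keys() {
    grep -v '^[[:space:]]*#' "$1" |          # drop comment lines
        grep '=' |                            # keep only assignments
        sed 's/[[:space:]]*=.*//; s/^[[:space:]]*//' |  # keep just the key
        sort |                                # group identical keys together
        uniq -d                               # print keys seen more than once
}
```

Run against the config posted above (for example dup_keys /var/lib/lxc/qa01-lxc/config), it would list lxc.network.link, lxc.network.name, lxc.network.type and lxc.tty. The lxc.network.* keys repeat on purpose, because in LXC 2.x each lxc.network.type line starts a new interface definition; lxc.tty takes a single value, so its duplicate is the real problem.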

ofinto commented 7 years ago

Sorry about that. I fixed the config, restarted the LXC container, and the checkpoint was created successfully. Thank you.

ofinto commented 7 years ago

Hi, it's me again. The checkpoint was created successfully, but the restore failed. restore.log:

Warn  (criu/cr-restore.c:1161): Set CLONE_PARENT | CLONE_NEWPID but it might cause restore problem,because not all kernels support such clone flags combinations!
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
iptables-restore: invalid option -- 'w'
ip6tables-restore: invalid option -- 'w'
     1: Warn  (criu/sk-unix.c:1515): sk unix: Can't unlink stale socket 0xf5e234cc peer 0 (name /var/run/php5-fpm.sock dir -)
     1: Warn  (criu/sk-unix.c:1515): sk unix: Can't unlink stale socket 0xf5e1ae54 peer 0 (name /var/run/mysqld/mysqld.sock dir -)
     1: Warn  (criu/sk-unix.c:1515): sk unix: Can't unlink stale socket 0xf5e1d6f2 peer 0 (name /var/run/dbus/system_bus_socket dir -)
  2496: Error (criu/shmem.c:563): Can't restore shmem content
  2496: Error (criu/mem.c:1200): `- Can't open vma
  3262: Debug:  Setting 1 queue seq to 3089071390
  3262: Debug:  Setting 2 queue seq to 1658922559
  3262: Debug:  Restoring TCP options
  3262: Debug:      Will turn SAK on
  3262: Debug:      Will set snd_wscale to 7
  3262: Debug:      Will set rcv_wscale to 7
  3262: Debug:      Will turn timestamps on
  3262: Debug: Will set mss clamp to 1460
  3262: Debug:  Setting 1 queue seq to 4291181196
  3262: Debug:  Setting 2 queue seq to 530791768
  3262: Debug:  Restoring TCP options
  3262: Debug:      Will turn SAK on
  3262: Debug:      Will set snd_wscale to 7
  3262: Debug:      Will set rcv_wscale to 7
  3262: Debug:      Will turn timestamps on
  3262: Debug: Will set mss clamp to 1460
     1: Error (criu/cr-restore.c:1298): 2496 exited, status=1
  1179: Error (criu/shmem.c:563): Can't restore shmem content
  1179: Error (criu/mem.c:1200): `- Can't open vma
Error (criu/cr-restore.c:2171): Restoring FAILED.
xemul commented 6 years ago

What's your criu version?

0x7f454c46 commented 6 years ago

It might be connected to this patch.

avagin commented 6 years ago

@0x7f454c46 Sorry, I missed this patch. The same problem was worked around in 3.6 by another patch:

commit f65517b95d9da8117d74065da3e3b4877f75731f
Author: Andrei Vagin avagin@virtuozzo.com
Date:   Sat Oct 14 04:19:35 2017 +0300

    shmem: dump shared memory before dumping namespaces

0x7f454c46 commented 6 years ago

@avagin Don't worry, I just pointed to a similar issue; it looks like that's not the cause :)

github-actions[bot] commented 3 years ago

A friendly reminder that this issue had no activity for 30 days.