checkpoint-restore / criu

Checkpoint/Restore tool
criu.org

Can't reproduce your demos #1187

Open davidcohenm opened 4 years ago

davidcohenm commented 4 years ago

Hi guys,

I wasn't able to reproduce the VNC-based demos. The checkpoint step fails for me. I've tried several Linux distributions and several kernels, with no luck :-(

Could someone point me to the versions they used and confirm that the demos still work?

Tried:

  1. https://criu.org/VNC
  2. https://www.youtube.com/watch?v=kjhuzSl6JYc

A plain looper (even between VMs) works for me: https://criu.org/Docker

My latest experiment was on Ubuntu 16.04: e.g.

$ sudo criu dump -t `pgrep vnc.sh` --tcp-established \
    -D /tmp/test -j --ghost-limit=100M
Error (criu/namespaces.c:420): Can't dump nested ipc namespace for 1561
Error (criu/namespaces.c:672): Can't make ipcns id
Error (criu/cr-dump.c:1764): Dumping FAILED.
$ uname -r
4.20.0-042000-generic
$ cat /etc/os-release
VERSION="16.04.7 LTS (Xenial Xerus)"
$ runc --version
runc version 1.0.0-rc10
commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
spec: 1.0.1-dev
$ criu --version
Version: 3.14
$ sudo criu check --all
Warn  (criu/cr-check.c:1230): clone3() with set_tid not supported
Error (criu/cr-check.c:1272): Time namespaces are not supported
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.

btw, what should I do to fix clone3() and time namespaces issue?
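
To narrow down a failure like the one above, it usually helps to have CRIU write a verbose log and then look only at the warnings and errors. The following is a minimal sketch, not an official tool: `criu_log_problems` is a hypothetical helper name, and the commented-out dump command simply adds the `-v4` and `-o` options to the invocation from this comment.

```shell
#!/bin/sh
# Print only the Warn/Error lines of a CRIU log file.
# The optional "(ss.uuuuuu) " timestamp prefix is stripped first so that
# logs written with and without timestamps are handled the same way.
criu_log_problems() {
    # $1: path to a CRIU log file
    sed -n -E 's/^\([0-9.]+\) +//; /^(Warn|Error) /p' "$1"
}

# Hypothetical usage (needs root and a running vnc.sh, so it is not run here).
# The log path assumes that a relative -o name ends up in the -D directory.
#   sudo criu dump -t "$(pgrep vnc.sh)" --tcp-established -j \
#       --ghost-limit=100M -D /tmp/test -v4 -o dump.log
#   criu_log_problems /tmp/test/dump.log
```

The first Error line in such a log is usually more informative than the final "Dumping FAILED" message.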

davidcohenm commented 4 years ago

Running the same thing inside Docker gives a different error, but it doesn't work either :-(

$ docker version
Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:45:49 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:44:20 2020
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
$ cat /etc/criu/default.conf
tcp-established
ext-unix-sk
ghost-limit 4GB
$ cat /etc/criu/runc.conf
tcp-established
$ cat /etc/docker/daemon.json
{
  "experimental": true
}
adrianreber commented 4 years ago

btw, what should I do to fix clone3() and time namespaces issue?

Don't worry about those messages. Your kernel is just old and does not support those interfaces which are not relevant for what you are trying to do.

If you run check without --all you should not see these warnings/errors.
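
The difference between the two modes can be seen side by side with a small wrapper. This is only a sketch: `criu_check_report` is a hypothetical helper, it assumes that plain `criu check` exits with status 0 when the required features are present, and it treats every Warn/Error line of `criu check --all` as an optional feature, which is a simplification.

```shell
#!/bin/sh
# Run the basic check first; only if it passes, list what --all complains about.
criu_check_report() {
    # $1: command to run as criu (defaults to "criu"); must be run as root
    criu_bin=${1:-criu}
    if "$criu_bin" check >/dev/null 2>&1; then
        echo "basic check: passed"
    else
        echo "basic check: FAILED"
        return 1
    fi
    echo "reported by 'check --all' only:"
    # Keep only the Warn/Error lines; print a placeholder if there are none.
    "$criu_bin" check --all 2>&1 | grep -E '^(Warn|Error) ' || echo "  (none)"
}

# Hypothetical usage from a root shell (not executed here):
#   criu_check_report
```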

adrianreber commented 4 years ago

I see an error about nested namespaces. On which platform are you running your tests?

davidcohenm commented 4 years ago

btw, what should I do to fix clone3() and time namespaces issue?

Don't worry about those messages. Your kernel is just old and does not support those interfaces which are not relevant for what you are trying to do.

If you run check without --all you should not see these warnings/errors.

I see. I tried the latest kernel I could install on Ubuntu 16.04 LTS (v4.x) and also Ubuntu 20 (v5.x); the errors differ, but neither works :-) I've also tried CentOS, Fedora and a lot of other distros, with no luck.

davidcohenm commented 4 years ago

I see an error about nested namespaces. On which platform are you running your tests?

Thanks for the REALLY quick reply, I really appreciate it! The nested-namespace error happens on Ubuntu without Docker. Running the same thing inside Docker fails with an error about /no-such-path, and each time I run it, find / -inum X gives a different result (most of the time, nothing).

$ uname -r
4.20.0-042000-generic
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.7 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.7 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

That's why I am wondering which versions you used when you recorded the videos and wrote the wiki page, because it clearly worked for you :)

If you meant where I am running this: I am using a Windows 10 machine with VMware Workstation, which runs Ubuntu 16.04 in this example (I also have other VMs, both Desktop and Server editions). I can try on AWS EC2 if you want.

If you can give me the exact setup with which this works for you, it would really help me understand what I am doing wrong (Firefox version, kernel version, Docker version, CRIU version, distro version, hypervisor and so on).
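
A small script that prints the relevant versions in one go makes reports like this easier to compare. The sketch below is only an illustration: `collect_env` is a hypothetical name, and the tool list is an assumption based on what is mentioned in this thread.

```shell
#!/bin/sh
# Print kernel, distribution and tool versions, one per line.
collect_env() {
    echo "kernel: $(uname -r)"
    if [ -r /etc/os-release ]; then
        # os-release is a shell-compatible file; read it in a subshell so
        # its variables do not leak into the caller.
        echo "distro: $(. /etc/os-release && echo "${PRETTY_NAME:-unknown}")"
    fi
    for tool in criu runc docker firefox; do
        if command -v "$tool" >/dev/null 2>&1; then
            # Only the first line of the version output is kept.
            echo "$tool: $("$tool" --version 2>&1 | head -n 1)"
        else
            echo "$tool: not installed"
        fi
    done
}

collect_env
```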

davidcohenm commented 4 years ago

btw, what should I do to fix clone3() and time namespaces issue?

Don't worry about those messages. Your kernel is just old and does not support those interfaces which are not relevant for what you are trying to do.

If you run check without --all you should not see these warnings/errors.

btw, how do I know what is relevant and what is not for what I am doing ?

davidcohenm commented 4 years ago

By the way, I can't run a recent Firefox with sudo from my user account, so I ran su - first; maybe this is relevant:

root       1097  0.0  0.0  52284  3476 ?        S    08:32   0:00 su -
root       1102  0.0  0.1  22536  5344 ?        S    08:32   0:00  \_ -su
root       1506  0.0  0.0  52700  3844 ?        S    08:34   0:00      \_ sudo setsid unshare -i ./vnc.sh firefox
root       1507  0.0  0.0  12548  3004 ?        Ss   08:34   0:00          \_ /bin/bash ./vnc.sh firefox
root       1509  6.2  0.3  43880 16032 ?        S    08:34  10:34              \_ Xvnc :25 -v -geometry 648x375 -interface 0.0.0.0 -SecurityTypes none
root       1510  8.5  9.0 3048012 362732 ?      Sl   08:34  14:35              \_ /usr/lib/firefox/firefox
root       1561  0.0  2.6 2562068 107084 ?      Sl   08:34   0:03                  \_ /usr/lib/firefox/firefox -contentproc -childID 1 -isForBrowser -pref
root       1582  0.0  3.3 2590940 133680 ?      Sl   08:34   0:04                  \_ /usr/lib/firefox/firefox -contentproc -childID 2 -isForBrowser -pref
root       1624  0.0  2.9 2569124 119296 ?      Sl   08:34   0:02                  \_ /usr/lib/firefox/firefox -contentproc -childID 3 -isForBrowser -pref
root       1671 13.9 21.1 3671408 848424 ?      Sl   08:34  23:40                  \_ /usr/lib/firefox/firefox -contentproc -childID 4 -isForBrowser -pref
root       1777  1.5  0.9 362476 38884 ?        Sl   08:37   2:32                  \_ /usr/lib/firefox/firefox -contentproc -parentBuildID 20200720193547
root       1953  0.0  1.8 2543916 73792 ?       Sl   08:39   0:00                  \_ /usr/lib/firefox/firefox -contentproc -childID 6 -isForBrowser -pref
root       1484  0.0  0.3  41296 13136 ?        S    08:33   0:00 /usr/bin/Xvnc :1 -auth /root/.Xauthority -desktop ubuntu:1 (root) -fp /usr/share/fonts/X
sudo criu dump -t `pgrep vnc.sh` --tcp-established  -D /tmp/test -j --ghost-limit=100M
Error (criu/namespaces.c:420): Can't dump nested ipc namespace for 1561
Error (criu/namespaces.c:672): Can't make ipcns id
Error (criu/cr-dump.c:1764): Dumping FAILED.
davidcohenm commented 4 years ago

Tried the same scenario on AWS EC2 with Ubuntu 16:

$ sudo criu dump -t `pgrep vnc.sh` --tcp-established  -D /tmp/test -j --ghost-limit=100M
Warn  (criu/net.c:3137): Unable to get tun network namespace
Warn  (criu/sk-unix.c:229): unix: Unable to open a socket file: Bad address
Warn  (criu/net.c:3137): Unable to get socket network namespace
Error (criu/namespaces.c:420): Can't dump nested ipc namespace for 15141
Error (criu/namespaces.c:672): Can't make ipcns id
Error (criu/cr-dump.c:1764): Dumping FAILED.
$ uname -r
4.4.0-1110-aws
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
$ criu -V
Version: 3.14
$ criu check --all
Warn  (criu/kerndat.c:829): Can't load /run/criu.kdat
Error (criu/util.c:618): exited, status=3
Error (criu/util.c:618): exited, status=3
Write 4294967295 to /proc/self/loginuid failed: Operation not permitted
Warn  (criu/net.c:3137): Unable to get tun network namespace
Warn  (criu/sk-unix.c:229): unix: Unable to open a socket file: Bad address
Error (criu/net.c:3423): Unable create a network namespace: Operation not permitted
Warn  (criu/net.c:3475): NSID isn't reported for network links
Warn  (criu/net.c:3137): Unable to get socket network namespace
Error (criu/util.c:696): You need to be root to run this command
$ sudo criu check --all
sudo: unable to resolve host ip-192-168-1-150
Warn  (criu/autofs.c:99): Failed to find pipe_ino option (old kernel?)
Error (criu/cr-check.c:1155): The TCP_REPAIR_WINDOW option isn't supported.
Error (criu/cr-check.c:1099): TCP_REPAIR can't be enabled for half-closed sockets
Warn  (criu/cr-check.c:1241): Do not have API to map vDSO - will use mremap() to restore vDSO
Error (criu/cr-check.c:1220): Non-cooperative UFFD is not supported
Warn  (criu/cr-check.c:1230): clone3() with set_tid not supported
Error (criu/cr-check.c:1272): Time namespaces are not supported
Error (criu/cr-check.c:992): autofs not supported.
Warn  (criu/cr-check.c:1197): compat_cr is not supported. Requires kernel >= v4.12
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.
$ sudo criu check
sudo: unable to resolve host ip-192-168-1-150
Warn  (criu/autofs.c:99): Failed to find pipe_ino option (old kernel?)
Looks good.
$ docker version
Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:45:49 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:44:20 2020
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
davidcohenm commented 4 years ago

upgraded kernel to:

$ uname -r
4.4.0-1112-aws

same error:

$ sudo criu dump -t `pgrep vnc.sh` --tcp-established  -D /tmp/test -j --ghost-limit=100M
Warn  (criu/net.c:3137): Unable to get tun network namespace
Warn  (criu/sk-unix.c:229): unix: Unable to open a socket file: Bad address
Warn  (criu/net.c:3137): Unable to get socket network namespace
Error (criu/namespaces.c:420): Can't dump nested ipc namespace for 1851
Error (criu/namespaces.c:672): Can't make ipcns id
Error (criu/cr-dump.c:1764): Dumping FAILED.
$ sudo criu check
Warn  (criu/autofs.c:99): Failed to find pipe_ino option (old kernel?)
Looks good.
$ sudo criu check  --all
Warn  (criu/autofs.c:99): Failed to find pipe_ino option (old kernel?)
Error (criu/cr-check.c:1155): The TCP_REPAIR_WINDOW option isn't supported.
Error (criu/cr-check.c:1099): TCP_REPAIR can't be enabled for half-closed sockets
Warn  (criu/cr-check.c:1241): Do not have API to map vDSO - will use mremap() to restore vDSO
Error (criu/cr-check.c:1220): Non-cooperative UFFD is not supported
Warn  (criu/cr-check.c:1230): clone3() with set_tid not supported
Error (criu/cr-check.c:1272): Time namespaces are not supported
Error (criu/cr-check.c:992): autofs not supported.
Warn  (criu/cr-check.c:1197): compat_cr is not supported. Requires kernel >= v4.12
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.
adrianreber commented 4 years ago

Just recently I was helping someone to get that setup working in this ticket: #1082

Please have a look at that ticket, that should give you a few hints how to do it.

adrianreber commented 4 years ago

Do you always have Firefox running inside your VNC session? Maybe try it with a simpler process. xterm or something like that. Something less complex than a browser.

davidcohenm commented 4 years ago

Hey,

Thanks again for a quick response!

Just recently I was helping someone to get that setup working in this ticket: #1082

Please have a look at that ticket, that should give you a few hints how to do it.

I've quickly read the whole thread but couldn't find anything that helps me. Do you suggest trying CentOS instead? Inside Docker it doesn't work either (although it fails with a different error at a later stage).

setpid unshare -i ./vnc_server maya

this didn't work for me on Ubuntu 16:

$ setsid unshare -i ./vnc.sh firefox
unshare: unshare failed: Operation not permitted

with sudo firefox is not running correctly:

Running Firefox as root in a regular user's session is not supported.  ($HOME is /home/igorb which is owned by igorb.)

with su - it runs but doesn't dump :-(

Have you recently tried to dump Firefox (or any other browser) in any environment? It does work with plain X, but it doesn't work with a "sophisticated" multi-process app that does IPC and tons of other stuff (a real-world app with a UI).

Thanks!

davidcohenm commented 4 years ago

Do you always have Firefox running inside your VNC session? Maybe try it with a simpler process. xterm or something like that. Something less complex than a browser.

less complex stuff work

davidcohenm commented 4 years ago

as I understand newns.c mentioned in wiki is an old way to do stuff, nowadays, you recommend instead to use sudo setsid unshare -i ./vnc.sh firefox, right ? it should do the same?

adrianreber commented 4 years ago

Do you always have Firefox running inside your VNC session? Maybe try it with a simpler process. xterm or something like that. Something less complex than a browser.

less complex stuff work

Perfect. So maybe it is Firefox. Maybe Firefox uses nested IPC namespaces. Looking at the output of lsns I see that Firefox uses a lot of namespaces. CRIU cannot handle nested namespaces (someone needs to implement it). I never dump Firefox because it makes no sense for me (at least).
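
Whether a process tree really spans more than one IPC namespace can be checked by comparing the `/proc/<pid>/ns/ipc` links of its members. The following is a rough sketch: `ipc_ns_outliers` and the `PROC_ROOT` override are hypothetical names introduced here, and it assumes that the "nested ipc namespace" error refers to a task whose IPC namespace differs from that of the root task being dumped.

```shell
#!/bin/sh
# List the PIDs whose IPC namespace differs from that of a reference PID.
# Output: one "<pid> <namespace link>" line per outlier.
ipc_ns_outliers() (
    # $1: reference PID (for example the vnc.sh shell); other args: PIDs to compare.
    # PROC_ROOT (hypothetical) lets the function be tried against a fake /proc tree.
    proc=${PROC_ROOT:-/proc}
    ref_pid=$1
    shift
    ref_ns=$(readlink "$proc/$ref_pid/ns/ipc") || exit 1
    for pid in "$@"; do
        # Skip PIDs that have exited or that we are not allowed to inspect.
        ns=$(readlink "$proc/$pid/ns/ipc") || continue
        [ "$ns" = "$ref_ns" ] || echo "$pid $ns"
    done
)

# Hypothetical usage from a root shell (not executed here):
#   ipc_ns_outliers "$(pgrep vnc.sh)" $(pgrep firefox)
```

Any PID printed here would be a candidate for the task named in the dump error.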

adrianreber commented 4 years ago

as I understand newns.c mentioned in wiki is an old way to do stuff, nowadays, you recommend instead to use sudo setsid unshare -i ./vnc.sh firefox, right ? it should do the same?

Not sure. As usual with documentation; at some point it is outdated. If unshare works, that sounds easier than some extra process to handle that.

davidcohenm commented 4 years ago

as I understand newns.c mentioned in wiki is an old way to do stuff, nowadays, you recommend instead to use sudo setsid unshare -i ./vnc.sh firefox, right ? it should do the same?

Not sure. As usual with documentation; at some point it is outdated. If unshare works, that sounds easier than some extra process to handle that.

both of them doesn't work. just I have too many permutations to check so I am trying to narrow them down.

adrianreber commented 4 years ago

as I understand newns.c mentioned in wiki is an old way to do stuff, nowadays, you recommend instead to use sudo setsid unshare -i ./vnc.sh firefox, right ? it should do the same?

Not sure. As usual with documentation; at some point it is outdated. If unshare works, that sounds easier than some extra process to handle that.

both of them doesn't work. just I have too many permutations to check so I am trying to narrow them down.

Didn't you say it works?

davidcohenm commented 4 years ago

Do you always have Firefox running inside your VNC session? Maybe try it with a simpler process. xterm or something like that. Something less complex than a browser.

less complex stuff work

Perfect. So maybe it is Firefox. Maybe Firefox uses nested IPC namespaces. Looking at the output of lsns I see that Firefox uses a lot of namespaces. CRIU cannot handle nested namespaces (someone needs to implement it). I never dump Firefox because it makes no sense for me (at least).

The same happens with Chrome (it doesn't work). In the YouTube video someone from your team dumps Firefox as a demo (there is also another video from Red Hat), so it does work for someone. Perhaps you can check that? In Docker I get a different error:

(00.299954) fsnotify:   [fhandle] bytes 0x000008 type 0x000001 __handle 0x0000000000df69:0000000000000000
(00.299955) fsnotify: Opening fhandle 2a:df69...
(00.299970) Warn  (criu/fsnotify.c:288): fsnotify:      Handle 0x2a:0xdf69 cannot be opened
..
(00.306160) Error (criu/irmap.c:86): irmap: Can't stat /no-such-path: No such file or directory
(00.306171) Error (criu/fsnotify.c:291): fsnotify:      Can't dump that handle
(00.306237) ----------------------------------------
(00.306257) Error (criu/cr-dump.c:1349): Dump files (pid: 4436) failed with -1

There are tickets where someone mentioned that you can take the decimal value of the handle (in my example: 0xdf69), run docker exec -it container bash and then find / -inum <decimal_handle> to find out why it happens. Most of the time this doesn't find anything, and the rest of the time it finds something different on each run, even when I rerun exactly the same Docker container over and over again. Someone else suggested disabling AppArmor; I am running my container with --security-opt seccomp:unconfined --security-opt apparmor=unconfined and it still doesn't work.
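
The hexadecimal-to-decimal step is easy to get wrong by hand, so it can be scripted. This is a minimal sketch, assuming the second half of the handle (0xdf69 here) is the inode number, as the tickets mentioned above suggest; `find_by_handle_inode` is a hypothetical helper name.

```shell
#!/bin/sh
# Look up files by the inode part of a CRIU "Handle 0x<dev>:0x<ino>" warning.
find_by_handle_inode() {
    # $1: inode in hex as printed by CRIU (e.g. 0xdf69)
    # $2: directory to search (e.g. / inside the container)
    inode_dec=$(printf '%d' "$1") || return 1
    # -xdev keeps the search on one filesystem, because inode numbers are
    # only unique per filesystem.
    find "$2" -xdev -inum "$inode_dec" 2>/dev/null
}

# The conversion on its own, for the handle from the log above:
printf '%d\n' 0xdf69   # prints 57193
```

An empty result would mean that no file with that inode number is currently visible under the searched directory.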

davidcohenm commented 4 years ago

as I understand newns.c mentioned in wiki is an old way to do stuff, nowadays, you recommend instead to use sudo setsid unshare -i ./vnc.sh firefox, right ? it should do the same?

Not sure. As usual with documentation; at some point it is outdated. If unshare works, that sounds easier than some extra process to handle that.

both of them doesn't work. just I have too many permutations to check so I am trying to narrow them down.

Didn't you say it works?

Unfortunately no. I have been struggling with this for a while. I've tried almost every permutation I could imagine (different OSes, different kernels, newns vs. setsid, inside Docker and outside, different apps and so on) with no luck. If I edit the CRIU source code to skip the /no-such-path error (and maybe, by mistake, others), then I get errors during restore...

So far I can't find any use case that works for me. That's why I opened a ticket here: if someone made a demo, it worked for them, so if I reproduce the environment they had back then it should work for me too (and then I can change one thing at a time and see what breaks it: kernel version, Xvnc version, app version, CRIU version, distro and so on).

davidcohenm commented 4 years ago

as I understand newns.c mentioned in wiki is an old way to do stuff, nowadays, you recommend instead to use sudo setsid unshare -i ./vnc.sh firefox, right ? it should do the same?

Not sure. As usual with documentation; at some point it is outdated. If unshare works, that sounds easier than some extra process to handle that.

both of them doesn't work. just I have too many permutations to check so I am trying to narrow them down.

Didn't you say it works?

What I meant is that it behaves differently (it crashes at a different step), so I was wondering which of them is recommended today (so that I can focus on one of them). I think inside a Docker container we get rid of the nested namespace problem, but we still have the /no-such-path issue. Are you perhaps familiar with this?

adrianreber commented 4 years ago

Do you always have Firefox running inside your VNC session? Maybe try it with a simpler process. xterm or something like that. Something less complex than a browser.

less complex stuff work

I thought this comment meant, that less complex stuff works. Less complex stuff being something else than a browser.

adrianreber commented 4 years ago

as I understand newns.c mentioned in wiki is an old way to do stuff, nowadays, you recommend instead to use sudo setsid unshare -i ./vnc.sh firefox, right ? it should do the same?

Not sure. As usual with documentation; at some point it is outdated. If unshare works, that sounds easier than some extra process to handle that.

both of them doesn't work. just I have too many permutations to check so I am trying to narrow them down.

Didn't you say it works?

What I meant is that it behaves differently (it will crash in other step) so I was wondering which of them recommended to use today (so I will check mostly one of them). I think inside docker container we get rid of this nested namespace thing but we still have /no-such-path issue. Perhaps you are familiar with this?

No. With a container you will increase the nesting level. I do not see any value in using containers for your test.

Are you not able to reproduce the steps described in #1082 ? That was just a couple weeks ago that we had success checkpointing and restoring a VNC session.

davidcohenm commented 4 years ago

Do you always have Firefox running inside your VNC session? Maybe try it with a simpler process. xterm or something like that. Something less complex than a browser.

less complex stuff work

I thought this comment meant, that less complex stuff works. Less complex stuff being something else than a browser.

Correct. Other graphical programs worked (icewm, xterm and so on). Once I add a heavy multi-process app to the process tree, it stops working. I think a browser that didn't initialize properly (for example, one showing an error popup and exiting) worked too. Once it opens any page, it's game over: you can't checkpoint it.

adrianreber commented 4 years ago

Do you always have Firefox running inside your VNC session? Maybe try it with a simpler process. xterm or something like that. Something less complex than a browser.

less complex stuff work

I thought this comment meant, that less complex stuff works. Less complex stuff being something else than a browser.

Correct. Other visual stuff worked (icewm, xterm and so on). Once I add a heavy multi-process app into ptree it stops to work. I think browser that didn't initialize right (like have some error and exiting, like popup for a user that something wrong) worked too. When it opens any page - it's game over. You can't checkpoint it.

Thanks for making it clear that it works with a simple process but not with a browser. Maybe today's browsers are too complex for CRIU. You either need to fix CRIU to work with today's browsers or use a browser from 5 years ago if that is important for your setup. Unfortunately there is nothing we can do for you right now. I would say the graphical applications are not the most important use case for CRIU and that is why nobody is really looking into it.

davidcohenm commented 4 years ago

Do you always have Firefox running inside your VNC session? Maybe try it with a simpler process. xterm or something like that. Something less complex than a browser.

less complex stuff work

I thought this comment meant, that less complex stuff works. Less complex stuff being something else than a browser.

Correct. Other visual stuff worked (icewm, xterm and so on). Once I add a heavy multi-process app into ptree it stops to work. I think browser that didn't initialize right (like have some error and exiting, like popup for a user that something wrong) worked too. When it opens any page - it's game over. You can't checkpoint it.

Thanks for making it clear that it works with a simple process but not with a browser. Maybe today's browsers are too complex for CRIU. You either need to fix CRIU to work with today's browsers or use a browser from 5 years ago if that is important for your setup. Unfortunately there is nothing we can do for you right now. I would say the graphical applications are not the most important use case for CRIU and that is why nobody is really looking into it.

really sad :-( perhaps you can spend a few minutes checking that on your machine (if it will work)?

adrianreber commented 4 years ago

really sad :-( perhaps you can spend a few minutes checking that on your machine (if it will work)?

Sorry, but not possible right now. Too far away from a useful system to try something like this.

davidcohenm commented 4 years ago

really sad :-( perhaps you can spend a few minutes checking that on your machine (if it will work)?

Sorry, but not possible right now. Too far away from a useful system to try something like this.

ok, can we talk somewhere private? :-)

Snorch commented 4 years ago

Handle 0x2a:0xdf69 cannot be opened

Probably your browser has an inotify watch on a filesystem which does not have fhandle support. You can try to look up what device 0x2a (42) is to check whether that's the case.

Likely you would just grep " 0:42 " in mountinfo.
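
The lookup can be scripted so that the hex-to-decimal conversion and the grep happen in one step. A minimal sketch, assuming the device has major number 0 (as in the " 0:42 " pattern above) and that the mountinfo file follows the layout documented in proc(5); `mount_for_dev` is a hypothetical helper name.

```shell
#!/bin/sh
# Print "<mount point> <filesystem type>" for the mount with a given device id.
mount_for_dev() {
    # $1: device id in hex from the CRIU warning (e.g. 0x2a)
    # $2: mountinfo file (e.g. /proc/<pid>/mountinfo)
    minor=$(printf '%d' "$1") || return 1
    # Field 3 is "major:minor". The filesystem type is the first field after
    # the "-" separator, which follows a variable number of optional fields.
    awk -v dev="0:$minor" '$3 == dev {
        for (i = 7; i <= NF; i++)
            if ($i == "-") { print $5, $(i + 1); break }
    }' "$2"
}

# Hypothetical usage (PID taken from a dump log; not executed here):
#   mount_for_dev 0x2a /proc/4436/mountinfo
```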

davidcohenm commented 4 years ago

Handle 0x2a:0xdf69 cannot be opened

Probably your browser has an inotify watch on a filesystem which does not have fhandle support. You can try to look up what device 0x2a (42) is to check whether that's the case.

Likely you would just grep " 0:42 " in mountinfo.

Thanks @Snorch!

I've just made a new run and I get this:

(00.291186) Dumping opened files (pid: 78432)
...
(00.291891) Warn  (criu/fsnotify.c:288): fsnotify:      Handle 0x3b:0x79575 cannot be opened

0x3b = 59, so I ran this on the host machine (the app runs in Docker):

cat /proc/78432/mountinfo  | grep " 0:59"
302 216 0:59 / / rw,relatime master:90 - overlay overlay rw,lowerdir=/var/lib/docker/overlay2/l/UHL6TMV6HLXAWS4MB7V77EPRL4:/var/lib/docker/overlay2/l/U5OCH64E43UVQJC3SRBHUCHSEE:/var/lib/docker/overlay2/l/36I75CZHX37HOM2B3NASYT4OCK:/var/lib/docker/overlay2/l/VAYPFKRV5PMC2BDFFYN7MJ62OC:/var/lib/docker/overlay2/l/YW5NZX2NETV5EEFKZCCNU37CAA:/var/lib/docker/overlay2/l/62RAL3M6O5TO43IRZXVXFRQFKQ:/var/lib/docker/overlay2/l/2ZXEBMY4PCESV2WQSBQGGLSVLU:/var/lib/docker/overlay2/l/RY5TVZYLUOLJIMH5LUFVGEZA2N:/var/lib/docker/overlay2/l/C2ZBIAWSYIQMCPEKTC7EPWMENV:/var/lib/docker/overlay2/l/LQMMUT3FVFW5XPPA63Z3HIPYR7:/var/lib/docker/overlay2/l/7ELFV4ZFORXM2DJNLNS6Y77FOB:/var/lib/docker/overlay2/l/NTVSYJ2ULFMWXOKLSZRKYKQ5QS:/var/lib/docker/overlay2/l/4VR4DSQDL7YE4BB47MSVF7Q3AI:/var/lib/docker/overlay2/l/7YA3LOHDDUP7TYNZ6GSWHASVGW:/var/lib/docker/overlay2/l/Q2NLS7JTFKCLCB3OF7K4RVK5NP:/var/lib/docker/overlay2/l/VJUEYKBJ6PGUZNQCQJSW3YS7JV:/var/lib/docker/overlay2/l/5IQYJTYGACQMBCXY3M7KMJDSPD:/var/lib/docker/overlay2/l/SGOS7TDCDO775JPV7P4XIFBYII:/var/lib/docker/overlay2/l/L6SSOY276Z5DJTT4ZDOQP4K3ML:/var/lib/docker/overlay2/l/ALEEVRYORBDN54C73YBUWNTWV6:/var/lib/docker/overlay2/l/X3HU2YVOFSKYRWTKLPZ5OGB4CJ:/var/lib/docker/overlay2/l/DLV6RCBNOMKR76KAZUMDGIZGB3:/var/lib/docker/overlay2/l/IRNG7YJKMXRMB6R46GJCYQBIGI:/var/lib/docker/overlay2/l/5MUVEZ3RB5YKZKNWCCESSUXRHH:/var/lib/docker/overlay2/l/43MR75FS4NYTHQHDMHHNZ6YGNS:/var/lib/docker/overlay2/l/QLFCT2CJ5UGF4NCGBIZRQ6GPSA:/var/lib/docker/overlay2/l/BFPY7NPDWWPH74KFKIYOEXC2Y3:/var/lib/docker/overlay2/l/7N57J6KQOEVBEGK7YOHWDGMUPA:/var/lib/docker/overlay2/l/4QQP4HKUN2XJS3VGN23E5ZPMEK:/var/lib/docker/overlay2/l/YOFMFEKI3JMIX3EIMKWKPQCWB2:/var/lib/docker/overlay2/l/PEVYQBIN7Y3RQ5Q4OMVUGVCF2P:/var/lib/docker/overlay2/l/TIKZLT2EIMDYXVXIDJOTJOOHDT:/var/lib/docker/overlay2/l/HCGUFIMNUFVQ5JW7UVPJEYWOHT:/var/lib/docker/overlay2/l/RVFIKTXKHFWVDJ4FZJ55F63RXZ:/var/lib/docker/overlay2/l/QIPRK5N426PRX5YNHZVMTAM5N4:/var/lib/docker/overlay2/l/2BBWJJ7CZKYQHZ
K2PYSR67A5HS:/var/lib/docker/overlay2/l/PYQ2XZFRSJE4VM4G64I5SJCNSB:/var/lib/docker/overlay2/l/KTKE36MFG3V3X57NOYHJXG6ACR:/var/lib/docker/overlay2/l/UST4WJL6IBJY5OJNMWZ5ZBPGZV:/var/lib/docker/overlay2/l/A6YZMPTT6LTT2VSZREEPGD2DMT:/var/lib/docker/overlay2/l/HHRPJLAXD275JNUJQOG4YYTAI2:/var/lib/docker/overlay2/l/WH7NKGCTN4VV2OBME4SSWZVKX3:/var/lib/docker/overlay2/l/LAPD3CEBMDGZXTXTACN5FLVNN5:/var/lib/docker/overlay2/l/FUO4KX5WTGHLDRNE4JFB4VCW7D:/var/lib/docker/overlay2/l/CWY5DLRLTV26OWCM2IRPZHEKSC:/var/lib/docker/overlay2/l/IWLSZLYHIQL4HUXG2AUJSJXASP:/var/lib/docker/overlay2/l/IRV3OQC6662RVCNCM2PFD6MA5H:/var/lib/docker/overlay2/l/YVCXB3GYHVKGNIIQXBEKCZGQEJ:/var/lib/docker/overlay2/l/2SJFHCGA5IL66CHWDM2GRGLHPS:/var/lib/docker/overlay2/l/F7YCVYU4LUZ4VW66LUPQD5OI6E:/var/lib/docker/overlay2/l/7ZANRSNSA4NBRDYUJUV6FMVA2Q,upperdir=/var/lib/docker/overlay2/77fc58b937b6d9aaabbb16f7695ddbde02d3163ffbb1c099eb2bc7544f564450/diff,workdir=/var/lib/docker/overlay2/77fc58b937b6d9aaabbb16f7695ddbde02d3163ffbb1c099eb2bc7544f564450/work,xino=off

How should I proceed?

Snorch commented 4 years ago

@davidcohenm So likely you have an inotify on overlayfs.

By default inotifies are not supported on overlayfs because overlayfs does not give proper fhandles in fdinfo. But there is a workaround, you need to enable several mount options:

  1. Preserve hardlinks (index=on) (v4.13)
  2. NFS export (nfs_export=on) (v4.16)

Note that these options can degrade overlayfs performance.

(upd: as you likely don't want to mess with the options Docker passes to its mounts, you should probably enable them as kernel boot options or overlay kernel module load options)
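
One way to make the module options survive a reboot is a modprobe configuration file. The sketch below is an assumption about a typical setup, not something confirmed in this thread: the file name `overlay-criu.conf` and both helper names are hypothetical, and options set this way apply only when the overlay module is next loaded.

```shell
#!/bin/sh
# Write a modprobe.d fragment that turns on the two overlay module parameters.
write_overlay_modprobe_conf() {
    # $1: target directory; /etc/modprobe.d on a real system (needs root)
    conf_dir=${1:-/etc/modprobe.d}
    # "overlay-criu.conf" is a hypothetical file name.
    printf 'options overlay index=Y nfs_export=Y\n' > "$conf_dir/overlay-criu.conf"
}

# Show the current values of the two parameters.
show_overlay_params() {
    # $1: parameter directory; defaults to the live module's sysfs directory
    params_dir=${1:-/sys/module/overlay/parameters}
    for p in index nfs_export; do
        [ -r "$params_dir/$p" ] && echo "$p=$(cat "$params_dir/$p")"
    done
    return 0
}

# Hypothetical usage from a root shell (not executed here):
#   write_overlay_modprobe_conf && show_overlay_params
```

The kernel command-line form of the same settings would be `overlay.index=Y overlay.nfs_export=Y`. Either way, presumably only overlay mounts created afterwards are affected, so existing containers would need to be recreated.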

davidcohenm commented 4 years ago

@davidcohenm So likely you have an inotify on overlayfs.

By default inotifies are not supported on overlayfs because overlayfs does not give proper fhandles in fdinfo. But there is a workaround, you need to enable several mount options:

  1. Preserve hardlinks (index=on) (v4.13)
  2. NFS export (nfs_export=on) (v4.16)

Note that these options can degrade overlayfs performance.

(upd: as you likely don't want to mess with the options Docker passes to its mounts, you should probably enable them as kernel boot options or overlay kernel module load options)

I will try, thanks!

P.S. Maybe it's easier to switch to another underlying filesystem that Docker supports?

davidcohenm commented 4 years ago

@davidcohenm So likely you have an inotify on overlayfs.

By default inotifies are not supported on overlayfs because overlayfs does not give proper fhandles in fdinfo. But there is a workaround, you need to enable several mount options:

  1. Preserve hardlinks (index=on) (v4.13)
  2. NFS export (nfs_export=on) (v4.16)

Note that these options can degrade overlayfs performance.

(upd: as you likely don't want to mess with the options Docker passes to its mounts, you should probably enable them as kernel boot options or overlay kernel module load options)

@Snorch thank you for such a fast reply! I really appreciate it.

I tried changing the parameter files manually with nano. After the change:

$ grep -H . /sys/module/overlay/parameters/*
/sys/module/overlay/parameters/check_copy_up:N
/sys/module/overlay/parameters/index:Y
/sys/module/overlay/parameters/metacopy:N
/sys/module/overlay/parameters/nfs_export:Y
/sys/module/overlay/parameters/redirect_always_follow:Y
/sys/module/overlay/parameters/redirect_dir:N
/sys/module/overlay/parameters/redirect_max:256
/sys/module/overlay/parameters/xino_auto:Y
  1. If I restart the whole OS, the change is not preserved.
  2. If I don't restart the OS but restart the Docker service and my container, I still get the same error:
    (00.433047) irmap: Scanning /. hint
    (00.433049) irmap: Refresh stat for /.
    (00.433051) irmap: Scanning /no-such-path hint
    (00.433052) irmap: Refresh stat for /no-such-path
    (00.433122) Error (criu/irmap.c:86): irmap: Can't stat /no-such-path: No such file or directory
    (00.433128) Error (criu/fsnotify.c:291): fsnotify:      Can't dump that handle
    (00.433173) ----------------------------------------
    (00.433185) Error (criu/cr-dump.c:1348): Dump files (pid: 3026) failed with -1

    where 3026 is the PID of my app running inside Docker (as seen from the host machine).

It seems like docker doesn't pick up this setting?

255 215 0:59 / / rw,relatime master:90 - overlay overlay rw,lowerdir=/var/lib/docker/overlay2/l/U3LOCMJP4RKJM2XCSDRAO342VF:/var/lib/docker/overlay2/l/U5OCH64E43UVQJC3SRBHUCHSEE:/var/lib/docker/overlay2/l/36I75CZHX37HOM2B3NASYT4OCK:/var/lib/docker/overlay2/l/VAYPFKRV5PMC2BDFFYN7MJ62OC:/var/lib/docker/overlay2/l/YW5NZX2NETV5EEFKZCCNU37CAA:/var/lib/docker/overlay2/l/62RAL3M6O5TO43IRZXVXFRQFKQ:/var/lib/docker/overlay2/l/2ZXEBMY4PCESV2WQSBQGGLSVLU:/var/lib/docker/overlay2/l/RY5TVZYLUOLJIMH5LUFVGEZA2N:/var/lib/docker/overlay2/l/C2ZBIAWSYIQMCPEKTC7EPWMENV:/var/lib/docker/overlay2/l/LQMMUT3FVFW5XPPA63Z3HIPYR7:/var/lib/docker/overlay2/l/7ELFV4ZFORXM2DJNLNS6Y77FOB:/var/lib/docker/overlay2/l/NTVSYJ2ULFMWXOKLSZRKYKQ5QS:/var/lib/docker/overlay2/l/4VR4DSQDL7YE4BB47MSVF7Q3AI:/var/lib/docker/overlay2/l/7YA3LOHDDUP7TYNZ6GSWHASVGW:/var/lib/docker/overlay2/l/Q2NLS7JTFKCLCB3OF7K4RVK5NP:/var/lib/docker/overlay2/l/VJUEYKBJ6PGUZNQCQJSW3YS7JV:/var/lib/docker/overlay2/l/5IQYJTYGACQMBCXY3M7KMJDSPD:/var/lib/docker/overlay2/l/SGOS7TDCDO775JPV7P4XIFBYII:/var/lib/docker/overlay2/l/L6SSOY276Z5DJTT4ZDOQP4K3ML:/var/lib/docker/overlay2/l/ALEEVRYORBDN54C73YBUWNTWV6:/var/lib/docker/overlay2/l/X3HU2YVOFSKYRWTKLPZ5OGB4CJ:/var/lib/docker/overlay2/l/DLV6RCBNOMKR76KAZUMDGIZGB3:/var/lib/docker/overlay2/l/IRNG7YJKMXRMB6R46GJCYQBIGI:/var/lib/docker/overlay2/l/5MUVEZ3RB5YKZKNWCCESSUXRHH:/var/lib/docker/overlay2/l/43MR75FS4NYTHQHDMHHNZ6YGNS:/var/lib/docker/overlay2/l/QLFCT2CJ5UGF4NCGBIZRQ6GPSA:/var/lib/docker/overlay2/l/BFPY7NPDWWPH74KFKIYOEXC2Y3:/var/lib/docker/overlay2/l/7N57J6KQOEVBEGK7YOHWDGMUPA:/var/lib/docker/overlay2/l/4QQP4HKUN2XJS3VGN23E5ZPMEK:/var/lib/docker/overlay2/l/YOFMFEKI3JMIX3EIMKWKPQCWB2:/var/lib/docker/overlay2/l/PEVYQBIN7Y3RQ5Q4OMVUGVCF2P:/var/lib/docker/overlay2/l/TIKZLT2EIMDYXVXIDJOTJOOHDT:/var/lib/docker/overlay2/l/HCGUFIMNUFVQ5JW7UVPJEYWOHT:/var/lib/docker/overlay2/l/RVFIKTXKHFWVDJ4FZJ55F63RXZ:/var/lib/docker/overlay2/l/QIPRK5N426PRX5YNHZVMTAM5N4:/var/lib/docker/overlay2/l/2BBWJJ7CZKYQHZ
K2PYSR67A5HS:/var/lib/docker/overlay2/l/PYQ2XZFRSJE4VM4G64I5SJCNSB:/var/lib/docker/overlay2/l/KTKE36MFG3V3X57NOYHJXG6ACR:/var/lib/docker/overlay2/l/UST4WJL6IBJY5OJNMWZ5ZBPGZV:/var/lib/docker/overlay2/l/A6YZMPTT6LTT2VSZREEPGD2DMT:/var/lib/docker/overlay2/l/HHRPJLAXD275JNUJQOG4YYTAI2:/var/lib/docker/overlay2/l/WH7NKGCTN4VV2OBME4SSWZVKX3:/var/lib/docker/overlay2/l/LAPD3CEBMDGZXTXTACN5FLVNN5:/var/lib/docker/overlay2/l/FUO4KX5WTGHLDRNE4JFB4VCW7D:/var/lib/docker/overlay2/l/CWY5DLRLTV26OWCM2IRPZHEKSC:/var/lib/docker/overlay2/l/IWLSZLYHIQL4HUXG2AUJSJXASP:/var/lib/docker/overlay2/l/IRV3OQC6662RVCNCM2PFD6MA5H:/var/lib/docker/overlay2/l/YVCXB3GYHVKGNIIQXBEKCZGQEJ:/var/lib/docker/overlay2/l/2SJFHCGA5IL66CHWDM2GRGLHPS:/var/lib/docker/overlay2/l/F7YCVYU4LUZ4VW66LUPQD5OI6E:/var/lib/docker/overlay2/l/7ZANRSNSA4NBRDYUJUV6FMVA2Q,upperdir=/var/lib/docker/overlay2/7c2b84df34ab180d0c0d26be352937633ba7515b2f99b3744cdfdfdef546f0f2/diff,workdir=/var/lib/docker/overlay2/7c2b84df34ab180d0c0d26be352937633ba7515b2f99b3744cdfdfdef546f0f2/work,index=off,nfs_export=off,xino=off

See index=off,nfs_export=off at the end?

Any idea?
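
One way to check what an overlay mount actually received is to read the options from mountinfo instead of from the module parameters, since options passed explicitly at mount time take precedence over the module defaults. The following is only a sketch: it assumes the standard /proc/<pid>/mountinfo layout (file system type, source and super options follow the `-` separator), and the function name overlay_opts is made up for illustration.

```shell
# Read mountinfo-format lines on stdin and print, for every overlay mount,
# the mount point followed by its index= and nfs_export= settings.
overlay_opts() {
    awk '{
        # Locate the "-" separator that ends the optional fields.
        for (i = 1; i <= NF; i++) if ($i == "-") break
        # Field after the separator is the file system type.
        if ($(i + 1) != "overlay") next
        # Third field after the separator holds the super options.
        n = split($(i + 3), opt, ",")
        out = $5
        for (j = 1; j <= n; j++)
            if (opt[j] ~ /^(index|nfs_export)=/) out = out " " opt[j]
        print out
    }'
}

# Inspect the mounts visible to the current process, if /proc is available.
if [ -r /proc/self/mountinfo ]; then
    overlay_opts < /proc/self/mountinfo
fi
```

To inspect a container process from the host, feed it /proc/<pid>/mountinfo of that process instead. For the mount shown above, the function would print `/ index=off nfs_export=off`.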

davidcohenm commented 4 years ago

btw, @Snorch can I get the path/filename that fails?

Snorch commented 4 years ago

My only advice is to experiment a bit more with these module options, probably setting them at boot. In Virtuozzo we can migrate basic inotifies on overlayfs in our tests.

btw, @Snorch can I get the path/filename that fails?

That's the tricky part. You can find the inotify fd: it should be several lines above the error in the criu log. But you can't find the name of the file this inotify is monitoring without fhandle->fd->path resolution via the open_by_handle_at syscall, and that does not work on overlay without those options.
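
As a rough complement to reading the criu log, the watches can also be listed from the kernel's fdinfo files. This sketch assumes the usual `inotify wd:... ino:... sdev:...` line format of /proc/<pid>/fdinfo/<fd> with hexadecimal values; the function name inotify_watches is hypothetical, and on overlayfs the reported inode may not be enough to recover a path.

```shell
# List inotify watches found in a directory of fdinfo files.
# $1: directory to scan, e.g. /proc/<pid>/fdinfo
inotify_watches() {
    for f in "$1"/*; do
        # Extract the hex inode and device from each "inotify" line.
        sed -n 's/^inotify .*ino:\([0-9a-f]*\) sdev:\([0-9a-f]*\).*/\1 \2/p' "$f" 2>/dev/null |
        while read -r ino sdev; do
            # Print the fd number, the device in hex and the inode in decimal.
            printf 'fd=%s sdev=0x%s inode=%d\n' "${f##*/}" "$sdev" "0x$ino"
        done
    done
}

# Example call on the current shell, which normally has no watches.
inotify_watches "/proc/$$/fdinfo"
```

Run as root, `inotify_watches /proc/3026/fdinfo` would list the watches of the process from the log above; the decimal inode can then be passed to `find -inum` on the file system in question.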

davidcohenm commented 4 years ago

My only advice is to experiment a bit more with these module options, probably setting them at boot. In Virtuozzo we can migrate basic inotifies on overlayfs in our tests.

btw, @Snorch can I get the path/filename that fails?

That's the tricky part. You can find the inotify fd: it should be several lines above the error in the criu log. But you can't find the name of the file this inotify is monitoring without fhandle->fd->path resolution via the open_by_handle_at syscall, and that does not work on overlay without those options.

Perhaps you can point me to an example of how to change those boot options? I've found a kernel build config like this one: https://src.openvz.org/projects/OVZ/repos/vzkernel/browse/configs/kernel-3.10.0-x86_64-minimal.config but I can't find a way to do it without rebuilding the kernel.

I am also trying to force Docker to use different storage params for this specific container; no luck so far.

Thanks again!
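
For the boot-time and module-load options discussed above, a minimal sketch follows. It assumes overlay is built as a module (or, for the second form, built into the kernel) and that `index` and `nfs_export` are the parameter names shown under /sys/module/overlay/parameters. The helper names are invented for illustration, and `Y` is used as the value because that is how the parameters are displayed in sysfs.

```shell
# Print a modprobe.d line that makes index and nfs_export default to on
# whenever the overlay module is loaded.
overlay_modprobe_line() {
    printf 'options overlay index=Y nfs_export=Y\n'
}

# Print the equivalent parameters for the kernel command line,
# using the generic module.parameter=value form.
overlay_cmdline_params() {
    printf 'overlay.index=Y overlay.nfs_export=Y\n'
}

overlay_modprobe_line
overlay_cmdline_params
```

The first line could be written to a file such as /etc/modprobe.d/overlay-criu.conf (a hypothetical name) with `overlay_modprobe_line | sudo tee /etc/modprobe.d/overlay-criu.conf`; the second would be appended to the kernel command line in the boot loader configuration. Values written directly to /sys/module/overlay/parameters are lost on reboot. Either way, these only change the defaults: a mount that passes index=off explicitly, as the Docker mount above does, still overrides them.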

davidcohenm commented 4 years ago

@Snorch:

In Virtuozzo we can migrate basic inotifies on overlayfs in our tests.

Could you elaborate on that? Where and how can I test it on my app?

Snorch commented 4 years ago

@davidcohenm https://src.openvz.org/projects/OVZ/repos/criu/commits/3dae0f51f26de25f3c4a7c29f1f00a63f5695272#test/zdtm/static/overlayfs_fanotify.c

You can take a look at this test. If one changes the internal overlay mount in it to an external overlay mount, it would probably also pass on mainstream criu.

davidcohenm commented 4 years ago

@davidcohenm https://src.openvz.org/projects/OVZ/repos/criu/commits/3dae0f51f26de25f3c4a7c29f1f00a63f5695272#test/zdtm/static/overlayfs_fanotify.c

You can take a look at this test. If one changes the internal overlay mount in it to an external overlay mount, it would probably also pass on mainstream criu.

davidcohenm commented 4 years ago

@Snorch sorry, maybe I misunderstood you. Are you saying that if I install Virtuozzo's fork of criu, it will probably work as is with my app (even without changing the kernel)? Any suggestions for how to install it?

Snorch commented 4 years ago

@davidcohenm No, you misunderstood me. The Virtuozzo version of criu is closely integrated with the Virtuozzo kernel, and switching to it without installing full Virtuozzo/OpenVZ may be hard to do, so I don't advise it.

What I'm saying is:

  1. We run tests on inotify on overlayfs, and they pass.
  2. I hope the only difference between Virtuozzo criu and mainstream criu in this area is that we support non-external overlayfs migration, and that difference should not be a problem in your case.

So you should be able to get it working just by enabling the overlayfs features I mentioned above. We enable those features in the kernel config, so maybe you see strange behaviour because you are enabling them dynamically.

To sum up: it should be possible to migrate inotify on overlayfs with mainstream criu.

(note: fixing the inotify problem is probably not the only way; you can probably just switch to the vfs graph driver in Docker to remove overlay from the equation.)
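
The vfs alternative can be sketched as a daemon configuration change. This assumes Docker reads /etc/docker/daemon.json and that no other settings need to be preserved in that file; the helper name vfs_daemon_json is made up for illustration.

```shell
# Print a minimal daemon.json that selects the vfs storage driver.
vfs_daemon_json() {
    cat <<'EOF'
{
  "storage-driver": "vfs"
}
EOF
}

vfs_daemon_json
```

After merging this into the daemon configuration and restarting Docker, `docker info --format '{{.Driver}}'` should report the active driver. Two caveats: images and containers created under overlay2 are not visible under the new driver, and vfs makes a full copy of every layer, so it is slower and uses more disk space.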

github-actions[bot] commented 3 years ago

A friendly reminder that this issue had no activity for 30 days.

muvaf commented 7 months ago

Trying to dump in a container, I'm getting a similar error:

criu dump -t 9289 -D /checkpoint --tcp-established --leave-running
Warn  (criu/kerndat.c:1593): CRIU was built without libnftables support
Warn  (criu/kerndat.c:1243): Can't keep kdat cache on non-tempfs
Warn  (compel/arch/x86/src/lib/infect.c:367): Will restore 10659 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:367): Will restore 10662 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:367): Will restore 10682 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:367): Will restore 10684 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:367): Will restore 10686 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:367): Will restore 10687 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:367): Will restore 10689 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:367): Will restore 10690 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:367): Will restore 10699 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:367): Will restore 10709 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:367): Will restore 10688 with interrupted system call
Warn  (criu/fsnotify.c:281): fsnotify:  Handle 0x278:0x2ffb5b cannot be opened
Warn  (criu/irmap.c:104): irmap: Can't stat /no-such-path: No such file or directory
Error (criu/fsnotify.c:284): fsnotify:  Can't dump that handle
Error (criu/cr-dump.c:1674): Dump files (pid: 10688) failed with -1
Error (criu/cr-dump.c:2098): Dumping FAILED.
find / -inum 3144539
/root/.config/glib-2.0/settings

Is there anything special about this directory?
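
The inode passed to find above matches the hexadecimal value after the colon in the fsnotify warning (0x2ffb5b is 3144539 in decimal), which suggests the handle is printed as device:inode. A small sketch of that conversion, with a hypothetical helper name:

```shell
# Convert the inode part of a "Handle 0xDEV:0xINO" value from a criu log
# message to the decimal number expected by find -inum.
handle_to_inum() {
    ino_hex="${1##*:}"        # keep only the part after the colon
    printf '%d\n' "$ino_hex"  # printf accepts 0x-prefixed hex constants
}

handle_to_inum 0x278:0x2ffb5b   # prints 3144539
```

The result can then be used as in the command above, for example `find / -xdev -inum "$(handle_to_inum 0x278:0x2ffb5b)"`, where -xdev keeps the search on a single file system.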

For context, the process was started with setsid and unshare -i. The tree includes a VNC server and a WebKit browser. procman is a wrapper I wrote to make sure entrypoint.sh is started with setsid and a new IPC namespace.

pstree 1
procman─┬─entrypoint.sh─┬─Xtigervnc
        │               ├─node─┬─bash───MiniBrowser─┬─WebKitNetworkPr───5*[{WebKitNetworkPr}]
        │               │      │                    ├─WebKitWebProces───45*[{WebKitWebProces}]
        │               │      │                    └─27*[{MiniBrowser}]
        │               │      └─10*[{node}]
        │               └─websockify───websockify
        └─6*[{procman}]

I've tried to enable the nfs_export and index options in overlay2 but couldn't get it to work, and it seems Docker overrides them anyway. @Snorch would it be fair to say there is no way other than turning those options on? I'm getting the error repeatedly on the same file, maybe I can somehow make criu ignore it? It'll be available in the FS of the restored container anyway.

Snorch commented 7 months ago

There is no way to support inotifies on overlayfs without overlayfs providing valid file handles.

maybe I can somehow make criu ignore it?

Even if it were possible to ignore it (which it is not), you'd likely end up with a deadlocked app waiting for a notification from inotify that will never come.

https://forums.docker.com/t/nfs-export-disabled-with-overlay2-as-storage-driver/121325/4

Regarding this index=off override for Docker mounts: I believe the override can be made optional. Docker can live with index=on mounts. Yes, it becomes racy around container restart, but if we are not in a hurry and can wait a bit for the overlay mount to be fully dismantled, it should not be a big problem. @kolyshkin Any thoughts?

muvaf commented 7 months ago

Even if it was possible to ignore it (which is not), you'll likely end up with deadlocked app which is waiting for notification from inotify which will never come.

Interesting, wonder how the official docker commands handle these cases. I ended up removing the folder /root/.config/glib-2.0/settings before taking the checkpoint, since it was an empty dir anyway, and that made the dump work. I'm guessing the app has built-in resiliency to handle a missing dir.

Snorch commented 7 months ago

Interesting, wonder how the official docker commands handle these cases.

They just don't handle these cases =) I don't know any other app except CRIU which might need restoring inotify watches.

muvaf commented 7 months ago

I don't know any other app except CRIU which might need restoring inotify watches.

Docker uses criu under the hood for its checkpoint and restore machinery. I guess the feature isn't used frequently enough for them to have prioritized this problem yet.