Open davidcohenm opened 4 years ago
Running the same inside Docker gives a different error, but it doesn't work either :-(
$ docker version
Client: Docker Engine - Community
Version: 19.03.12
API version: 1.40
Go version: go1.13.10
Git commit: 48a66213fe
Built: Mon Jun 22 15:45:49 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.12
API version: 1.40 (minimum version 1.12)
Go version: go1.13.10
Git commit: 48a66213fe
Built: Mon Jun 22 15:44:20 2020
OS/Arch: linux/amd64
Experimental: true
containerd:
Version: 1.2.13
GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
$ cat /etc/criu/default.conf
tcp-established
ext-unix-sk
ghost-limit 4GB
$ cat /etc/criu/runc.conf
tcp-established
$ cat /etc/docker/daemon.json
{
"experimental": true
}
btw, what should I do to fix the clone3() and time namespaces issue?
Don't worry about those messages. Your kernel is just old and does not support those interfaces which are not relevant for what you are trying to do.
If you run check without --all you should not see these warnings/errors.
I see an error about nested namespaces. On which platform are you running your tests?
I see. I just tried the latest kernel I could install on Ubuntu 16.04 LTS (that is v4.x) and also on Ubuntu 20 (that is v5.x); each gives different errors, but none of them works :-) I've also tried CentOS, Fedora and a lot of other distros, with no luck.
Thanks for the REALLY quick reply, I really appreciate it!
The nested error happens on Ubuntu without Docker. Running the same inside Docker gets an error about /no-such-path, and each time I run it I see a different result from find / -inum X (and most of the time there is none).
$ uname -r
4.20.0-042000-generic
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.7 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.7 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
That's why I am wondering which versions you used when you recorded the demo videos and wrote the wiki, because it definitely works for you :)
If you meant where I am running this: I am using a Windows 10 machine with VMware Workstation, which runs Ubuntu 16.04 in this example (I also have other VMs, both Desktop and Server editions). I can try on AWS EC2 if you want.
If you can give me exact instructions for how you run this so that it works for you, it will really help me understand what I am doing wrong (Firefox version, kernel version, Docker version, CRIU version, OS distro version, hypervisor and so on).
btw, how do I know what is relevant and what is not for what I am doing?
btw, I can't run a recent Firefox with sudo from my user, so I did su - before; maybe this is relevant:
root 1097 0.0 0.0 52284 3476 ? S 08:32 0:00 su -
root 1102 0.0 0.1 22536 5344 ? S 08:32 0:00 \_ -su
root 1506 0.0 0.0 52700 3844 ? S 08:34 0:00 \_ sudo setsid unshare -i ./vnc.sh firefox
root 1507 0.0 0.0 12548 3004 ? Ss 08:34 0:00 \_ /bin/bash ./vnc.sh firefox
root 1509 6.2 0.3 43880 16032 ? S 08:34 10:34 \_ Xvnc :25 -v -geometry 648x375 -interface 0.0.0.0 -SecurityTypes none
root 1510 8.5 9.0 3048012 362732 ? Sl 08:34 14:35 \_ /usr/lib/firefox/firefox
root 1561 0.0 2.6 2562068 107084 ? Sl 08:34 0:03 \_ /usr/lib/firefox/firefox -contentproc -childID 1 -isForBrowser -pref
root 1582 0.0 3.3 2590940 133680 ? Sl 08:34 0:04 \_ /usr/lib/firefox/firefox -contentproc -childID 2 -isForBrowser -pref
root 1624 0.0 2.9 2569124 119296 ? Sl 08:34 0:02 \_ /usr/lib/firefox/firefox -contentproc -childID 3 -isForBrowser -pref
root 1671 13.9 21.1 3671408 848424 ? Sl 08:34 23:40 \_ /usr/lib/firefox/firefox -contentproc -childID 4 -isForBrowser -pref
root 1777 1.5 0.9 362476 38884 ? Sl 08:37 2:32 \_ /usr/lib/firefox/firefox -contentproc -parentBuildID 20200720193547
root 1953 0.0 1.8 2543916 73792 ? Sl 08:39 0:00 \_ /usr/lib/firefox/firefox -contentproc -childID 6 -isForBrowser -pref
root 1484 0.0 0.3 41296 13136 ? S 08:33 0:00 /usr/bin/Xvnc :1 -auth /root/.Xauthority -desktop ubuntu:1 (root) -fp /usr/share/fonts/X
sudo criu dump -t `pgrep vnc.sh` --tcp-established -D /tmp/test -j --ghost-limit=100M
Error (criu/namespaces.c:420): Can't dump nested ipc namespace for 1561
Error (criu/namespaces.c:672): Can't make ipcns id
Error (criu/cr-dump.c:1764): Dumping FAILED.
Tried the same scenario on AWS EC2 with Ubuntu 16:
$ sudo criu dump -t `pgrep vnc.sh` --tcp-established -D /tmp/test -j --ghost-limit=100M
Warn (criu/net.c:3137): Unable to get tun network namespace
Warn (criu/sk-unix.c:229): unix: Unable to open a socket file: Bad address
Warn (criu/net.c:3137): Unable to get socket network namespace
Error (criu/namespaces.c:420): Can't dump nested ipc namespace for 15141
Error (criu/namespaces.c:672): Can't make ipcns id
Error (criu/cr-dump.c:1764): Dumping FAILED.
$ uname -r
4.4.0-1110-aws
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
$ criu -V
Version: 3.14
$ criu check --all
Warn (criu/kerndat.c:829): Can't load /run/criu.kdat
Error (criu/util.c:618): exited, status=3
Error (criu/util.c:618): exited, status=3
Write 4294967295 to /proc/self/loginuid failed: Operation not permitted
Warn (criu/net.c:3137): Unable to get tun network namespace
Warn (criu/sk-unix.c:229): unix: Unable to open a socket file: Bad address
Error (criu/net.c:3423): Unable create a network namespace: Operation not permitted
Warn (criu/net.c:3475): NSID isn't reported for network links
Warn (criu/net.c:3137): Unable to get socket network namespace
Error (criu/util.c:696): You need to be root to run this command
$ sudo criu check --all
sudo: unable to resolve host ip-192-168-1-150
Warn (criu/autofs.c:99): Failed to find pipe_ino option (old kernel?)
Error (criu/cr-check.c:1155): The TCP_REPAIR_WINDOW option isn't supported.
Error (criu/cr-check.c:1099): TCP_REPAIR can't be enabled for half-closed sockets
Warn (criu/cr-check.c:1241): Do not have API to map vDSO - will use mremap() to restore vDSO
Error (criu/cr-check.c:1220): Non-cooperative UFFD is not supported
Warn (criu/cr-check.c:1230): clone3() with set_tid not supported
Error (criu/cr-check.c:1272): Time namespaces are not supported
Error (criu/cr-check.c:992): autofs not supported.
Warn (criu/cr-check.c:1197): compat_cr is not supported. Requires kernel >= v4.12
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.
$ sudo criu check
sudo: unable to resolve host ip-192-168-1-150
Warn (criu/autofs.c:99): Failed to find pipe_ino option (old kernel?)
Looks good.
$ docker version
Client: Docker Engine - Community
Version: 19.03.12
API version: 1.40
Go version: go1.13.10
Git commit: 48a66213fe
Built: Mon Jun 22 15:45:49 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.12
API version: 1.40 (minimum version 1.12)
Go version: go1.13.10
Git commit: 48a66213fe
Built: Mon Jun 22 15:44:20 2020
OS/Arch: linux/amd64
Experimental: true
containerd:
Version: 1.2.13
GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
upgraded kernel to:
$ uname -r
4.4.0-1112-aws
same error:
$ sudo criu dump -t `pgrep vnc.sh` --tcp-established -D /tmp/test -j --ghost-limit=100M
Warn (criu/net.c:3137): Unable to get tun network namespace
Warn (criu/sk-unix.c:229): unix: Unable to open a socket file: Bad address
Warn (criu/net.c:3137): Unable to get socket network namespace
Error (criu/namespaces.c:420): Can't dump nested ipc namespace for 1851
Error (criu/namespaces.c:672): Can't make ipcns id
Error (criu/cr-dump.c:1764): Dumping FAILED.
$ sudo criu check
Warn (criu/autofs.c:99): Failed to find pipe_ino option (old kernel?)
Looks good.
$ sudo criu check --all
Warn (criu/autofs.c:99): Failed to find pipe_ino option (old kernel?)
Error (criu/cr-check.c:1155): The TCP_REPAIR_WINDOW option isn't supported.
Error (criu/cr-check.c:1099): TCP_REPAIR can't be enabled for half-closed sockets
Warn (criu/cr-check.c:1241): Do not have API to map vDSO - will use mremap() to restore vDSO
Error (criu/cr-check.c:1220): Non-cooperative UFFD is not supported
Warn (criu/cr-check.c:1230): clone3() with set_tid not supported
Error (criu/cr-check.c:1272): Time namespaces are not supported
Error (criu/cr-check.c:992): autofs not supported.
Warn (criu/cr-check.c:1197): compat_cr is not supported. Requires kernel >= v4.12
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.
Just recently I was helping someone to get that setup working in this ticket: #1082
Please have a look at that ticket, that should give you a few hints how to do it.
Do you always have Firefox running inside your VNC session? Maybe try it with a simpler process. xterm or something like that. Something less complex than a browser.
Hey,
Thanks again for a quick response!
I've quickly read the whole thread of #1082 and couldn't find anything that helps me. Do you suggest trying CentOS instead? Inside Docker it doesn't work either (it fails with a different error at a later stage, however).
setpid unshare -i ./vnc_server maya
This didn't work for me on Ubuntu 16:
$ setsid unshare -i ./vnc.sh firefox
unshare: unshare failed: Operation not permitted
With sudo, Firefox does not run correctly:
Running Firefox as root in a regular user's session is not supported. ($HOME is /home/igorb which is owned by igorb.)
With su - it runs but doesn't dump :-(
Have you recently tried to dump Firefox (or any other browser) in any environment? It does work with X; it doesn't work with a "sophisticated" multi-process app that does IPC and tons of other stuff (a real-world app with a UI).
Thanks!
Regarding trying something simpler than a browser: less complex stuff works.
As I understand it, newns.c mentioned in the wiki is the old way of doing this, and nowadays you recommend using sudo setsid unshare -i ./vnc.sh firefox instead, right? It should do the same?
Perfect. So maybe it is Firefox. Maybe Firefox uses nested IPC namespaces. Looking at the output of lsns I see that Firefox uses a lot of namespaces. CRIU cannot handle nested namespaces (someone needs to implement it). I never dump Firefox because it makes no sense for me (at least).
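A rough way to check the "nested IPC namespace" theory is to compare the IPC namespace link of every process in the tree you want to dump. The sketch below is only an illustration, not something from CRIU itself: the function names are made up, and it assumes that each /proc/<pid>/ns/ipc link reads like "ipc:[4026531839]". If the tree reports more than one distinct link, at least one process lives in its own IPC namespace, which would fit the "Can't dump nested ipc namespace" error.

```python
import os
from collections import defaultdict


def group_by_namespace(ns_links):
    """Group PIDs by the namespace link they report.

    ns_links maps pid -> the string read from /proc/<pid>/ns/ipc,
    for example "ipc:[4026531839]" (the number is the namespace inode).
    """
    groups = defaultdict(list)
    for pid, link in ns_links.items():
        groups[link].append(pid)
    return {link: sorted(pids) for link, pids in groups.items()}


def read_ipc_links(pids):
    """Read the IPC namespace link of each PID (hypothetical helper).

    Reading the links of processes owned by another user usually needs root.
    """
    links = {}
    for pid in pids:
        try:
            links[pid] = os.readlink(f"/proc/{pid}/ns/ipc")
        except OSError:
            # The process exited or we lack permission; skip it.
            continue
    return links
```

For example, group_by_namespace(read_ipc_links([1507, 1510, 1561])) run as root would show whether the content process 1561 from the ps listing shares its IPC namespace with vnc.sh.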
Not sure. As usual with documentation, at some point it is outdated. If unshare works, that sounds easier than some extra process to handle that.
Neither of them works. I just have too many permutations to check, so I am trying to narrow them down.
Didn't you say it works?
The same happens for Chrome (doesn't work). In a YouTube video someone from your team dumps Firefox as a demo (there is also another video from Red Hat), so for someone it does work. Perhaps you can check that? In Docker it has a different error:
(00.299954) fsnotify: [fhandle] bytes 0x000008 type 0x000001 __handle 0x0000000000df69:0000000000000000
(00.299955) fsnotify: Opening fhandle 2a:df69...
(00.299970) Warn (criu/fsnotify.c:288): fsnotify: Handle 0x2a:0xdf69 cannot be opened
..
(00.306160) Error (criu/irmap.c:86): irmap: Can't stat /no-such-path: No such file or directory
(00.306171) Error (criu/fsnotify.c:291): fsnotify: Can't dump that handle
(00.306237) ----------------------------------------
(00.306257) Error (criu/cr-dump.c:1349): Dump files (pid: 4436) failed with -1
There are tickets where someone mentioned that you can take the decimal value of the handle (in my example: 0xdf69), run docker exec -it container bash and then find / -inum <decimal_handle> to find out why it happens. Most of the time this doesn't find anything, and at other times it finds something different on each run, even when I rerun exactly the same Docker container over and over again. Someone else suggested disabling AppArmor; I am running my container with --security-opt seccomp:unconfined --security-opt apparmor=unconfined and it still doesn't work.
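The hex-to-decimal step for find -inum can be scripted. This is a minimal sketch that assumes the warning line looks exactly like the one in the log excerpt above ("Handle 0x<dev>:0x<inode> cannot be opened"); the regular expression and the function name are illustrative, not part of CRIU.

```python
import re

# Matches the fsnotify warning as it appears in the log excerpt above.
HANDLE_RE = re.compile(r"Handle 0x([0-9a-f]+):0x([0-9a-f]+) cannot be opened")


def parse_handle_warning(log_line):
    """Return (device, inode) as integers, or None if the line does not match."""
    match = HANDLE_RE.search(log_line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


line = "(00.299970) Warn (criu/fsnotify.c:288): fsnotify: Handle 0x2a:0xdf69 cannot be opened"
device, inode = parse_handle_warning(line)
# Argument list for the lookup described above, run inside the container.
find_args = ["find", "/", "-inum", str(inode)]
```

Here find_args ends up as ["find", "/", "-inum", "57193"], since 0xdf69 is 57193 in decimal.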
Unfortunately no. I have been struggling with this for a while. I tried almost every permutation I could imagine (different OS, different kernels, newns vs setsid, inside Docker and outside, different apps and so on) and had no luck. If I edit the CRIU source code and skip the /no-such-path error (and maybe, by mistake, others), then I get errors on restore...
So far I can't find any use case that works for me. That's why I opened a ticket here: if someone made a demo, it worked for them, so if I reproduce the same environment they had back then, it should work for me (and then I can change one thing at a time and see what breaks it: kernel version, Xvnc version, app version, CRIU version, OS distro and so on).
What I meant is that they behave differently (each fails at a different step), so I was wondering which of them is recommended today (so that I mostly test that one). I think inside a Docker container we get rid of the nested namespace problem, but we still have the /no-such-path issue. Perhaps you are familiar with this?
I thought your comment "less complex stuff work" meant that less complex stuff works, less complex stuff being something other than a browser.
No. With a container you will increase the nesting level. I do not see any value in using containers for your test.
Are you not able to reproduce the steps described in #1082 ? That was just a couple weeks ago that we had success checkpointing and restoring a VNC session.
Correct. Other visual stuff worked (icewm, xterm and so on). Once I add a heavy multi-process app to the process tree, it stops working. I think a browser that didn't initialize properly (for example, one that hit an error and exited, or showed a popup telling the user that something is wrong) worked too. When it opens any page, it's game over. You can't checkpoint it.
Thanks for making it clear that it works with a simple process but not with a browser. Maybe today's browsers are too complex for CRIU. You either need to fix CRIU to work with today's browsers or use a browser from 5 years ago if that is important for your setup. Unfortunately there is nothing we can do for you right now. I would say that graphical applications are not the most important use case for CRIU, and that is why nobody is really looking into it.
really sad :-( perhaps you can spend a few minutes checking that on your machine (if it will work)?
Sorry, but not possible right now. Too far away from a useful system to try something like this.
ok, can we talk somewhere private? :-)
Handle 0x2a:0xdf69 cannot be opened
Probably your browser has an inotify watch on a filesystem which does not have fhandle support. You can look up what device 0x2a (42) is to check whether that is the case.
Most likely you can just grep " 0:42 " in mountinfo.
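The same lookup can be done without grep. The sketch below is an illustration under two assumptions: the device number in the CRIU log uses the kernel's internal layout (minor number in the low 20 bits, major above it), and the mountinfo text follows the standard /proc/<pid>/mountinfo field order. The function names are made up for this example.

```python
def split_kernel_dev(sdev):
    """Split a kernel-internal device number into (major, minor).

    Assumption: the minor number occupies the low 20 bits.
    """
    return sdev >> 20, sdev & 0xFFFFF


def find_mounts(mountinfo_text, major, minor):
    """Return (mount point, filesystem type) for each mountinfo line on that device."""
    wanted = f"{major}:{minor}"
    matches = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 10 or fields[2] != wanted or "-" not in fields[6:]:
            continue
        # Optional fields end at a lone "-"; the filesystem type follows it.
        sep = fields.index("-", 6)
        matches.append((fields[4], fields[sep + 1]))
    return matches
```

With the text of /proc/<pid>/mountinfo in a string, find_mounts(text, *split_kernel_dev(0x2a)) lists the mounts that sit on device 0:42 together with their filesystem type.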
Thanks @Snorch!
I've just made a new run and I get this:
(00.291186) Dumping opened files (pid: 78432)
...
(00.291891) Warn (criu/fsnotify.c:288): fsnotify: Handle 0x3b:0x79575 cannot be opened
0x3b = 59
so on the host machine (the app runs in Docker) I ran:
cat /proc/78432/mountinfo | grep " 0:59"
302 216 0:59 / / rw,relatime master:90 - overlay overlay rw,lowerdir=/var/lib/docker/overlay2/l/UHL6TMV6HLXAWS4MB7V77EPRL4:/var/lib/docker/overlay2/l/U5OCH64E43UVQJC3SRBHUCHSEE:/var/lib/docker/overlay2/l/36I75CZHX37HOM2B3NASYT4OCK:/var/lib/docker/overlay2/l/VAYPFKRV5PMC2BDFFYN7MJ62OC:/var/lib/docker/overlay2/l/YW5NZX2NETV5EEFKZCCNU37CAA:/var/lib/docker/overlay2/l/62RAL3M6O5TO43IRZXVXFRQFKQ:/var/lib/docker/overlay2/l/2ZXEBMY4PCESV2WQSBQGGLSVLU:/var/lib/docker/overlay2/l/RY5TVZYLUOLJIMH5LUFVGEZA2N:/var/lib/docker/overlay2/l/C2ZBIAWSYIQMCPEKTC7EPWMENV:/var/lib/docker/overlay2/l/LQMMUT3FVFW5XPPA63Z3HIPYR7:/var/lib/docker/overlay2/l/7ELFV4ZFORXM2DJNLNS6Y77FOB:/var/lib/docker/overlay2/l/NTVSYJ2ULFMWXOKLSZRKYKQ5QS:/var/lib/docker/overlay2/l/4VR4DSQDL7YE4BB47MSVF7Q3AI:/var/lib/docker/overlay2/l/7YA3LOHDDUP7TYNZ6GSWHASVGW:/var/lib/docker/overlay2/l/Q2NLS7JTFKCLCB3OF7K4RVK5NP:/var/lib/docker/overlay2/l/VJUEYKBJ6PGUZNQCQJSW3YS7JV:/var/lib/docker/overlay2/l/5IQYJTYGACQMBCXY3M7KMJDSPD:/var/lib/docker/overlay2/l/SGOS7TDCDO775JPV7P4XIFBYII:/var/lib/docker/overlay2/l/L6SSOY276Z5DJTT4ZDOQP4K3ML:/var/lib/docker/overlay2/l/ALEEVRYORBDN54C73YBUWNTWV6:/var/lib/docker/overlay2/l/X3HU2YVOFSKYRWTKLPZ5OGB4CJ:/var/lib/docker/overlay2/l/DLV6RCBNOMKR76KAZUMDGIZGB3:/var/lib/docker/overlay2/l/IRNG7YJKMXRMB6R46GJCYQBIGI:/var/lib/docker/overlay2/l/5MUVEZ3RB5YKZKNWCCESSUXRHH:/var/lib/docker/overlay2/l/43MR75FS4NYTHQHDMHHNZ6YGNS:/var/lib/docker/overlay2/l/QLFCT2CJ5UGF4NCGBIZRQ6GPSA:/var/lib/docker/overlay2/l/BFPY7NPDWWPH74KFKIYOEXC2Y3:/var/lib/docker/overlay2/l/7N57J6KQOEVBEGK7YOHWDGMUPA:/var/lib/docker/overlay2/l/4QQP4HKUN2XJS3VGN23E5ZPMEK:/var/lib/docker/overlay2/l/YOFMFEKI3JMIX3EIMKWKPQCWB2:/var/lib/docker/overlay2/l/PEVYQBIN7Y3RQ5Q4OMVUGVCF2P:/var/lib/docker/overlay2/l/TIKZLT2EIMDYXVXIDJOTJOOHDT:/var/lib/docker/overlay2/l/HCGUFIMNUFVQ5JW7UVPJEYWOHT:/var/lib/docker/overlay2/l/RVFIKTXKHFWVDJ4FZJ55F63RXZ:/var/lib/docker/overlay2/l/QIPRK5N426PRX5YNHZVMTAM5N4:/var/lib/docker/overlay2/l/2BBWJJ7CZKYQHZ
K2PYSR67A5HS:/var/lib/docker/overlay2/l/PYQ2XZFRSJE4VM4G64I5SJCNSB:/var/lib/docker/overlay2/l/KTKE36MFG3V3X57NOYHJXG6ACR:/var/lib/docker/overlay2/l/UST4WJL6IBJY5OJNMWZ5ZBPGZV:/var/lib/docker/overlay2/l/A6YZMPTT6LTT2VSZREEPGD2DMT:/var/lib/docker/overlay2/l/HHRPJLAXD275JNUJQOG4YYTAI2:/var/lib/docker/overlay2/l/WH7NKGCTN4VV2OBME4SSWZVKX3:/var/lib/docker/overlay2/l/LAPD3CEBMDGZXTXTACN5FLVNN5:/var/lib/docker/overlay2/l/FUO4KX5WTGHLDRNE4JFB4VCW7D:/var/lib/docker/overlay2/l/CWY5DLRLTV26OWCM2IRPZHEKSC:/var/lib/docker/overlay2/l/IWLSZLYHIQL4HUXG2AUJSJXASP:/var/lib/docker/overlay2/l/IRV3OQC6662RVCNCM2PFD6MA5H:/var/lib/docker/overlay2/l/YVCXB3GYHVKGNIIQXBEKCZGQEJ:/var/lib/docker/overlay2/l/2SJFHCGA5IL66CHWDM2GRGLHPS:/var/lib/docker/overlay2/l/F7YCVYU4LUZ4VW66LUPQD5OI6E:/var/lib/docker/overlay2/l/7ZANRSNSA4NBRDYUJUV6FMVA2Q,upperdir=/var/lib/docker/overlay2/77fc58b937b6d9aaabbb16f7695ddbde02d3163ffbb1c099eb2bc7544f564450/diff,workdir=/var/lib/docker/overlay2/77fc58b937b6d9aaabbb16f7695ddbde02d3163ffbb1c099eb2bc7544f564450/work,xino=off
How should I proceed?
@davidcohenm So likely you have an inotify on overlayfs.
By default inotifies are not supported on overlayfs because overlayfs does not give proper fhandles in fdinfo. But there is a workaround; you need to enable several mount options:
Preserve hardlinks (index=on) (v4.13)
NFS export (nfs_export=on) (v4.16)
Note that these options can degrade overlayfs performance.
(upd: as you likely don't want to mess with the options Docker gives to its mounts, you should probably enable them as a kernel boot option or as an overlay kernel module load option)
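One way to see whether a given overlay mount actually carries these options is to read them back from its line in /proc/<pid>/mountinfo. This is a minimal sketch with a made-up function name; it only reports what the mount line shows and does not guess the kernel default when an option is absent.

```python
def overlay_feature_flags(mountinfo_line):
    """Return the index / nfs_export settings shown for one overlay mount.

    A missing key means the mount line does not list that option at all.
    """
    fields = mountinfo_line.split()
    # Optional fields end at a lone "-"; then come fstype, source, super options.
    sep = fields.index("-", 6)
    if fields[sep + 1] != "overlay":
        raise ValueError("not an overlay mount")
    flags = {}
    for option in fields[sep + 3].split(","):
        key, _, value = option.partition("=")
        if key in ("index", "nfs_export"):
            flags[key] = value
    return flags
```

A result of {"index": "on", "nfs_export": "on"} for the container's root mount would indicate that the workaround is in effect for that mount.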
I will try, thanks!
P.S. Maybe it's easier to switch to another underlying file system that Docker supports?
@Snorch thank you for such a fast reply! I really appreciate it.
I've tried to change the files manually with nano; after the change:
$ grep -H . /sys/module/overlay/parameters/*
/sys/module/overlay/parameters/check_copy_up:N
/sys/module/overlay/parameters/index:Y
/sys/module/overlay/parameters/metacopy:N
/sys/module/overlay/parameters/nfs_export:Y
/sys/module/overlay/parameters/redirect_always_follow:Y
/sys/module/overlay/parameters/redirect_dir:N
/sys/module/overlay/parameters/redirect_max:256
/sys/module/overlay/parameters/xino_auto:Y
(00.433047) irmap: Scanning /. hint
(00.433049) irmap: Refresh stat for /.
(00.433051) irmap: Scanning /no-such-path hint
(00.433052) irmap: Refresh stat for /no-such-path
(00.433122) Error (criu/irmap.c:86): irmap: Can't stat /no-such-path: No such file or directory
(00.433128) Error (criu/fsnotify.c:291): fsnotify: Can't dump that handle
(00.433173) ----------------------------------------
(00.433185) Error (criu/cr-dump.c:1348): Dump files (pid: 3026) failed with -1
where 3026 is the PID of my app inside Docker (as seen from the host machine).
It seems like Docker doesn't pick up this setting?
255 215 0:59 / / rw,relatime master:90 - overlay overlay rw,lowerdir=/var/lib/docker/overlay2/l/U3LOCMJP4RKJM2XCSDRAO342VF:/var/lib/docker/overlay2/l/U5OCH64E43UVQJC3SRBHUCHSEE:/var/lib/docker/overlay2/l/36I75CZHX37HOM2B3NASYT4OCK:/var/lib/docker/overlay2/l/VAYPFKRV5PMC2BDFFYN7MJ62OC:/var/lib/docker/overlay2/l/YW5NZX2NETV5EEFKZCCNU37CAA:/var/lib/docker/overlay2/l/62RAL3M6O5TO43IRZXVXFRQFKQ:/var/lib/docker/overlay2/l/2ZXEBMY4PCESV2WQSBQGGLSVLU:/var/lib/docker/overlay2/l/RY5TVZYLUOLJIMH5LUFVGEZA2N:/var/lib/docker/overlay2/l/C2ZBIAWSYIQMCPEKTC7EPWMENV:/var/lib/docker/overlay2/l/LQMMUT3FVFW5XPPA63Z3HIPYR7:/var/lib/docker/overlay2/l/7ELFV4ZFORXM2DJNLNS6Y77FOB:/var/lib/docker/overlay2/l/NTVSYJ2ULFMWXOKLSZRKYKQ5QS:/var/lib/docker/overlay2/l/4VR4DSQDL7YE4BB47MSVF7Q3AI:/var/lib/docker/overlay2/l/7YA3LOHDDUP7TYNZ6GSWHASVGW:/var/lib/docker/overlay2/l/Q2NLS7JTFKCLCB3OF7K4RVK5NP:/var/lib/docker/overlay2/l/VJUEYKBJ6PGUZNQCQJSW3YS7JV:/var/lib/docker/overlay2/l/5IQYJTYGACQMBCXY3M7KMJDSPD:/var/lib/docker/overlay2/l/SGOS7TDCDO775JPV7P4XIFBYII:/var/lib/docker/overlay2/l/L6SSOY276Z5DJTT4ZDOQP4K3ML:/var/lib/docker/overlay2/l/ALEEVRYORBDN54C73YBUWNTWV6:/var/lib/docker/overlay2/l/X3HU2YVOFSKYRWTKLPZ5OGB4CJ:/var/lib/docker/overlay2/l/DLV6RCBNOMKR76KAZUMDGIZGB3:/var/lib/docker/overlay2/l/IRNG7YJKMXRMB6R46GJCYQBIGI:/var/lib/docker/overlay2/l/5MUVEZ3RB5YKZKNWCCESSUXRHH:/var/lib/docker/overlay2/l/43MR75FS4NYTHQHDMHHNZ6YGNS:/var/lib/docker/overlay2/l/QLFCT2CJ5UGF4NCGBIZRQ6GPSA:/var/lib/docker/overlay2/l/BFPY7NPDWWPH74KFKIYOEXC2Y3:/var/lib/docker/overlay2/l/7N57J6KQOEVBEGK7YOHWDGMUPA:/var/lib/docker/overlay2/l/4QQP4HKUN2XJS3VGN23E5ZPMEK:/var/lib/docker/overlay2/l/YOFMFEKI3JMIX3EIMKWKPQCWB2:/var/lib/docker/overlay2/l/PEVYQBIN7Y3RQ5Q4OMVUGVCF2P:/var/lib/docker/overlay2/l/TIKZLT2EIMDYXVXIDJOTJOOHDT:/var/lib/docker/overlay2/l/HCGUFIMNUFVQ5JW7UVPJEYWOHT:/var/lib/docker/overlay2/l/RVFIKTXKHFWVDJ4FZJ55F63RXZ:/var/lib/docker/overlay2/l/QIPRK5N426PRX5YNHZVMTAM5N4:/var/lib/docker/overlay2/l/2BBWJJ7CZKYQHZ
K2PYSR67A5HS:/var/lib/docker/overlay2/l/PYQ2XZFRSJE4VM4G64I5SJCNSB:/var/lib/docker/overlay2/l/KTKE36MFG3V3X57NOYHJXG6ACR:/var/lib/docker/overlay2/l/UST4WJL6IBJY5OJNMWZ5ZBPGZV:/var/lib/docker/overlay2/l/A6YZMPTT6LTT2VSZREEPGD2DMT:/var/lib/docker/overlay2/l/HHRPJLAXD275JNUJQOG4YYTAI2:/var/lib/docker/overlay2/l/WH7NKGCTN4VV2OBME4SSWZVKX3:/var/lib/docker/overlay2/l/LAPD3CEBMDGZXTXTACN5FLVNN5:/var/lib/docker/overlay2/l/FUO4KX5WTGHLDRNE4JFB4VCW7D:/var/lib/docker/overlay2/l/CWY5DLRLTV26OWCM2IRPZHEKSC:/var/lib/docker/overlay2/l/IWLSZLYHIQL4HUXG2AUJSJXASP:/var/lib/docker/overlay2/l/IRV3OQC6662RVCNCM2PFD6MA5H:/var/lib/docker/overlay2/l/YVCXB3GYHVKGNIIQXBEKCZGQEJ:/var/lib/docker/overlay2/l/2SJFHCGA5IL66CHWDM2GRGLHPS:/var/lib/docker/overlay2/l/F7YCVYU4LUZ4VW66LUPQD5OI6E:/var/lib/docker/overlay2/l/7ZANRSNSA4NBRDYUJUV6FMVA2Q,upperdir=/var/lib/docker/overlay2/7c2b84df34ab180d0c0d26be352937633ba7515b2f99b3744cdfdfdef546f0f2/diff,workdir=/var/lib/docker/overlay2/7c2b84df34ab180d0c0d26be352937633ba7515b2f99b3744cdfdfdef546f0f2/work,index=off,nfs_export=off,xino=off
See index=off,nfs_export=off at the end?
Any idea?
btw, @Snorch can I get the path/filename that fails?
My only advice is that you play a bit more with these module options, probably setting them at boot. In Virtuozzo we can migrate basic inotifies on overlayfs in our tests.
As for the path/filename that fails: that's the tricky part. You can find the inotify fd; it should be several lines above the error in the CRIU log. But you can't find the name of the file this inotify is monitoring without fhandle->fd->path resolution via the open_by_handle_at syscall, and that does not work on overlayfs without those options.
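Short of resolving the handle, the watch entries themselves can at least be listed from /proc/<pid>/fdinfo/<fd>. The sketch below assumes the "inotify wd:... ino:... sdev:... f_handle:..." line format described in the kernel's proc documentation, with ino and sdev in hex; the function name is illustrative. It does not turn a handle into a path, it only shows which inode and device each watch points at.

```python
def parse_inotify_fdinfo(fdinfo_text):
    """Extract the watch entries from the text of /proc/<pid>/fdinfo/<fd>."""
    watches = []
    for line in fdinfo_text.splitlines():
        if not line.startswith("inotify "):
            continue
        entry = {}
        for token in line.split()[1:]:
            key, _, value = token.partition(":")
            entry[key] = value
        watches.append({
            "wd": int(entry["wd"]),          # watch descriptor, decimal
            "ino": int(entry["ino"], 16),    # inode number, printed in hex
            "sdev": int(entry["sdev"], 16),  # device number, printed in hex
            "f_handle": entry.get("f_handle", ""),
        })
    return watches
```

The decimal inode from each entry is the value that the find / -inum lookup mentioned earlier in the thread expects.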
Perhaps you can point me to an example of how to change those boot options? I've found a kernel build config like this https://src.openvz.org/projects/OVZ/repos/vzkernel/browse/configs/kernel-3.10.0-x86_64-minimal.config but can't find a way to do that without rebuilding the kernel.
I am also trying to force docker to use different storage params for this specific container, with no luck so far.
Thanks again!
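On the question of changing these options without rebuilding the kernel, a hedged sketch follows. It assumes the running kernel exposes the overlay module parameters `index` and `nfs_export` under /sys/module/overlay/parameters, and that module defaults can be set with a modprobe.d `options` line (or `overlay.index=on overlay.nfs_export=on` on the kernel command line when overlay is built in). The helper names and the idea of writing the line to a file such as /etc/modprobe.d/overlay.conf are illustrative, not something confirmed in this thread. Note also that this only changes the defaults: a mount that passes index=off explicitly, as the docker mount above does, still gets index=off.

```python
from pathlib import Path

# Overlay module parameters discussed in this thread.
PARAMS = ("index", "nfs_export")


def read_overlay_params(base="/sys/module/overlay/parameters"):
    """Return {param: True/False/None}; None means the file could not be read."""
    result = {}
    for name in PARAMS:
        try:
            raw = (Path(base) / name).read_text().strip()
        except OSError:
            result[name] = None
            continue
        # Boolean module parameters are normally shown as Y or N.
        result[name] = raw in ("Y", "y", "1")
    return result


def modprobe_options_line(params=PARAMS):
    """Build an 'options' line that could go into a modprobe.d config file."""
    return "options overlay " + " ".join(f"{p}=on" for p in params)


if __name__ == "__main__":
    print(read_overlay_params())
    print(modprobe_options_line())
```

After a reboot (or a module reload), `read_overlay_params()` is a quick way to see whether the defaults actually changed before retrying the dump.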
@Snorch:
In Virtuozzo we can migrate basic inotifies on overlayfs in our tests.
Can you elaborate a bit more on that? Where and how can I test it on my app?
You can take a look at this test; if one changes the internal overlay mount in it to an external overlay mount, it would probably also pass on mainstream criu.
@Snorch sorry, maybe I misunderstood you. Are you saying that if I install Virtuozzo's fork of criu, it will probably work as is with my app (even without changing the kernel)? Any suggestion for an install procedure for that?
@davidcohenm No, you misunderstood me. The Virtuozzo version of criu is closely integrated with the Virtuozzo kernel, and switching to it without installing full Virtuozzo/OpenVZ may be hard to do, so I don't advise it.
What I am saying is: 1) We run inotify tests on overlayfs, and they pass. 2) I hope the only difference between Virtuozzo criu and mainstream criu in this area is that we support non-external overlayfs migration. But this difference should not be a problem for your case.
So you should be able to get it working just by enabling the overlayfs features I've mentioned above. We enable those features in the kernel config, so maybe you see some strange behaviour because you are trying to enable them dynamically.
To sum up: it should be possible to migrate inotify on overlayfs with mainstream criu.
(Note: fixing the inotify problem is probably not the only way; you can probably just switch to the vfs graph driver in docker to remove overlay from the equation.)
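For the vfs suggestion, a minimal sketch of the config change, assuming the daemon reads /etc/docker/daemon.json and that the key is "storage-driver" as in Docker's daemon configuration. The helper name `with_vfs_storage_driver` is made up for illustration. Keep in mind that images and containers created under overlay2 are not visible after switching drivers, and that the daemon has to be restarted for the change to apply.

```python
import json


def with_vfs_storage_driver(daemon_json_text):
    """Return daemon.json content with the storage driver set to vfs.

    Existing keys (for example "experimental") are kept as they are.
    """
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    config["storage-driver"] = "vfs"
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # The daemon.json shown earlier in this issue.
    print(with_vfs_storage_driver('{"experimental": true}'))
```

The function only produces the new text; writing it back to the config file and restarting docker is left as a manual step on purpose.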
A friendly reminder that this issue had no activity for 30 days.
Trying to dump in a container, I'm getting a similar error:
criu dump -t 9289 -D /checkpoint --tcp-established --leave-running
Warn (criu/kerndat.c:1593): CRIU was built without libnftables support
Warn (criu/kerndat.c:1243): Can't keep kdat cache on non-tempfs
Warn (compel/arch/x86/src/lib/infect.c:367): Will restore 10659 with interrupted system call
Warn (compel/arch/x86/src/lib/infect.c:367): Will restore 10662 with interrupted system call
Warn (compel/arch/x86/src/lib/infect.c:367): Will restore 10682 with interrupted system call
Warn (compel/arch/x86/src/lib/infect.c:367): Will restore 10684 with interrupted system call
Warn (compel/arch/x86/src/lib/infect.c:367): Will restore 10686 with interrupted system call
Warn (compel/arch/x86/src/lib/infect.c:367): Will restore 10687 with interrupted system call
Warn (compel/arch/x86/src/lib/infect.c:367): Will restore 10689 with interrupted system call
Warn (compel/arch/x86/src/lib/infect.c:367): Will restore 10690 with interrupted system call
Warn (compel/arch/x86/src/lib/infect.c:367): Will restore 10699 with interrupted system call
Warn (compel/arch/x86/src/lib/infect.c:367): Will restore 10709 with interrupted system call
Warn (compel/arch/x86/src/lib/infect.c:367): Will restore 10688 with interrupted system call
Warn (criu/fsnotify.c:281): fsnotify: Handle 0x278:0x2ffb5b cannot be opened
Warn (criu/irmap.c:104): irmap: Can't stat /no-such-path: No such file or directory
Error (criu/fsnotify.c:284): fsnotify: Can't dump that handle
Error (criu/cr-dump.c:1674): Dump files (pid: 10688) failed with -1
Error (criu/cr-dump.c:2098): Dumping FAILED.
find / -inum 3144539
/root/.config/glib-2.0/settings
Is there anything special about this directory?
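The lookup above can be sketched in Python, assuming the second hex field of the "Handle 0x278:0x2ffb5b" warning is the inode number (0x2ffb5b is 3144539 in decimal, which matches the find command above). The function names are illustrative and not part of criu. Like `find / -inum` without `-xdev`, the walk does not stay on one filesystem, so a match on an unrelated mount is possible.

```python
import os
import re

# Matches the fsnotify warning printed by criu in the log above.
HANDLE_RE = re.compile(r"Handle (0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+) cannot be opened")


def inode_from_warning(log_line):
    """Return the inode number from the warning, or None if it does not match.

    Assumption: the second hex field is the inode number.
    """
    match = HANDLE_RE.search(log_line)
    if match is None:
        return None
    return int(match.group(2), 16)


def find_by_inode(root, inode):
    """Return all paths below root whose inode number equals inode."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_ino == inode:
                    matches.append(path)
            except OSError:
                # Entry vanished or is not accessible; skip it.
                continue
    return matches


if __name__ == "__main__":
    line = "Warn (criu/fsnotify.c:281): fsnotify: Handle 0x278:0x2ffb5b cannot be opened"
    print(inode_from_warning(line))
```

Calling `find_by_inode("/", inode_from_warning(line))` inside the container would then be the equivalent of the find command above.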
For context, the process was started with setsid and unshare -i. The tree includes a VNC server and a WebKit browser. procman is a wrapper I wrote to make sure that entrypoint.sh is started with setsid and a new IPC namespace.
pstree 1
procman─┬─entrypoint.sh─┬─Xtigervnc
│ ├─node─┬─bash───MiniBrowser─┬─WebKitNetworkPr───5*[{WebKitNetworkPr}]
│ │ │ ├─WebKitWebProces───45*[{WebKitWebProces}]
│ │ │ └─27*[{MiniBrowser}]
│ │ └─10*[{node}]
│ └─websockify───websockify
└─6*[{procman}]
I've tried to enable the nfs_export and index options in overlay2 but couldn't succeed, and it seems docker overrides them anyway. @Snorch would it be fair to say there is no way other than turning those options on? I'm getting the error repeatedly on the same file, so maybe I can somehow make criu ignore it? It'll be available in the FS of the restored container anyway.
There is no way to support inotifies on overlayfs without overlayfs providing valid file handles.
maybe I can somehow make criu ignore it?
Even if it was possible to ignore it (which it is not), you'll likely end up with a deadlocked app waiting for a notification from inotify that will never come.
https://forums.docker.com/t/nfs-export-disabled-with-overlay2-as-storage-driver/121325/4
Regarding this index=off override for docker mounts: I believe the override can be made optional. Docker can live with index=on mounts (yes, it becomes racy around container restart, but if we are not in a hurry and can wait a bit for the overlay mount to be fully dismantled, it should not be a big problem). @kolyshkin Any thoughts?
Even if it was possible to ignore it (which it is not), you'll likely end up with a deadlocked app waiting for a notification from inotify that will never come.
Interesting, wonder how the official docker commands handle these cases. I ended up removing the folder /root/.config/glib-2.0/settings before taking the checkpoint since it was an empty dir anyway, and that made dump work. I'm guessing the app has built-in resiliency to handle a missing dir.
Interesting, wonder how the official docker commands handle these cases.
They just don't handle these cases =) I don't know any other app except CRIU which might need restoring inotify watches.
Hi guys,
I wasn't able to reproduce the demos based on VNC. The checkpoint part fails for me. I've tried several Linux distributions and several kernels with no luck :-(
Perhaps someone can point me to the versions they use and confirm that they still work?
Tried:
Plain looper (even between VM's) works for me. https://criu.org/Docker
My latest experiment was on Ubuntu 16.04: e.g.
btw, what should I do to fix clone3() and time namespaces issue?