Open smcv opened 7 years ago
That does indeed seem like a bug; I'm not sure anything other than flatpak is using the lock file bits right now, and while I haven't looked, I suspect it may not matter since flatpak uses eventfds to synchronize with bwrap, so it won't race in this way. (But again I haven't looked closely)
/cc @alexlarsson for any comments.
One note here is I think we should really be using the modern locks; we have a cargo culted version from systemd in https://github.com/GNOME/libglnx/blob/master/glnx-lockfile.c
I think the way Flatpak uses this is that every user of a runtime or app takes a read lock, and Flatpak itself takes a write lock when it wants to delete the runtime or app? ... so the current behaviour might be more correct for Flatpak?
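As a rough illustration of that reader/writer pattern (this is not Flatpak's or bwrap's code, only a sketch assuming POSIX record locks on Linux), the Python below has a child process hold a read lock the way a running app would, while the parent makes non-blocking attempts the way a deleter would. The helper names `try_lockf` and `demo` are made up for this example.

```python
import fcntl
import os
import tempfile


def try_lockf(fd, kind):
    """Non-blocking lock attempt (LOCK_NB maps to F_SETLK). True on success."""
    try:
        fcntl.lockf(fd, kind | fcntl.LOCK_NB)
        return True
    except (BlockingIOError, PermissionError):
        # EAGAIN or EACCES: another process holds a conflicting lock.
        return False


def demo():
    with tempfile.NamedTemporaryFile() as tmp:
        ready_r, ready_w = os.pipe()
        done_r, done_w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child plays "an app using the runtime": it holds a read lock
            # until the parent tells it to stop.
            try:
                os.close(ready_r)
                os.close(done_w)
                fd = os.open(tmp.name, os.O_RDONLY)
                fcntl.lockf(fd, fcntl.LOCK_SH)
                os.write(ready_w, b"x")
                os.read(done_r, 1)
            finally:
                os._exit(0)  # exiting releases the child's lock
        os.close(ready_w)
        os.close(done_r)
        os.read(ready_r, 1)  # wait until the child holds its read lock

        fd = os.open(tmp.name, os.O_RDWR)
        # A second reader is compatible with the existing read lock.
        second_reader = try_lockf(fd, fcntl.LOCK_SH)
        # A writer ("delete the runtime") is refused while a reader exists.
        writer_while_in_use = try_lockf(fd, fcntl.LOCK_EX)

        os.write(done_w, b"x")
        os.waitpid(pid, 0)
        # Once the reader has gone away, the write lock can be taken.
        writer_after_exit = try_lockf(fd, fcntl.LOCK_EX)

        os.close(fd)
        os.close(ready_r)
        os.close(done_w)
        return second_reader, writer_while_in_use, writer_after_exit


if __name__ == "__main__":
    print(demo())
```

In this model a failed non-blocking write lock just means "still in use, don't delete yet", which is the sense in which failing immediately could be the more useful behaviour for that use case.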
> we should really be using the modern locks
If they're orthogonal to the currently-used locks, then that would be an ABI break. If a lock is part of your ABI, then the flavour of lock you take is sadly also part of your ABI. At least that's my understanding. (A new --ofd-lock-file option would be fine though.)
I think I'm going to apply #227 in Debian experimental, since having the tests pass reliably is a really nice property to have (even if we're not 100% convinced they're testing the right things).
This is still an issue on a hardened Gentoo Linux system:
$ sudo /opt/tb/bin/bwrap.sh /home/tinderbox/img1/17.1-libressl-20200409-111848
-rw-r--r-- 1 tinderbox tinderbox 0 May 9 21:04 /home/tinderbox/img1/17.1-libressl-20200409-111848/var/tmp/tb/LOCK
bwrap: Unable to open lock file /home/tinderbox/img1/17.1-libressl-20200409-111848/var/tmp/tb/LOCK: No such file or directory
FWIW here's the code used (the original is at https://github.com/toralf/tinderbox/blob/master/bin/bwrap.sh):
sandbox=(env -i
PATH=/usr/sbin:/usr/bin:/sbin:/bin
HOME=/root
SHELL=/bin/bash
TERM=linux
/usr/bin/bwrap
--bind "$mnt" /
--bind /home/tinderbox/tb/data /mnt/tb/data
--bind /home/tinderbox/distfiles /var/cache/distfiles
--ro-bind /home/tinderbox/tb/sdata /mnt/tb/sdata
--ro-bind /var/db/repos /mnt/repos
--tmpfs /var/tmp/portage
--tmpfs /dev/shm
--dev /dev
--proc /proc
--mqueue /dev/mqueue
--unshare-ipc
--unshare-pid
--unshare-uts
--hostname "BWRAP-$(echo "${mnt##*/}" | sed -e 's,[+\.],_,g' | cut -c-57)"
--chdir /
--die-with-parent
--lock-file $lock
/bin/bash -l
)
I intermittently see test-run.sh fail with:
I believe this is a race condition between the lockf-n.py and bwrap children of test-run.sh. If the backgrounded shell process $childshellpid is unlucky, its bwrap child will try to take the lock while one of the lockf-n.py processes is holding it. bwrap doesn't wait for the lock (F_SETLKW), just does a fire-and-forget attempt to take it (F_SETLK), so this fails.
For the test, this can be fixed by running bwrap in a retry loop. However, the more I think about this, the more I wonder: is this really intentional? Should bwrap be using F_SETLKW for its --lock-file?
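To make the F_SETLK versus retry-loop distinction concrete, here is a minimal Python sketch. It is not bwrap's code (bwrap is written in C) and not the actual test-suite fix; it assumes Linux/POSIX record locks, and the names `try_lock`, `lock_with_retry` and `demo` are invented for illustration. Python's `fcntl.lockf` issues F_SETLK when `LOCK_NB` is set and F_SETLKW otherwise.

```python
import fcntl
import os
import tempfile
import time


def try_lock(fd):
    """One fire-and-forget attempt at a read lock (F_SETLK via LOCK_NB)."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
        return True
    except (BlockingIOError, PermissionError):
        # EAGAIN or EACCES: someone else currently holds a conflicting lock.
        return False


def lock_with_retry(fd, attempts=50, delay=0.1):
    """Retry the non-blocking attempt, as a caller-side workaround."""
    for _ in range(attempts):
        if try_lock(fd):
            return True
        time.sleep(delay)
    return False


def demo():
    with tempfile.NamedTemporaryFile() as tmp:
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child stands in for a competing lock holder: it takes an
            # exclusive lock, holds it briefly, then exits.
            try:
                os.close(r)
                fd = os.open(tmp.name, os.O_RDWR)
                fcntl.lockf(fd, fcntl.LOCK_EX)
                os.write(w, b"x")
                time.sleep(0.3)
            finally:
                os._exit(0)  # exiting releases the lock
        os.close(w)
        os.read(r, 1)  # wait until the child really holds the lock
        os.close(r)

        fd = os.open(tmp.name, os.O_RDONLY)
        single_attempt = try_lock(fd)    # races with the holder and loses
        with_retry = lock_with_retry(fd)  # succeeds once the holder exits
        os.waitpid(pid, 0)
        os.close(fd)
        return single_attempt, with_retry


if __name__ == "__main__":
    print(demo())
```

Dropping `LOCK_NB` from the `lockf` call would give the F_SETLKW behaviour asked about above: the call would simply block until the competing process releases the lock, with no retry loop needed.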