--lock-file doesn't wait for the lock, is this intentional?

containers / bubblewrap

Low-level unprivileged sandboxing tool used by Flatpak and similar projects

Other

3.97k stars 237 forks source link

--lock-file doesn't wait for the lock, is this intentional? #226

Open smcv opened 7 years ago

smcv commented 7 years ago

I intermittently see test-run.sh fail with:

Unable to lock file /var/tmp/tap-test.nlbsH3/lock: Resource temporarily unavailable

I believe this is a race condition between the lockf-n.py and bwrap children of test-run.sh. If the backgrounded shell process $childshellpid is unlucky, its bwrap child will try to take the lock while one of the lockf-n.py processes is holding it. bwrap doesn't wait for the lock (F_SETLKW), just does a fire-and-forget attempt to take it (F_SETLK), so this fails.

For the test, this can be fixed by running bwrap in a retry loop. However, the more I think about this, the more I wonder: is this really intentional? Should bwrap be using F_SETLKW for its --lock-file?

cgwalters commented 7 years ago

That does indeed seem like a bug; I'm not sure anything other than flatpak is using the lock file bits right now, and while I haven't looked, I suspect it may not matter since flatpak uses eventfds to synchronize with bwrap, so it won't race in this way. (But again I haven't looked closely)

/cc @alexlarsson for any comments.

cgwalters commented 7 years ago

One note here is I think we should really be using the modern locks; we have a cargo culted version from systemd in https://github.com/GNOME/libglnx/blob/master/glnx-lockfile.c

smcv commented 7 years ago

I think the way Flatpak uses this is that every user of a runtime or app takes a read lock, and Flatpak itself takes a write lock when it wants to delete the runtime or app? ... so the current behaviour might be more correct for Flatpak?

we should really be using the modern locks

If they're orthogonal to the currently-used locks, then that would be an ABI break. If a lock is part of your ABI, then the flavour of lock you take is sadly also part of your ABI. At least that's my understanding. (A new --ofd-lock-file option would be fine though)

smcv commented 7 years ago

I think I'm going to apply #227 in Debian experimental, since having the tests pass reliably is a really nice property to have (even if we're not 100% convinced they're testing the right things).

toralf commented 4 years ago

This is still an issue at a hardened Gentoo Linux:

$ sudo /opt/tb/bin/bwrap.sh /home/tinderbox/img1/17.1-libressl-20200409-111848
-rw-r--r-- 1 tinderbox tinderbox 0 May  9 21:04 /home/tinderbox/img1/17.1-libressl-20200409-111848/var/tmp/tb/LOCK
bwrap: Unable to open lock file /home/tinderbox/img1/17.1-libressl-20200409-111848/var/tmp/tb/LOCK: No such file or directory

FWIW here's the used code (origin is in https://github.com/toralf/tinderbox/blob/master/bin/bwrap.sh)

sandbox=(env -i
    PATH=/usr/sbin:/usr/bin:/sbin:/bin
    HOME=/root
    SHELL=/bin/bash
    TERM=linux
    /usr/bin/bwrap
    --bind "$mnt"                       /
    --bind /home/tinderbox/tb/data      /mnt/tb/data
    --bind /home/tinderbox/distfiles    /var/cache/distfiles
    --ro-bind /home/tinderbox/tb/sdata  /mnt/tb/sdata
    --ro-bind /var/db/repos             /mnt/repos
    --tmpfs                             /var/tmp/portage
    --tmpfs /dev/shm
    --dev /dev
    --proc /proc
    --mqueue /dev/mqueue
    --unshare-ipc
    --unshare-pid
    --unshare-uts
    --hostname "BWRAP-$(echo "${mnt##*/}" | sed -e 's,[+\.],_,g' | cut -c-57)"
    --chdir /
    --die-with-parent
    --lock-file $lock
     /bin/bash -l
)