Closed blubberdiblub closed 8 years ago
I cannot reproduce the error on my virtual machine:
$ uname -a
Linux ubuntu 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:42:33 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/issue
Ubuntu 16.04.1 LTS \n \l
ETXTBSY for execve means that the file is open for writing at the time of the system call. May I ask which filesystem you use?
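For illustration, here is a minimal Python sketch of that condition (Python exposes the error code as errno.ETXTBSY). It assumes Linux and a temporary directory that permits executing files. The function name, the script contents and the file name are made up for the example and are not taken from the build script.

```python
import errno
import os
import subprocess
import tempfile


def run_while_open_then_closed():
    """Try to execute a script while it is open for writing, then again after closing it.

    Returns (errno seen while the file was open, exit status after closing).
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for a generated script such as "configure".
        path = os.path.join(tmp, "configure")
        busy_errno = None
        with open(path, "w") as script:
            script.write("#!/bin/sh\nexit 0\n")
            script.flush()
            os.chmod(path, 0o755)
            try:
                # The write handle is still open here, so execve should be refused.
                subprocess.run([path])
            except OSError as exc:
                busy_errno = exc.errno
        # The handle is closed now, so the same exec is expected to succeed.
        status = subprocess.run([path]).returncode
        return busy_errno, status
```

On a typical Linux setup I would expect the first value to equal errno.ETXTBSY and the second to be 0, but a different result on a layered filesystem would itself be a useful data point.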
Well, the host filesystem is ext4. On top of that, Docker puts some union-style filesystem, so that it can layer multiple images on top of each other. I believe it's currently using aufs for that purpose.
nboehm@eudora:~$ uname -a
Linux eudora 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:42:33 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
nboehm@eudora:~$ cat /proc/self/mountinfo | egrep srv
101 25 252:2 / /srv/storage rw,noatime shared:82 - ext4 /dev/mapper/vg0-storage rw,data=ordered
246 101 252:2 /docker/aufs /srv/storage/docker/aufs rw,noatime - ext4 /dev/mapper/vg0-storage rw,data=ordered
189 246 0:43 / /srv/storage/docker/aufs/mnt/72dada415dea3c835c4e639bf4537a5696c0df00413004be3fc37588981242e1 rw,relatime - aufs none rw,si=ee0793a95163ca1b,dio,dirperm1
190 101 0:45 / /srv/storage/docker/containers/b04bb5a3508cc6116242758c452318562a20753c0de5d7453f1eb4e9852ea513/shm rw,nosuid,nodev,noexec,relatime shared:165 - tmpfs shm rw,size=65536k
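To make listings like the one above easier to compare, here is a small Python sketch that pulls the mount point, mount options and filesystem type out of a single mountinfo line. It relies on the field layout documented in proc(5), where a lone "-" separates the per-mount fields from the filesystem type, source and superblock options. The function name and the dictionary keys are my own choice for the example.

```python
def parse_mountinfo_line(line):
    """Extract mount point, options and filesystem type from one mountinfo line."""
    # Everything before " - " describes the mount; everything after, the filesystem.
    left, _, right = line.partition(" - ")
    fields = left.split()
    fs_type, source, super_options = right.split(None, 2)
    return {
        "mount_point": fields[4],
        "mount_options": fields[5].split(","),
        "fs_type": fs_type,
        "source": source,
        "super_options": super_options.strip().split(","),
    }
```

Applied to the /srv/storage line, this should report ext4 with noatime among the mount options, and aufs for the container mount.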
Here's information about the partition I use for build process:
$ cat /proc/self/mountinfo | grep sda1
24 0 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw,errors=remount-ro,data=ordered
Can you remount your filesystem with relatime instead of noatime and check for build failures?
I rebooted with the storage on relatime now. Unfortunately, the build failures still happen.
nboehm@eudora:~$ cat /proc/self/mountinfo | egrep srv
103 25 252:2 / /srv/storage rw,relatime shared:84 - ext4 /dev/mapper/vg0-storage rw,data=ordered
251 103 252:2 /docker/aufs /srv/storage/docker/aufs rw,relatime - ext4 /dev/mapper/vg0-storage rw,data=ordered
201 251 0:52 / /srv/storage/docker/aufs/mnt/9a59b23dede2b7039ca0ae5046ec93a214ae96ff859d3a64e57704f018ae4acc rw,relatime - aufs none rw,si=f80a43115308217c,dio,dirperm1
202 103 0:53 / /srv/storage/docker/containers/a9d54f9d29dd80294fa386d5d0c9a3412c4e02d2715177203bc6dd12b73fe915/shm rw,nosuid,nodev,noexec,relatime shared:179 - tmpfs shm rw,size=65536k
I must admit - this issue is extremely mysterious. The culprit is not really my script but something sinister below in the software stack.
Could you try replacing touch(filename) with execute('touch', filename) and see if it helps?
Unfortunately, that didn't help.
Maybe it has something to do with building inside a Docker container?
When I find the time, I will try to set up a real VM that's as close as possible to the Docker container and see if I can reproduce it.
It's likely that Docker is the culprit here. It's undeniable that my script doesn't keep submodules/libdebug/configure open for writing at the moment execve is called. We ruled out the touch function. unarc seems to be fine, as it closes the file after writing (look here). As a last resort, we could start debugging the issue at the system call level with strace.
I cannot reproduce the error with Docker on travis-ci. Detailed information about the Docker version is below:
docker version
Client:
Version: 1.12.0
API version: 1.24
Go version: go1.6.3
Git commit: 8eab29e
Built: Thu Jul 28 22:00:36 2016
OS/Arch: linux/amd64
Server:
Version: 1.12.0
API version: 1.24
Go version: go1.6.3
Git commit: 8eab29e
Built: Thu Jul 28 22:00:36 2016
OS/Arch: linux/amd64
When I build one of the recent commits (from a cleanly checked out repository), the first build attempt usually fails with the following messages, although not always (race condition?):
When I run the build command a second time, the build goes through.
I can reproduce the problem since commit a008957917babeb66069bd39c69d853b205ce6ca, but wasn't able to reproduce it for commit 4ca4a901972404c1bb96550bf2bf0871e4d9c098, so it might have been introduced in the former.
My build environment is an Ubuntu 16.04 Xenial Xerus with gcc 5.4 inside a Docker container (I'm unable to build it on my host system, as I cannot install the gcc:i386 without package conflicts).
As I have my Docker images on the hub, you can very easily reproduce my environment if you have Docker running:
The one that works:
The one that fails most of the time:
If the build completes without error, the container just cleans itself up (--rm). If the build fails with an error, however, it will spawn a bash for investigation. After exit, the container is cleaned up as well.