adtools / amigaos-cross-toolchain

AmigaOS cross compiler for Linux / MacOSX / Windows
184 stars 48 forks source link

Docker: first build attempt fails most of the time #51

Closed blubberdiblub closed 8 years ago

blubberdiblub commented 8 years ago

When I build one of the recent commits (from a cleanly checked out repository), the first build attempt usually fails with the following messages, although not always (race condition?):

DEBUG: execute "/home/build/amigaos-cross-toolchain/submodules/libdebug/configure --prefix=/opt/m68k-amigaos/m68k-amigaos/libnix --host=m68k-amigaos"
DEBUG: restoring old value of environment variable "CC"
DEBUG: restoring old value of environment variable "AR"
DEBUG: restoring old value of environment variable "RANLIB"
Traceback (most recent call last):
  File "./toolchain-m68k", line 714, in <module>
    globals()[action].__call__(*args.args)
  File "./toolchain-m68k", line 446, in build
    from_dir='{submodules}/{libdebug}')
  File "/home/build/amigaos-cross-toolchain/common.py", line 37, in wrapper
    return fn(*args, **kwargs)
  File "/home/build/amigaos-cross-toolchain/common.py", line 355, in wrapper
    fn(*args, **kwargs)
  File "/home/build/amigaos-cross-toolchain/common.py", line 451, in configure
    execute(path.join(from_dir, 'configure'), *confopts)
  File "/home/build/amigaos-cross-toolchain/common.py", line 37, in wrapper
    return fn(*args, **kwargs)
  File "/home/build/amigaos-cross-toolchain/common.py", line 216, in execute
    subprocess.check_call(cmd)
  File "/usr/lib/python2.7/subprocess.py", line 536, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 523, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 26] Text file busy

When I run the build command a second time, the build goes through.

I can reproduce the problem since commit a008957917babeb66069bd39c69d853b205ce6ca, but wasn't able to reproduce it for commit 4ca4a901972404c1bb96550bf2bf0871e4d9c098, so it might have been introduced in the former.

My build environment is an Ubuntu 16.04 Xenial Xerus with gcc 5.4 inside a Docker container (I'm unable to build it on my host system, as I cannot install the gcc:i386 without package conflicts).

As I have my Docker images on the hub, you can very easily reproduce my environment if you have Docker running:

The one that works:

docker run -it --rm blubberdiblub/amigaos-cross-toolchain-repository:4ca4a90 sh -c './toolchain-m68k --prefix=/opt/m68k-amigaos build || bash'

The one that fails most of the time:

docker run -it --rm blubberdiblub/amigaos-cross-toolchain-repository:a008957 sh -c './toolchain-m68k --prefix=/opt/m68k-amigaos build || bash'

If the build completes without error, the container just cleans itself up (--rm). If the build fails with an error, however, it will spawn a bash for investigation. After exit, the container is cleaned up as well.

cahirwpz commented 8 years ago

I cannot reproduce error on my virtual machine:

$ uname -a
Linux ubuntu 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:42:33 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/issue
Ubuntu 16.04.1 LTS \n \l

ETXTBUSY for execve means that at the time of system call the file is opened for writing. Can I ask you what filesystem do you use?

blubberdiblub commented 8 years ago

Well, the host filesystem is ext4. On top of that, Docker puts some union-style filesystem, so that it can layer multiple images on top of each other. I believe it's currently using aufs for that purpose.

nboehm@eudora:~$ uname -a
Linux eudora 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:42:33 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
nboehm@eudora:~$ cat /proc/self/mountinfo | egrep srv
101 25 252:2 / /srv/storage rw,noatime shared:82 - ext4 /dev/mapper/vg0-storage rw,data=ordered
246 101 252:2 /docker/aufs /srv/storage/docker/aufs rw,noatime - ext4 /dev/mapper/vg0-storage rw,data=ordered
189 246 0:43 / /srv/storage/docker/aufs/mnt/72dada415dea3c835c4e639bf4537a5696c0df00413004be3fc37588981242e1 rw,relatime - aufs none rw,si=ee0793a95163ca1b,dio,dirperm1
190 101 0:45 / /srv/storage/docker/containers/b04bb5a3508cc6116242758c452318562a20753c0de5d7453f1eb4e9852ea513/shm rw,nosuid,nodev,noexec,relatime shared:165 - tmpfs shm rw,size=65536k
cahirwpz commented 8 years ago

Here's information about the partition I use for build process:

$ cat /proc/self/mountinfo | grep sda1
24 0 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw,errors=remount-ro,data=ordered

Can you remount your filesystem with relatime instead of noatime and check for build failures?

blubberdiblub commented 8 years ago

I rebooted with the storage on relatime now. Unfortunately, the build failures still happen.

nboehm@eudora:~$ cat /proc/self/mountinfo | egrep srv
103 25 252:2 / /srv/storage rw,relatime shared:84 - ext4 /dev/mapper/vg0-storage rw,data=ordered
251 103 252:2 /docker/aufs /srv/storage/docker/aufs rw,relatime - ext4 /dev/mapper/vg0-storage rw,data=ordered
201 251 0:52 / /srv/storage/docker/aufs/mnt/9a59b23dede2b7039ca0ae5046ec93a214ae96ff859d3a64e57704f018ae4acc rw,relatime - aufs none rw,si=f80a43115308217c,dio,dirperm1
202 103 0:53 / /srv/storage/docker/containers/a9d54f9d29dd80294fa386d5d0c9a3412c4e02d2715177203bc6dd12b73fe915/shm rw,nosuid,nodev,noexec,relatime shared:179 - tmpfs shm rw,size=65536k
cahirwpz commented 8 years ago

I must admit - this issue is extremely mysterious. The culprit is not really my script but something sinister below in the software stack.

Could you try replacing touch(filename) with execute('touch', filename) and see if it helps?

blubberdiblub commented 8 years ago

Unfortunately, that didn't help.

Maybe it has something to do with building inside a Docker container?

When I find the time, I will try to set up a real VM that's as close as possible to the Docker container and see if I can reproduce it.

cahirwpz commented 8 years ago

It's likely that Docker is a culprit here. It's undeniable that my script doesn't keep submodules/libdebug/configure opened for writing at the moment when execve is called. We ruled out touch function. unarc seems to be fine as it closes file after write (look here). As a last resort we could start debugging issue at system call level with strace.

cahirwpz commented 8 years ago

I cannot reproduce the error with docker on travis-ci. Find detailed information about docker version below:

docker version
Client:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:00:36 2016
 OS/Arch:      linux/amd64
Server:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:00:36 2016
 OS/Arch:      linux/amd64