DaanDeMeyer / reproc

A cross-platform (C99/C++11) process library
MIT License

reproc::run fails with 'Invalid Argument' in Docker container #105

Closed ChiefGokhlayeh closed 1 year ago

ChiefGokhlayeh commented 1 year ago

I come here from debugging an issue with mamba-org/micromamba. Libmamba uses reproc++ to invoke shell scripts.

On my native Arch install this works fine. However, I'm trying to set up a devcontainer using Docker containers.

I tried compiling reproc++ with its examples just to be sure and, lo and behold, running the reproc examples in Docker fails:

$ ./build/reproc++/examples/run whoami        
Invalid argument

This applies to all examples in reproc++/examples.

I tested using Debian- and Fedora-based images. Here is a minimal Dockerfile to quickly showcase my problem:

FROM debian:latest

RUN apt-get update \
    && apt-get install -y \
    build-essential \
    cmake \
    git \
    && rm -rf /var/lib/apt/lists/*

Build the Docker image like so:

docker build -t test .

Invoke the Docker container, clone the repository, build the tests and execute them:

$ docker run -it --rm test

#inside the container
root@3a01ceb74d30:/# git clone https://github.com/DaanDeMeyer/reproc.git
...
root@3a01ceb74d30:/# cd reproc/
root@3a01ceb74d30:/reproc# cmake -B build -DREPROC++=ON -DREPROC_EXAMPLES=ON
...
root@3a01ceb74d30:/reproc# cmake --build build
...

#just to test the 'whoami' exists and is an executable binary
root@3a01ceb74d30:/reproc# whoami
root

#now try to run the same program in reproc
root@3a01ceb74d30:/reproc# ./build/reproc++/examples/run whoami
Invalid argument

Again, I tested the same with a Fedora-based image, with the same result.

ChiefGokhlayeh commented 1 year ago

Looks like the issue is limited to Arch hosts (or possibly only kernels 6.3.7 and up). I tested on two machines (both Arch), and the above-mentioned procedure resulted in "Invalid argument" on both. When I tested on an Ubuntu 22.04 machine (kernel 5.15.0), the error did not occur.

ChiefGokhlayeh commented 1 year ago

I finally got around to actually debugging the issue. The error occurs in the child process (set up GDB with set follow-fork-mode child).

For whatever reason, getrlimit() in https://github.com/DaanDeMeyer/reproc/blob/1c07bdbec3f2ecba7125b9499b9a8a77bf9aa8c7/reproc/src/process.posix.c#L105-L121 returns limit.rlim_cur = 1073741816 = 0x3FFFFFF8. This is of course much larger than the limit MAX_FD_LIMIT = 1024 * 1024 = 1048576 = 0x100000.
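To make the numbers above concrete, here is a minimal C sketch of reading the soft RLIMIT_NOFILE value and comparing it against a cap. It is not reproc's actual code: the helper names read_soft_nofile and exceeds_fd_cap are hypothetical, and only the value MAX_FD_LIMIT = 1024 * 1024 is taken from the comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/* Cap quoted in the issue (1024 * 1024 = 1048576). */
#define MAX_FD_LIMIT (1024 * 1024)

/* Hypothetical helper: stores the soft RLIMIT_NOFILE value in *out_soft.
 * Returns 0 on success, -1 if getrlimit() fails (errno set by getrlimit). */
static int read_soft_nofile(rlim_t *out_soft)
{
  assert(out_soft != NULL);

  struct rlimit limit = { 0 };
  if (getrlimit(RLIMIT_NOFILE, &limit) < 0) {
    return -1;
  }

  *out_soft = limit.rlim_cur;
  return 0;
}

/* Hypothetical helper: true when the soft limit is unlimited or above the cap. */
static bool exceeds_fd_cap(rlim_t soft)
{
  return soft == RLIM_INFINITY || soft > (rlim_t) MAX_FD_LIMIT;
}
```

With the limit reported in this issue (1073741816), exceeds_fd_cap returns true; with a limit of 1024 it returns false.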

The check max_fd > MAX_FD_LIMIT therefore triggers and the child process aborts. This should report EMFILE, but errno is never set on that path, so the child incorrectly reports a stale EINVAL to the parent. The EINVAL is left over from errors that were deliberately ignored earlier, in https://github.com/DaanDeMeyer/reproc/blob/1c07bdbec3f2ecba7125b9499b9a8a77bf9aa8c7/reproc/src/process.posix.c#L231-L237 (e.g. sigaction(SIGKILL, &action, NULL) fails with errno=EINVAL, which is expected and ignored, but errno keeps the value EINVAL afterwards).
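The stale-errno effect described above can be reproduced in isolation. The following C sketch is only an illustration under assumptions: the function names are hypothetical and the code is not taken from reproc. It shows a failing check that leaves errno untouched next to one that sets EMFILE explicitly.

```c
#define _POSIX_C_SOURCE 200809L /* expose sigaction() under strict -std=c99 */

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Tries to reset the SIGKILL disposition. The action for SIGKILL cannot be
 * changed, so this is expected to return -1 and leave errno == EINVAL. */
static int try_reset_sigkill(void)
{
  struct sigaction action;
  memset(&action, 0, sizeof(action));
  action.sa_handler = SIG_DFL;
  sigemptyset(&action.sa_mask);
  return sigaction(SIGKILL, &action, NULL);
}

/* Hypothetical check mirroring the problem: fails without touching errno,
 * so the caller sees whatever value an earlier call left behind. */
static int check_fd_cap_stale(long max_fd, long cap)
{
  assert(cap >= 0);
  return max_fd > cap ? -1 : 0;
}

/* Same check, but it sets errno itself so the failure reason is accurate. */
static int check_fd_cap_explicit(long max_fd, long cap)
{
  assert(cap >= 0);
  if (max_fd > cap) {
    errno = EMFILE;
    return -1;
  }
  return 0;
}
```

Calling try_reset_sigkill() first and then check_fd_cap_stale() leaves the caller looking at EINVAL, while check_fd_cap_explicit() replaces it with EMFILE.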

ChiefGokhlayeh commented 1 year ago

Well, I guess I found the issue. Inside my Docker container (and only in the container) the file descriptor limit is set to an insanely high number, but not unlimited.

$ ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         unlimited
-m: resident set size (kbytes)      unlimited
-u: processes                       unlimited
-n: file descriptors                1073741816
-l: locked-in-memory size (kbytes)  8192
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 253585
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15: rt cpu time (microseconds)   unlimited

ChiefGokhlayeh commented 1 year ago

I was able to resolve the issue by running my Docker container with a more sane file descriptor limit:

docker run --ulimit nofile=1024:1024 ...

In a VS Code devcontainer.json, simply add:

{
    //...

    "runArgs": [
        "--ulimit", "nofile=1024:1024"
    ],

    // ...
}

For a Docker Compose file, follow https://stackoverflow.com/a/58093008/4069539
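If changing the container configuration is not an option, a program could in principle lower its own soft limit before spawning child processes. The C sketch below is an assumption on my part and not something suggested in this thread or provided by reproc; cap_soft_nofile is a hypothetical helper that only ever lowers the soft RLIMIT_NOFILE value and leaves the hard limit alone.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: lowers the soft RLIMIT_NOFILE value to at most `cap`.
 * It never raises the limit and does not touch the hard limit.
 * Returns 0 on success, -1 on failure (errno set by getrlimit/setrlimit). */
static int cap_soft_nofile(rlim_t cap)
{
  assert(cap > 0); /* a cap of 0 would forbid opening any new descriptor */

  struct rlimit limit = { 0 };
  if (getrlimit(RLIMIT_NOFILE, &limit) < 0) {
    return -1;
  }

  /* Already within the cap: nothing to do. */
  if (limit.rlim_cur != RLIM_INFINITY && limit.rlim_cur <= cap) {
    return 0;
  }

  limit.rlim_cur = cap;
  return setrlimit(RLIMIT_NOFILE, &limit);
}
```

Calling cap_soft_nofile(1024) early in main() would have roughly the same effect on the soft limit as the --ulimit nofile=1024:1024 flag above, but only for that process and its children.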

ChiefGokhlayeh commented 1 year ago

Duplicate of #82