docker / cli

The Docker CLI
Apache License 2.0

After installing docker-ce 25.0.0 when building Dockerimage, the container won't run because of ulimit error #4807

Open ukrainiansteak opened 8 months ago

ukrainiansteak commented 8 months ago

Description

Inside of my Dockerfile, which uses ubuntu:20.04, we install docker-ce. It is essential for us since we need to build AWS CDK code in a custom CodeBuild container.

Here's the Dockerfile code (simplified):

FROM ubuntu:20.04

RUN apt update -y; \
    apt upgrade -y; \
    apt install software-properties-common -y; \
    apt update -y; \
    apt install wget -y; \
    apt install curl -y; \
    apt-get install ca-certificates gnupg lsb-release -y; \
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg; \
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null; \
    apt-get update -y; \
    apt install docker-ce -y; \
    apt install unzip -y; \
    service docker start; \
    rm -rf /var/cache/apt;

ENTRYPOINT service docker start && /bin/bash
COPY install.py .
COPY remove.py .

The container was built successfully up until two days ago. On further investigation, I have found out that this is due to the Docker Engine upgrade to the 25.0.0 version.

If we now run docker run -t container, we get the following output (the same is printed when running service docker start while building the image):

service docker start
/etc/init.d/docker: 62: ulimit: error setting limit (Invalid argument)

This is caused by line 62 of the /etc/init.d/docker file, which sets the hard limit on open files: ulimit -Hn 524288

Before the most recent 25.0.0 version release, it used to be the following line: ulimit -n 1048576
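
The difference between these two lines can be reproduced in a plain shell, without Docker. With neither -H nor -S, ulimit -n sets the soft and the hard limit together, whereas ulimit -Hn changes only the hard limit. Per setrlimit(2), a request that would leave the soft limit above the hard limit is rejected with EINVAL ("Invalid argument"). The sketch below is a minimal illustration under assumptions: try_limits is a hypothetical helper name, 256 and 128 are arbitrary small values chosen so that no privileges are needed, the current hard limit is assumed to be at least 256, and the shell is assumed to be dash or bash.

```shell
# try_limits is a hypothetical helper for illustration only.
# The body runs in a subshell, so the caller's own limits never change.
try_limits() (
    # Start from a known soft limit (assumes the hard limit is >= 256).
    ulimit -Sn 256 || exit 1
    # "$@" is the ulimit invocation under test, e.g. `-Hn 128` or `-n 128`.
    if ulimit "$@" 2>/dev/null; then
        echo "$(ulimit -Sn):$(ulimit -Hn)"   # soft:hard after the change
    else
        echo "failed"
    fi
)

try_limits -n 128    # sets soft and hard together
try_limits -Hn 128   # hard only: the soft limit (256) would exceed it
```

The first call should print 128:128 and the second should print failed, which is consistent with the "Invalid argument" error appearing only after the init script switched to ulimit -Hn.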

When checking the /etc/security/limits.conf file inside of the Ubuntu image, I found out that the system hard limit is 100000.

If I remove the service docker start command from the Dockerfile (both in the RUN and ENTRYPOINT commands), the issue persists.

The only way I could make my image run is by hardcoding the previous version of docker-ce:

apt install docker-ce=5:24.0.7-1~ubuntu.20.04~focal -y

This has fixed the problem but is still a huge obstacle for us since we are now forced to use the older version of docker-ce and cannot get updates.

I hope this case will be helpful to anyone having the same problem as we did. I also hope a fix will be introduced so we could get the most recent updates on our image.

Reproduce

  1. Build an image using a Dockerfile and install the latest version of docker-ce.
  2. Try running the container

Expected behavior

We expected the container to work with Docker Engine 25.0.0 in the same way it did with 24.0.7.

docker version

Not accessible since the container couldn't run. 
The version being installed is 25.0.0

docker info

Not accessible since the container couldn't run.

Additional Info

No response

thaJeztah commented 8 months ago

Thanks for reporting. It looks like this is an issue in the "engine" code, not the CLI itself, so the better location would probably be https://github.com/moby/moby. Unfortunately, GitHub doesn't allow transferring tickets between orgs, so I cannot move it there (but perhaps you could open a new ticket?)

This issue is related to https://github.com/moby/moby/commit/c8930105bc9fc3c1a8a90886c23535cc6c41e130, which is part of this PR;

That PR changed the default ulimits to follow systemd's defaults (only raising the hard-limit, not the soft-limit).

When running the docker service as a systemd unit, this would be handled by systemd, but the sysvinit scripts are provided for non-standard setups where systemd is not used (likely in your container). I wondered if it was perhaps that ubuntu 20.04 didn't provide the H / S (hard / soft limits) option, in which case this could be a packaging issue (adjust the file depending on distro and distro-version; similar to https://github.com/docker/docker-ce-packaging/pull/968) but looks like both 20.04 and 22.04 do;

docker run --rm ubuntu:20.04 bash -c 'ulimit --help'
ulimit: ulimit [-SHabcdefiklmnpqrstuvxPT] [limit]
    Modify shell resource limits.

    Provides control over the resources available to the shell and processes
    it creates, on systems that allow such control.

    Options:
      -S    use the `soft' resource limit
      -H    use the `hard' resource limit
...
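
The help output above comes from bash. Assuming the init script runs under /bin/sh (which is dash on Ubuntu, and the "/etc/init.d/docker: 62: ulimit:" message format suggests this), the same check can be made against whichever shell actually executes the script. This is a small sketch; supports_hard_soft is a hypothetical helper name, and it only reads the limits.

```shell
# supports_hard_soft is a hypothetical helper name.
# Succeeds if the given shell's ulimit builtin accepts both -H and -S.
supports_hard_soft() {
    shell=$1
    # Only read the limits; nothing is changed.
    "$shell" -c 'ulimit -Hn >/dev/null && ulimit -Sn >/dev/null' 2>/dev/null
}

if supports_hard_soft /bin/sh; then
    echo "/bin/sh: ulimit accepts -H and -S"
fi
```

Inside the image it could be run as, for example, docker run --rm ubuntu:20.04 sh -c 'ulimit -Hn; ulimit -Sn'.
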
thaJeztah commented 8 months ago

Oh, wait; but you're running service docker start as part of your docker build? Does that work? the docker service requires the container to be running with --privileged, so I'd expect it to fail during a docker build (which doesn't run as --privileged) 🤔

ukrainiansteak commented 8 months ago

> Oh, wait; but you're running service docker start as part of your docker build? Does that work? the docker service requires the container to be running with --privileged, so I'd expect it to fail during a docker build (which doesn't run as --privileged) 🤔

It works with the previous docker-ce version, yes. We don't get any errors there.

lisbon205 commented 8 months ago

I have a similar issue

I run self-hosted Bitwarden at work on Rocky 8. Until now it has worked for years. After the update to docker-ce 25.0.0 it stopped working. I went back to the previous version, docker-ce 24.0.7, and everything worked again.

Something is wrong with the latest version.

thaJeztah commented 8 months ago

> It works with the previous docker-ce version, yes. We don't get any errors there.

Maybe the initial startup worked because no containers were started, so it had just enough privileges to run. I wonder if the previous sysctl was treated as a no-op if the container was created with the same options, so the previous (ulimit -n 1048576) would be considered "no changes", therefore succeed.

What version of docker is running on the host (i.e., what version of docker is used to build the image)? Can you provide the output of docker version and docker info ?

ukrainiansteak commented 8 months ago

> > It works with the previous docker-ce version, yes. We don't get any errors there.
>
> Maybe the initial startup worked because no containers were started, so it had just enough privileges to run. I wonder if the previous sysctl was treated as a no-op if the container was created with the same options, so the previous (ulimit -n 1048576) would be considered "no changes", therefore succeed.
>
> What version of docker is running on the host (i.e., what version of docker is used to build the image)? Can you provide the output of docker version and docker info ?

docker version

Client:
 Cloud integration: v1.0.33
 Version:           24.0.2
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        cb74dfc
 Built:             Thu May 25 21:51:16 2023
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.20.0 (109717)
 Engine:
  Version:          24.0.2
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       659604f
  Built:            Thu May 25 21:50:59 2023
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Version:    24.0.2
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.5
    Path:     /Users/myuser/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.18.1
    Path:     /Users/myuser/.docker/cli-plugins/docker-compose
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.0
    Path:     /Users/myuser/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.19
    Path:     /Users/myuser/.docker/cli-plugins/docker-extension
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v0.1.0-beta.4
    Path:     /Users/myuser/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/myuser/.docker/cli-plugins/docker-sbom
  scan: Docker Scan (Docker Inc.)
    Version:  v0.26.0
    Path:     /Users/myuser/.docker/cli-plugins/docker-scan
  scout: Command line tool for Docker Scout (Docker Inc.)
    Version:  v0.12.0
    Path:     /Users/myuser/.docker/cli-plugins/docker-scout
WARNING: Plugin "/Users/myuser/.docker/cli-plugins/docker-feedback" is not valid: failed to fetch metadata: fork/exec /Users/myuser/.docker/cli-plugins/docker-feedback: no such file or directory

Server:
 Containers: 8
  Running: 0
  Paused: 0
  Stopped: 8
 Images: 11
 Server Version: 24.0.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.49-linuxkit-pr
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 4
 Total Memory: 5.8GiB
 Name: docker-desktop
 ID: 3d744663-ef39-4886-a78f-dc0bfd451361
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

thaJeztah commented 8 months ago

Thanks! So, I think this is the issue indeed https://github.com/docker/cli/issues/4807#issuecomment-1903759890;

> I wonder if the previous sysctl was treated as a no-op if the container was created with the same options, so the previous (ulimit -n 1048576) would be considered "no changes", therefore succeed.

I tried reproducing the issue; I slightly simplified the Dockerfile, and reduced it to only the essential packages, and split the "install docker" step to a separate RUN (to allow caching other steps if I had to change something);

# syntax=docker/dockerfile:1
FROM ubuntu:20.04

RUN apt-get update -y; \
    apt-get install -y \
        ca-certificates \
        curl \
        gnupg \
        lsb-release \
        software-properties-common; \
    rm -rf /var/cache/apt;
RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg; \
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null; \
    apt-get update -y; \
    apt-get install -y docker-ce; \
    rm -rf /var/cache/apt;
RUN \
    echo "ulimits: $(ulimit -Sn):$(ulimit -Hn)"; \
    service docker start; \
    rm -rf /var/cache/apt;

ENTRYPOINT service docker start && /bin/bash

When building the Dockerfile on a docker 25.0 engine with BuildKit, the build works without errors;

docker build -t foo --no-cache --progress=plain .

#8 22.27 Setting up docker-ce (5:25.0.0-1~ubuntu.20.04~focal) ...
#8 22.30 invoke-rc.d: could not determine current runlevel
#8 22.31 invoke-rc.d: policy-rc.d denied execution of start.
#8 22.44 Created symlink /etc/systemd/system/multi-user.target.wants/docker.service → /lib/systemd/system/docker.service.
#8 22.57 Created symlink /etc/systemd/system/sockets.target.wants/docker.socket → /lib/systemd/system/docker.socket.
#8 22.57 Setting up xauth (1:1.1-0ubuntu1) ...
#8 22.58 Setting up liberror-perl (0.17029-1) ...
#8 22.58 Setting up git (1:2.25.1-1ubuntu3.11) ...
#8 22.61 Processing triggers for libc-bin (2.31-0ubuntu9.14) ...
#8 22.64 Processing triggers for systemd (245.4-4ubuntu3.22) ...
#8 22.65 Processing triggers for mime-support (3.64ubuntu1) ...
#8 DONE 22.7s

#9 [4/4] RUN     echo "ulimits: $(ulimit -Sn):$(ulimit -Hn)";     service docker start;     rm -rf /var/cache/apt;
#9 0.320 ulimits: 524288:1024
#9 0.334  * Starting Docker: docker
#9 0.336    ...done.
#9 DONE 0.3s

#10 exporting to image
#10 exporting layers
#10 exporting layers 3.0s done
#10 writing image sha256:92263a9e8ab60769e2127898c6c2fa5489aa66a383df8523862dcae96962043e
#10 writing image sha256:92263a9e8ab60769e2127898c6c2fa5489aa66a383df8523862dcae96962043e done
#10 naming to docker.io/library/foo done
#10 DONE 3.1s

When using the legacy builder (BuildKit disabled through DOCKER_BUILDKIT=0), however, it fails.

The classic builder starts containers through containerd, whereas BuildKit starts its build containers itself (in which case the ulimits (LIMIT_NOFILE) of the dockerd service are applied, not those from containerd);

DOCKER_BUILDKIT=0 docker build -t foo --no-cache .
...
Setting up docker-ce (5:25.0.0-1~ubuntu.20.04~focal) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Created symlink /etc/systemd/system/multi-user.target.wants/docker.service → /lib/systemd/system/docker.service.
Created symlink /etc/systemd/system/sockets.target.wants/docker.socket → /lib/systemd/system/docker.socket.
Setting up xauth (1:1.1-0ubuntu1) ...
Setting up liberror-perl (0.17029-1) ...
Setting up git (1:2.25.1-1ubuntu3.11) ...
Processing triggers for libc-bin (2.31-0ubuntu9.14) ...
Processing triggers for systemd (245.4-4ubuntu3.22) ...
Processing triggers for mime-support (3.64ubuntu1) ...
 ---> Removed intermediate container 968fb0a992f2
 ---> c3500524a198
Step 4/5 : RUN     echo "ulimits: $(ulimit -Sn):$(ulimit -Hn)";     service docker start;     rm -rf /var/cache/apt;
 ---> Running in b8bb305e36bc
ulimits: 1073741816:1073741816
/etc/init.d/docker: 62: ulimit: error setting limit (Invalid argument)
 ---> Removed intermediate container b8bb305e36bc
 ---> 59eb60be41b5
Step 5/5 : ENTRYPOINT service docker start && /bin/bash
 ---> Running in b0455bfa8db2
 ---> Removed intermediate container b0455bfa8db2
 ---> 2bcfe0c9b509
Successfully built 2bcfe0c9b509
Successfully tagged foo:latest

When running the build on an older version of docker with the previous LIMIT_NOFILE, it fails with BuildKit as well. Here's the same build on a docker 24.0.2 (Ubuntu 18.04) test-machine that I didn't update yet;

docker build -t foo --no-cache --progress=plain .

#8 41.47 Setting up docker-ce (5:25.0.0-1~ubuntu.20.04~focal) ...
#8 41.55 invoke-rc.d: could not determine current runlevel
#8 41.56 invoke-rc.d: policy-rc.d denied execution of start.
#8 41.84 Created symlink /etc/systemd/system/multi-user.target.wants/docker.service → /lib/systemd/system/docker.service.
#8 42.08 Created symlink /etc/systemd/system/sockets.target.wants/docker.socket → /lib/systemd/system/docker.socket.
#8 42.09 Setting up xauth (1:1.1-0ubuntu1) ...
#8 42.10 Setting up liberror-perl (0.17029-1) ...
#8 42.12 Setting up git (1:2.25.1-1ubuntu3.11) ...
#8 42.17 Processing triggers for libc-bin (2.31-0ubuntu9.14) ...
#8 42.26 Processing triggers for systemd (245.4-4ubuntu3.22) ...
#8 42.27 Processing triggers for mime-support (3.64ubuntu1) ...
#8 DONE 42.5s

#9 [4/4] RUN     echo "ulimits: $(ulimit -Sn):$(ulimit -Hn)";     service docker start;     rm -rf /var/cache/apt;
#9 0.822 ulimits: 1048576:1048576
#9 0.857 /etc/init.d/docker: 62: ulimit: error setting limit (Invalid argument)
#9 DONE 1.0s

#10 exporting to image
#10 exporting layers
#10 exporting layers 14.3s done
#10 writing image sha256:f785f12a2297b42af040deca8ca043e025

thaJeztah commented 8 months ago

I think your build effectively happened to be "lucky" and managed to start because it didn't have to adjust the ulimits, and therefore had just enough privileges to start the service: the ulimit of the container used during build already had the correct values set, making the ulimit -n 1048576 a no-op, so no privileges were required.

I also tried what works and what doesn't (see also setrlimit(2) and ulimit). Quoting setrlimit(2):

An unprivileged process may only set its soft limit to a value in the range from 0 up to the hard limit, and (irreversibly) lower its hard limit. A privileged process can make arbitrary changes to either limit value.

docker run --rm --ulimit nofile=1024:524288 ubuntu:20.04 sh -c ' echo "before: $(ulimit -Hn):$(ulimit -Sn)"; ulimit -Sn 2048; ulimit -Hn 1048576; echo "after:  $(ulimit -Hn):$(ulimit -Sn)"'
sh: 1: ulimit: error setting limit (Operation not permitted)
before: 524288:1024
after:  524288:2048
docker run --rm --ulimit nofile=2048:1048576 ubuntu:20.04 sh -c ' echo "before: $(ulimit -Hn):$(ulimit -Sn)"; ulimit -Sn 1024; ulimit -Hn 524288; echo "after:  $(ulimit -Hn):$(ulimit -Sn)"'
before: 1048576:2048
after:  524288:1024

Keeping both the same also works;

docker run --rm --ulimit nofile=1024:524288 ubuntu:20.04 sh -c ' echo "before: $(ulimit -Hn):$(ulimit -Sn)"; ulimit -Sn 1024; ulimit -Hn 524288; echo "after:  $(ulimit -Hn):$(ulimit -Sn)"'
before: 524288:1024
after:  524288:1024
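
The outcomes of the runs above follow two rules from setrlimit(2): an unprivileged process that tries to raise its hard limit gets EPERM ("Operation not permitted"), and any request that would leave the soft limit above the hard limit gets EINVAL ("Invalid argument"). The sketch below encodes just those two rules; predict_hard_change is a hypothetical helper name, it assumes an unprivileged process and plain integer limits (not "unlimited"), and it only predicts the result without changing anything.

```shell
# predict_hard_change is a hypothetical helper for illustration only.
# Predicts the result of `ulimit -Hn <new_hard>` for an unprivileged process,
# given the current soft and hard limits as plain integers.
predict_hard_change() {
    soft=$1
    hard=$2
    new_hard=$3
    if [ "$new_hard" -gt "$hard" ]; then
        echo "EPERM"     # raising the hard limit needs privileges
    elif [ "$new_hard" -lt "$soft" ]; then
        echo "EINVAL"    # the soft limit would exceed the new hard limit
    else
        echo "ok"
    fi
}

predict_hard_change 2048 524288 1048576      # first run above (soft already 2048)
predict_hard_change 1024 1048576 524288      # second run above (soft already 1024)
predict_hard_change 1048576 1048576 524288   # build container with soft = hard = 1048576
```

These three calls should print EPERM, ok and EINVAL. The last case matches the failing build logs, where the container's soft limit was well above 524288 when the init script ran ulimit -Hn 524288.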

Note that these ulimits are configured for the docker and containerd services, so their values may differ between hosts, and even between systemd versions, which means that your build could fail depending on that.

Given that you're trying to run the docker engine in a container and ulimits can be set for the container at runtime (using the --ulimit flag, see above), which will be inherited by processes inside the container, I think a good solution for your use-case would be to just disable the ulimit in the init script;

sed -i 's/ulimit -Hn/# ulimit -Hn/g' /etc/init.d/docker;
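
One caveat with the sed command above is that running it twice produces "# # ulimit -Hn". A variant that is safe to re-run could look like the sketch below. It is an illustration under assumptions, not part of the docker-ce packaging: disable_hard_ulimit is a hypothetical helper name, and the demonstration uses a scratch file rather than the real /etc/init.d/docker.

```shell
# disable_hard_ulimit is a hypothetical helper; $1 is the init script path.
disable_hard_ulimit() {
    script=$1
    tmp=$(mktemp) || return 1
    # Match only lines whose first token is `ulimit -Hn`, so a line that is
    # already commented out is left alone.
    if sed 's/^\([[:space:]]*\)ulimit -Hn/\1# ulimit -Hn/' "$script" > "$tmp"; then
        cat "$tmp" > "$script"   # overwrite in place, keeping mode and owner
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Demonstration on a scratch file.
demo=$(mktemp)
printf '\tulimit -Hn 524288\n' > "$demo"
disable_hard_ulimit "$demo"
disable_hard_ulimit "$demo"   # running it again changes nothing
cat "$demo"
rm -f "$demo"
```

The scratch file should end up with a single commented-out line, with its indentation preserved.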

Here's my Dockerfile with that applied;

# syntax=docker/dockerfile:1
FROM ubuntu:20.04

RUN apt-get update -y; \
    apt-get install -y \
        ca-certificates \
        curl \
        gnupg \
        lsb-release \
        software-properties-common; \
    rm -rf /var/cache/apt;
RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg; \
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null; \
    apt-get update -y; \
    apt-get install -y docker-ce; \
    rm -rf /var/cache/apt;
RUN \
    echo "ulimits: $(ulimit -Sn):$(ulimit -Hn)"; \
    sed -i 's/ulimit -Hn/# ulimit -Hn/g' /etc/init.d/docker; \
    service docker start; \
    rm -rf /var/cache/apt;

ENTRYPOINT service docker start && /bin/bash

With that change, the build succeeds;

#9 [4/4] RUN     echo "ulimits: $(ulimit -Sn):$(ulimit -Hn)";     sed -i 's/ulimit -Hn/# ulimit -Hn/g' /etc/init.d/docker;     service docker start;     rm -rf /var/cache/apt;
#9 0.864 ulimits: 1048576:1048576
#9 0.906  * Starting Docker: docker
#9 0.910    ...done.
#9 DONE 1.0s
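
An alternative to commenting the line out, sketched here under assumptions and not taken from the docker-ce packaging, is to lower the soft limit first whenever it is above the target and only then set the hard limit. This mirrors the experiment above in which ulimit -Sn 1024 followed by ulimit -Hn 524288 succeeded. set_hard_nofile is a hypothetical helper name, and raising the hard limit above its current value would still require privileges.

```shell
# set_hard_nofile is a hypothetical helper for illustration only.
# Sets the hard open-files limit to $1, lowering the soft limit first when
# it would otherwise end up above the new hard limit.
set_hard_nofile() {
    target=$1
    soft=$(ulimit -Sn)
    if [ "$soft" = "unlimited" ] || [ "$soft" -gt "$target" ]; then
        ulimit -Sn "$target" || return 1   # side effect: soft limit is lowered
    fi
    ulimit -Hn "$target"
}

# Demonstration in a subshell, so the caller's limits stay untouched.
(
    ulimit -Sn 256   # assumes the current hard limit is >= 256
    set_hard_nofile 128 && echo "$(ulimit -Sn):$(ulimit -Hn)"
)
```

The demonstration should print 128:128, whereas a bare ulimit -Hn 128 from the same starting point fails with "Invalid argument".
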
eloaf commented 6 months ago

I got the same error following the installation instructions on https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository

thaJeztah commented 6 months ago

@eloaf what is the issue you're having? Is that running the docker engine as part of a Dockerfile? For that see my comment above.

eloaf commented 6 months ago

Yes, starting the docker service yields the error.

Checking the system's ulimit -Hn and then modifying the ulimit set in /etc/init.d/docker fixes it, similar to your fix.

It's weird: I remember having the same issue years ago, then nothing, then I get it again today after installing on ubuntu 25.0.0

thaJeztah commented 6 months ago

> It's weird: I remember having the same issue years ago, then nothing, then I get it again today after installing on ubuntu 25.0.0

Yes, it's possible these limits changed over time, and (per my comment above) "changing" the limit only worked if no actual changes were applied (so it being a no-op).

Curious; is there a reason to start the docker service as part of the build? Is it only to verify install was successful, or does it serve any other purpose?

eloaf commented 5 months ago

> > It's weird: I remember having the same issue years ago, then nothing, then I get it again today after installing on ubuntu 25.0.0
>
> Yes, it's possible these limits changed over time, and (per my comment above) "changing" the limit only worked if no actual changes were applied (so it being a no-op).
>
> Curious; is there a reason to start the docker service as part of the build? Is it only to verify install was successful, or does it serve any other purpose?

I mean, I need to start the docker service at some point to perform any builds? (I was working on a node that gets spun up and down and didn't persist the installation.)

thaJeztah commented 5 months ago

But this is a docker service inside a container, and the container is part of the Dockerfile / run during docker build. Running the docker service requires a privileged (--privileged) container. Containers that are used during docker build do not support --privileged (by design).

It's still possible to run the resulting image as privileged, but in that case the container must be started with the --privileged flag.

craigphicks commented 1 month ago

Warning: Use the --privileged flag with caution. A container with --privileged is not a securely sandboxed process. Containers in this mode can get a root shell on the host and take control over the system. For most use cases, this flag should not be the preferred solution. If your container requires escalated privileges, you should prefer to explicitly grant the necessary permissions, for example by adding individual kernel capabilities with --cap-add.

https://docs.docker.com/reference/cli/docker/container/run/#privileged