build and buildx handling cgroups differently

gregewing commented 10 months ago

Contributing guidelines

[X] I've read the contributing guidelines and wholeheartedly agree

I've found a bug and checked that ...

[X] ... the documentation does not mention anything about my problem
[X] ... there are no open or closed issues that are related to my problem

Description

I'm building a container image that relies on cgroups being managed from within the container. The container has libvirt, qemu-kvm and vagrant installed, and vagrant uses cgroups to apply resource constraints to the vagrant images nested within the docker container.

If I build the image using 'docker build', the build seems to leave behind some change to how Docker exposes cgroups to containers. After such a build I am able to run the container and start vagrant box images within it with no problem. Without a prior build, starting the vagrant boxes fails, and I can reliably and consistently work around that either by including "--cgroupns host" in the docker run command or by running the docker build command again, which I would not expect consumers of the container to be required to do.

If I build it using 'docker buildx' then I don't see the same issue. (Is build formally deprecated in favour of buildx?) I get what I think is the correct behaviour: starting vagrant boxes in the container always fails. I can reliably and consistently work around this by including "--cgroupns host" in the docker run command.

I noticed this when I was having problems running the container image (which behaves identically regardless of build method) on a host which had not been used to build any images. Similarly if I reboot the host that was used to build the image, then run the image without building it first, then I have problems.

Further, I noticed that it does not matter which image I build with 'docker build'. If I build a completely unrelated image, the same change to how cgroups are exposed through Docker occurs, and I am able to start a vagrant box image within my vagrant container.

Docker buildx version : github.com/docker/buildx 0.11.2 0.11.2-0ubuntu1~22.04.1

Docker  version info:
Client:
 Version:           24.0.5
 API version:       1.43
 Go version:        go1.20.3
 Git commit:        24.0.5-0ubuntu1~22.04.1
 Built:             Mon Aug 21 19:50:14 2023
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.3
  Git commit:       24.0.5-0ubuntu1~22.04.1
  Built:            Mon Aug 21 19:50:14 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.2
  GitCommit:        
 runc:
  Version:          1.1.7-0ubuntu1~22.04.1
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:

Expected behaviour

'docker build' should not leave cgroups available to containers that are run after a build. Perhaps a tidy-up step is missed somewhere? I expect the behaviour to be the same as when building container images with 'docker buildx'.

Actual behaviour

When building (any) image with 'docker build' there seems to be some alteration to how cgroups are exposed through the docker daemon to containers, be those currently running, or run at any point in the future. This is true in my environment for build activities with 'docker build' and not 'docker buildx'

Buildx version

github.com/docker/buildx 0.11.2 0.11.2-0ubuntu1~22.04.1

Docker info

Updated.  The first docker info output here is from the host prior to running any build activities immediately after a reboot:

Client:
 Version:    24.0.5
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  0.11.2
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx

Server:
 Containers: 2
  Running: 1
  Paused: 0
  Stopped: 1
 Images: 4
 Server Version: 24.0.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 
 runc version: 
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.5.0-14-generic
 Operating System: Ubuntu 22.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.45GiB
 Name: Desktop-U
 ID: 37b19cdf-9a01-4a17-bf6b-424610f10d9d
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: gregewing
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

This next docker info output was captured after running a 'docker build' build.

Client:
 Version:    24.0.5
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  0.11.2
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx

Server:
 Containers: 3
  Running: 2
  Paused: 0
  Stopped: 1
 Images: 7
 Server Version: 24.0.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 
 runc version: 
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.5.0-14-generic
 Operating System: Ubuntu 22.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.45GiB
 Name: Desktop-U
 ID: 37b19cdf-9a01-4a17-bf6b-424610f10d9d
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: gregewing
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Builders list

NAME/NODE    DRIVER/ENDPOINT             STATUS  BUILDKIT             PLATFORMS
mybuilder *  docker-container                                         
  mybuilder0 unix:///var/run/docker.sock stopped                      
default      docker                                                   
  default    default                     running v0.11.6+0a15675913b7 linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/386

Configuration

FROM ubuntu:22.04 as builder

ENV DEBIAN_FRONTEND=noninteractive TERM=xterm-256color MEMORY=4096 CPU=4 DISK_SIZE=50

RUN echo "do some Setting Up" && \
    apt-get update -y &&  \
        echo "Do basic install without any recommended packages, we don't need them wasting space." && \
    apt-get install -y --no-install-recommends \
    nano \
    build-essential \
    libguestfs-tools \
    openssh-server \
    linux-image-$(uname -r) \
    curl \
    net-tools \
    gettext-base \
    jq && \
        echo "Install the next packages with recommends,as they appear to be necessary for Vagrant." && \
    apt-get install -y \
    libvirt-dev \
    qemu-kvm \
    libvirt-daemon-system && \
        echo "Do some cleanup." && \
    apt-get autoremove -y && \
    apt-get clean && \
    echo "Download and install Vagrant.  Installing Vagrant from repo is not advised as the WinRM library is not shipped with Vagr>
        rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
    curl -O https://releases.hashicorp.com/vagrant/$(curl -s https://checkpoint-api.hashicorp.com/v1/check/vagrant  | jq -r -M '.c>
    dpkg -i vagrant_$(curl -s https://checkpoint-api.hashicorp.com/v1/check/vagrant  | jq -r -M '.current_version')-1_amd64.deb &&>
    rm      vagrant_$(curl -s https://checkpoint-api.hashicorp.com/v1/check/vagrant  | jq -r -M '.current_version')-1_amd64.deb &&>
    echo "Install Vagrant-libvirt plugin." && \
        vagrant plugin install vagrant-libvirt && \
    echo "prepare a folder for our vagrant box context to reside in. This is not root, as this folder is exposed to the vagrant bo>
        mkdir /vagrant

#  copy the startup.sh script so that something will happen if you run this image, and you should not get any errors.
COPY --chmod=755 startup.sh startup.sh

#  compress the multiple layers that make up this image into a single layer, making the image as small as possible.
FROM scratch
ENV DEBIAN_FRONTEND=noninteractive TERM=xterm-256color MEMORY=4096 CPU=4 DISK_SIZE=50
COPY --from=builder / /
ENTRYPOINT ["/startup.sh"]

Commands to build are as follows:

docker build:

/usr/bin/docker image build -t gregewing/windows_on_linux:latest -f Dockerfile .
/usr/bin/docker push gregewing/windows_on_linux:latest

docker buildx:

/usr/bin/docker buildx build -t gregewing/windows_on_linux:latest /mnt/RAID/myDockers/vagrant/win_any/ --push

Build logs

logs from 'docker build':  

#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 2.39kB done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.1s

#3 resolve image config for docker.io/docker/dockerfile:1.5
#3 DONE 0.4s

#4 docker-image://docker.io/docker/dockerfile:1.5@sha256:39b85bbfa7536a5feceb7372a0817649ecb2724562a38360f4d6a7782a409b14
#4 CACHED

#5 [internal] load metadata for docker.io/library/ubuntu:22.04
#5 DONE 0.4s

#6 [builder 1/3] FROM docker.io/library/ubuntu:22.04@sha256:6042500cf4b44023ea1894effe7890666b0c5c7871ed83a97c36c76ae560bb9b
#6 DONE 0.0s

#7 [internal] load build context
#7 transferring context: 32B done
#7 DONE 0.0s

#8 [builder 3/3] COPY --chmod=755 startup.sh startup.sh
#8 CACHED

#9 [builder 2/3] RUN echo "do some Setting Up" &&     apt-get update -y &&      echo "Do basic install without any recommended packages, we don't need them wasting space." &&     apt-get install -y --no-install-recommends     nano     build-essential     libguestfs-tools     openssh-server     linux-image-$(uname -r)     curl     net-tools     gettext-base     jq &&    echo "Install the next packages with recommends,as they appear to be necessary for Vagrant." &&     apt-get install -y     libvirt-dev     qemu-kvm     libvirt-daemon-system &&    echo "Do some cleanup." &&     apt-get autoremove -y &&     apt-get clean &&     echo "Download and install Vagrant.  Installing Vagrant from repo is not advised as the WinRM library is not shipped with Vagrant installed via repos." &&     rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* &&     curl -O https://releases.hashicorp.com/vagrant/$(curl -s https://checkpoint-api.hashicorp.com/v1/check/vagrant  | jq -r -M '.current_version')/vagrant_$(curl -s https://checkpoint-api.hashicorp.com/v1/check/vagrant  | jq -r -M '.current_version')-1_amd64.deb &&     dpkg -i vagrant_$(curl -s https://checkpoint-api.hashicorp.com/v1/check/vagrant  | jq -r -M '.current_version')-1_amd64.deb &&     rm      vagrant_$(curl -s https://checkpoint-api.hashicorp.com/v1/check/vagrant  | jq -r -M '.current_version')-1_amd64.deb &&     echo "Install Vagrant-libvirt plugin." &&  vagrant plugin install vagrant-libvirt &&     echo "prepare a folder for our vagrant box context to reside in. This is not root, as this folder is exposed to the vagrant box when running." &&     mkdir /vagrant
#9 CACHED

#10 [stage-1 1/1] COPY --from=builder / /
#10 CACHED

#11 exporting to image
#11 exporting layers done
#11 writing image sha256:b4e0549be3af54bf3fffa50cc36395162870f797b5ff5ec9d1c2021b7b1356d2 0.0s done
#11 naming to docker.io/gregewing/windows_on_linux:latest 0.0s done
#11 DONE 0.0s

Logs from 'docker buildx':

#0 building with "mybuilder" instance using docker-container driver

#1 [internal] booting buildkit
#1 starting container buildx_buildkit_mybuilder0
#1 starting container buildx_buildkit_mybuilder0 0.6s done
#1 DONE 0.6s

#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 2.39kB done
#2 DONE 0.0s

#3 resolve image config for docker.io/docker/dockerfile:1.5
#3 ...

#4 [auth] docker/dockerfile:pull token for registry-1.docker.io
#4 DONE 0.0s

#3 resolve image config for docker.io/docker/dockerfile:1.5
#3 DONE 1.0s

#5 docker-image://docker.io/docker/dockerfile:1.5@sha256:39b85bbfa7536a5feceb7372a0817649ecb2724562a38360f4d6a7782a409b14
#5 resolve docker.io/docker/dockerfile:1.5@sha256:39b85bbfa7536a5feceb7372a0817649ecb2724562a38360f4d6a7782a409b14 0.0s done
#5 DONE 0.0s

#5 docker-image://docker.io/docker/dockerfile:1.5@sha256:39b85bbfa7536a5feceb7372a0817649ecb2724562a38360f4d6a7782a409b14
#5 CACHED

#6 [internal] load .dockerignore
#6 transferring context: 2B done
#6 DONE 0.0s

#7 [internal] load metadata for docker.io/library/ubuntu:22.04
#7 ...

#8 [auth] library/ubuntu:pull token for registry-1.docker.io
#8 DONE 0.0s

#7 [internal] load metadata for docker.io/library/ubuntu:22.04
#7 DONE 0.4s

#9 [internal] load build context
#9 DONE 0.0s

#10 [builder 1/3] FROM docker.io/library/ubuntu:22.04@sha256:6042500cf4b44023ea1894effe7890666b0c5c7871ed83a97c36c76ae560bb9b
#10 resolve docker.io/library/ubuntu:22.04@sha256:6042500cf4b44023ea1894effe7890666b0c5c7871ed83a97c36c76ae560bb9b 0.0s done
#10 DONE 0.0s

#9 [internal] load build context
#9 transferring context: 32B done
#9 DONE 0.0s

#11 [builder 2/3] RUN echo "do some Setting Up" &&     apt-get update -y &&     echo "Do basic install without any recommended packages, we don't need them wasting space." &&     apt-get install -y --no-install-recommends     nano     build-essential     libguestfs-tools     openssh-server     linux-image-$(uname -r)     curl     net-tools     gettext-base     jq &&    echo "Install the next packages with recommends,as they appear to be necessary for Vagrant." &&     apt-get install -y     libvirt-dev     qemu-kvm     libvirt-daemon-system &&    echo "Do some cleanup." &&     apt-get autoremove -y &&     apt-get clean &&     echo "Download and install Vagrant.  Installing Vagrant from repo is not advised as the WinRM library is not shipped with Vagrant installed via repos." &&     rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* &&     curl -O https://releases.hashicorp.com/vagrant/$(curl -s https://checkpoint-api.hashicorp.com/v1/check/vagrant  | jq -r -M '.current_version')/vagrant_$(curl -s https://checkpoint-api.hashicorp.com/v1/check/vagrant  | jq -r -M '.current_version')-1_amd64.deb &&     dpkg -i vagrant_$(curl -s https://checkpoint-api.hashicorp.com/v1/check/vagrant  | jq -r -M '.current_version')-1_amd64.deb &&     rm      vagrant_$(curl -s https://checkpoint-api.hashicorp.com/v1/check/vagrant  | jq -r -M '.current_version')-1_amd64.deb &&     echo "Install Vagrant-libvirt plugin." &&  vagrant plugin install vagrant-libvirt &&     echo "prepare a folder for our vagrant box context to reside in. This is not root, as this folder is exposed to the vagrant box when running." &&     mkdir /vagrant
#11 CACHED

#12 [builder 3/3] COPY --chmod=755 startup.sh startup.sh
#12 CACHED

#13 [stage-1 1/1] COPY --from=builder / /
#13 CACHED

#14 exporting to image
#14 exporting layers done
#14 exporting manifest sha256:904f93d436a25304429789b7165f8cf1b987643a4939875a87e7ef59ef93444d done
#14 exporting config sha256:695479ecdddfe7235b185f13d6d22f1ef4c0f2bd49cb25e1bca9745e10bfaa38 0.0s done
#14 exporting attestation manifest sha256:d624cf3c3069970759f84a507020756d7edb326b6f4f39a4259e2852450e9e8f
#14 exporting attestation manifest sha256:d624cf3c3069970759f84a507020756d7edb326b6f4f39a4259e2852450e9e8f 0.1s done
#14 exporting manifest list sha256:ab9171b31ddc7b949fbfd595f34d9cde95d3525b1fde96facf0d80576cece148 0.0s done
#14 pushing layers
#14 ...

#15 [auth] gregewing/windows_on_linux:pull,push token for registry-1.docker.io
#15 DONE 0.0s

#14 exporting to image
#14 pushing layers 1.6s done
#14 pushing manifest for docker.io/gregewing/windows_on_linux:latest@sha256:ab9171b31ddc7b949fbfd595f34d9cde95d3525b1fde96facf0d80576cece148
#14 pushing manifest for docker.io/gregewing/windows_on_linux:latest@sha256:ab9171b31ddc7b949fbfd595f34d9cde95d3525b1fde96facf0d80576cece148 1.0s done
#14 DONE 2.8s

Additional info

Like I mentioned above, it does not seem to matter which image I build using 'docker build'. If I have a running vagrant container with a vagrant box downloaded and configured and I try to bring up that vagrant box, it will fail every time. If I then separately run a docker build command to build a small unrelated container from a different Dockerfile and try bringing up the vagrant box again, it will work.

I'm prepared for this to be something to do with cgroup drivers, but I have not tried changing the cgroup driver as this seems to be a default setting.

Furthermore, as I mentioned earlier, if I add "--cgroupns host" to the docker run line, I am able to start the nested vagrant image every time, regardless of whether or not a 'docker build' activity has been performed on the host.
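A minimal sketch of the workaround invocation follows. The "--cgroupns host" flag and the image name are the ones described in this report; the container name vagrant-test and the --privileged flag are assumptions for illustration and are not copied from my actual run command. By default the script only prints the command (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch: start the image with the host cgroup namespace.
# IMAGE comes from the report; NAME and --privileged are assumptions.
IMAGE="gregewing/windows_on_linux:latest"
NAME="vagrant-test"          # hypothetical container name
DRY_RUN="${DRY_RUN:-1}"      # 1 = print the command, 0 = execute it

run_with_host_cgroupns() {
    # Build the command once so the printed and executed forms match.
    set -- docker run --name "$NAME" --privileged --cgroupns host "$IMAGE"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run_with_host_cgroupns
```

Setting DRY_RUN=0 runs the command instead of printing it.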

tonistiigi commented 10 months ago

If I build the image using 'docker build' then the process seems to leave behind some changes to how docker exposes cgroups to containers.

What changes are these? What is the difference between the created images and were the changes made by the builder or a process that you ran during the build?

gregewing commented 10 months ago

There are no apparent changes to the image. The changes appear to be in the docker instance on the host, something to do with the way access to cgroups is passed through to running and future containers. I don't have specifics, only the apparent difference that I describe in my initial post.

I'm beginning to wonder if perhaps it's something to do with apparmor, but I have tested this: removing all apparmor policies (aa-teardown) did nothing to allow the vagrant box image to start, whereas running a 'docker build' immediately afterwards allowed the vagrant box image to start up correctly.
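One way to get specifics might be to snapshot the host's cgroup v2 delegation state before and after a 'docker build' and diff the two. The following is only a sketch and not something that has been run for this report. It assumes cgroup v2 is mounted at /sys/fs/cgroup (docker info reports "Cgroup Version: 2"), and the function name snapshot_cgroups and the file names before.txt and after.txt are hypothetical. A difference may or may not show up in these files.

```shell
#!/bin/sh
# Sketch: record which controllers each cgroup delegates to its children.
# CGROUP_ROOT defaults to the usual cgroup v2 mount point (assumption).
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

snapshot_cgroups() {
    # Print "<path>: <controllers>" for every cgroup.subtree_control file
    # in the given root and up to two directory levels below it.
    find "$1" -maxdepth 3 -name cgroup.subtree_control 2>/dev/null | sort |
    while IFS= read -r f; do
        printf '%s: %s\n' "$f" "$(cat "$f" 2>/dev/null)"
    done
}

# Usage (as a user that can read the cgroup tree):
#   snapshot_cgroups "$CGROUP_ROOT" > before.txt
#   docker build -t unrelated-test .      # any unrelated build
#   snapshot_cgroups "$CGROUP_ROOT" > after.txt
#   diff -u before.txt after.txt
```

Only cgroup.subtree_control is inspected here because it lists the controllers a cgroup passes down to its children; other files could be added to the snapshot in the same way.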

tonistiigi commented 10 months ago

There are no apparent changes to the image, the changes appear to be in the docker instance on the host,

Are you saying:

1. you build an image on a host and push it to the registry
2. now you run the image from the registry that you just built
3. this run will behave differently on the host that built the image compared to a host that is just running that image directly without building it?

gregewing commented 10 months ago

Yes and No.

I run the image on a couple of different hosts, but for simplicity let's focus on a single host.

I build the image, and I run it on the same host. I reboot the host (because it's a workstation) and when I do, it 'resets' to what I assume is the correct configuration.

After the host is rebooted and the configuration state is reset, if I run the vagrant image, the container itself starts up correctly but the process of starting up vagrant box images inside the docker container fails. It will continue to fail until I run a 'docker build' in a separate terminal window (with or without the image running). It can be any 'docker build' activity for any Dockerfile, but not a 'docker buildx'.

I can continue to create and delete containers based on local or remotely pulled images with absolute success starting up vagrant box images within the containers as many times as I want until I reboot the host.

If I use 'docker buildx' instead of 'docker build' then things are much more stable. I am unable to start any of the vagrant box images I try inside the docker containers I create. This is a good thing, because it's consistent. As a result, I have realised that I need to include "--cgroupns host" in the docker run command in order to have the container work properly when I want to use it to start vagrant box images inside it. So I have a workaround, but I wanted to bring the odd behaviour to your attention.

It does not make a difference if I use the locally cached image, before or after it is pushed to the docker hub registry, or if I use a version pulled from the docker hub registry. Incidentally, the image is in the hub.docker.com registry at gregewing/windows_on_linux:latest. As a quick test I run 'vagrant init generic/alpine' followed by 'vagrant up', because the image is small, but the plan is to use this to run windows server instances for dev/test purposes.
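The sequence I describe above can be sketched as a script. This is an illustration under assumptions, not my exact procedure: the container name vagrant-repro, the --privileged and -d flags, the build context ./unrelated and the tag unrelated-test are all hypothetical, and it assumes the container keeps running after startup.sh. The /vagrant working directory is the folder created in the Dockerfile. By default the steps are only printed (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch of the reproduction sequence; prints the steps unless DRY_RUN=0.
IMAGE="gregewing/windows_on_linux:latest"
NAME="vagrant-repro"         # hypothetical container name
OTHER_CTX="./unrelated"      # hypothetical directory with any other Dockerfile
DRY_RUN="${DRY_RUN:-1}"

step() {
    # Echo the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

repro() {
    # Run on a freshly rebooted host, without --cgroupns host.
    step docker run -d --name "$NAME" --privileged "$IMAGE"
    step docker exec -w /vagrant "$NAME" vagrant init generic/alpine
    # Reported above to fail at this point.
    step docker exec -w /vagrant "$NAME" vagrant up
    # Any 'docker build' of any Dockerfile, not 'docker buildx'.
    step docker build -t unrelated-test "$OTHER_CTX"
    # Reported above to succeed from here until the next reboot.
    step docker exec -w /vagrant "$NAME" vagrant up
}

repro
```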

In the spirit of full disclosure, when I ran this image on a different host I had issues with cgroups again, but they presented differently and I was able to resolve them by clearing apparmor profiles on that host. I tried the same on my workstation (the one host above) and it had no impact whatsoever. I think that is a red herring.

gregewing commented 8 months ago

Just wondering if any more information is required for this report?
