docker / compose

Define and run multi-container applications with Docker
https://docs.docker.com/compose/
Apache License 2.0
33.75k stars · 5.19k forks

[BUG] Race condition when populating volumes with nested volumes #11663

Open benibr opened 6 months ago

benibr commented 6 months ago

Description

We have a customer setup where a volume is bind mounted into a container and the container populates it with some files on first start. Later, another volume was added which is a subdirectory inside the first volume. This subdirectory gets created when the container populates the first volume on first start. I'm aware this is not a really self-consistent concept and therefore might be bad practice; however, the problem is that it sometimes fails and sometimes succeeds, which is worse than just getting an error.

Steps To Reproduce

I tried to stay as close to the original setup as possible while removing everything that is not necessary. This is what I came up with to reproduce:

# parameters
NO_VARS=255
NO_RUNS=99
NO_INC_SIZE=3
TEST_PATH=/tmp

cd $TEST_PATH

cat << EOF > Dockerfile
FROM debian:latest

# create dirs to populate
RUN mkdir -p /foo/bar /foo/baz
RUN touch /foo/bar/iamhere

# create a larger container size
RUN for i in \$(seq 1 $NO_INC_SIZE); do \
        cp -rv /usr /foo/baz/\$i/ >/dev/null 2>&1; \
    done

CMD sleep .1
EOF

cat << EOF > docker-compose.yml
services:
  test:
    container_name: test
    image: race:condition
    network_mode: "host"
    volumes:
      - type: volume
        source: parent
        target: /foo
      - type: volume
        source: subdir
        target: /foo/bar
        read_only: true

volumes:
  parent:
    driver: local
    driver_opts:
      type: none
      device: \${TEST_PATH}/parent
      o: bind

  subdir:
    driver: local
    driver_opts:
      type: none
      device: \${TEST_PATH}/parent/bar
      o: bind,ro
EOF

# build container
docker build -t race:condition .

# create a larger env file
rm -f .env
for i in $(seq 0 $NO_VARS); do
        echo "VAR$i=$i" >> .env
done
echo "TEST_PATH=$TEST_PATH" >> .env

# loop to reproduce
rm -f debug.log
for no in $(seq 0 $NO_RUNS); do
        # prepare clean state
        docker compose down >/dev/null 2>&1
        mkdir -p parent >/dev/null 2>&1
        rm -rf parent/* >/dev/null 2>&1
        docker volume rm -f tmp_parent tmp_subdir >/dev/null 2>&1
        # test & report
        docker compose up test >> debug.log 2>&1 && echo "$no: success" || echo "$no: failed"
done

Since this is a race condition it might behave differently on other systems depending on CPU, IO, etc. You can play around with the parameters in search of the "sweet spot". I did not find any configuration that always fails, but some are more likely to succeed than others %)
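To make the suspected ordering problem concrete, here is a toy sketch (my own illustration, not Compose or daemon internals): "mounting" the subdirectory volume only works if the parent has already been populated, so the outcome depends on which step wins the race.

```shell
#!/bin/sh
# Toy illustration of the suspected race (NOT Compose internals):
# the subdir bind source only exists after the parent is populated.
set -u
work=$(mktemp -d)

populate() {            # stands in for populating the parent volume
    sleep "$1"
    mkdir -p "$work/parent/bar"
}

mount_subdir() {        # stands in for bind-mounting parent/bar
    [ -d "$work/parent/bar" ] && echo "mount ok" \
                              || echo "no such file or directory"
}

populate 0.2 &          # population finishes "late"
mount_subdir            # -> no such file or directory
wait
mount_subdir            # -> mount ok
rm -rf "$work"
```

Depending on timing, the first check can land on either side of the `mkdir`, which matches the sometimes-fails, sometimes-succeeds behavior above.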

Compose Version

Docker Compose version v2.25.0

Used on RHEL 9.3

Installed Packages
Name         : docker-compose-plugin
Version      : 2.25.0
Release      : 1.el9
Architecture : x86_64
Size         : 59 M
Source       : docker-compose-plugin-2.25.0-1.el9.src.rpm
Repository   : @System
From repo    : Default_Organization_docker-ce-stable_docker-ce-stable_el9_x86_64
Summary      : Docker Compose (V2) plugin for the Docker CLI
URL          : https://github.com/docker/compose/
License      : ASL 2.0
Description  : Docker Compose (V2) plugin for the Docker CLI.
             :
             : This plugin provides the 'docker compose' subcommand.
             :
             : The binary can also be run standalone as a direct replacement for
             : Docker Compose V1 ('docker-compose').

Docker Environment

Client: Docker Engine - Community
 Version:    26.0.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.13.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.25.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 2
 Server Version: 26.0.0
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.14.0-284.11.1.el9_2.x86_64
 Operating System: Red Hat Enterprise Linux 9.3 (Plow)
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 1.736GiB
 Name: dstorweb01tl.unicph.domain
 ID: dfe1b231-0a99-45e5-8c16-93edec96d0b2
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Anything else?

I'm posting this here because I'm not able to reproduce this with Docker alone, e.g. with: mkdir -p state; rm -rf state/* state/.*; docker run --rm -ti -v ./state/log:/var/log -v state:/var debian bash

benibr commented 6 months ago

One more thing: the reason for the $TEST_PATH variable is that it is used like this in the production setup. There, however, the path somehow gets expanded to a full path, so the error message in case of failure looks like:

Error response from daemon: failed to populate volume: error while mounting volume '/var/lib/docker/volumes/docker-migrid_vgrid_files_readonly/_data': failed to mount local volume: mount /opt/migrid/docker-migrid/state/vgrid_files_writable:/var/lib/docker/volumes/docker-migrid_vgrid_files_readonly/_data, flags: 0x1001: no such file or directory

I'm not able to reproduce that exact error; if I set TEST_PATH=. in my test script, it always fails, with an unexpanded path in the error message, like this:

Error response from daemon: failed to populate volume: error while mounting volume '/var/lib/docker/volumes/tmp_subdir/_data': failed to mount local volume: mount ./parent/bar:/var/lib/docker/volumes/tmp_subdir/_data, flags: 0x1001: no such file or directory

Not sure how that fits into the whole picture though.
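One possible explanation (an assumption on my part, not verified against the daemon code): the local driver seems to use the device string as-is, and a relative path like ./parent/bar only resolves from the directory where the project lives, not from wherever the mount is actually performed. A minimal sketch of that effect:

```shell
#!/bin/sh
# Sketch (assumption, not daemon internals): the same relative path
# string resolves in the project directory but not elsewhere.
demo=$(mktemp -d)
mkdir -p "$demo/parent/bar"

cd "$demo"
[ -e ./parent/bar ] && echo "resolves from project dir"

cd /    # a different working directory, as the daemon would have
[ -e ./parent/bar ] && echo "unexpected" \
                    || echo "no such file or directory"
rm -rf "$demo"
```

If that holds, it would explain why an always-relative TEST_PATH=. fails every time, while the expanded absolute path in production only fails when it loses the population race.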

ndeloof commented 6 months ago

Can you please clarify the need for the second volume subdir, as it is already nested inside a volume and mounted inside the container at the same subdirectory?

Also, as you compare this with a plain docker run ... command, please note that your compose file declares a volume, not just a bind mount, so it is not a strict equivalent. Do you really need a volume here? Can't you just define bind mounts in your compose file?
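For reference, a bind-mount-only variant of the repro's compose file might look like the following sketch (paths assumed from the repro script; note that, unlike named volumes, bind mounts are not populated from the image's content on first start, which is the behavior the reporter relies on):

```yaml
# Sketch of the bind-mount alternative (assumed, for comparison only)
services:
  test:
    image: race:condition
    volumes:
      - ${TEST_PATH}/parent:/foo
      - ${TEST_PATH}/parent/bar:/foo/bar:ro
```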

benibr commented 6 months ago

Can you please clarify the need for the second volume subdir, as it is already nested inside a volume and mounted inside the container at the same subdirectory?

As I said, I agree that this is not optimal. There isn't any necessity to have those nested volumes, but the config occurs like this due to a default setting. The production compose file of the application defines a volume with application state; inside it there is a cache folder. Recently the devs wanted to make that state directory configurable so that one can set it to, e.g., a tmpfs directory. So they defined a separate volume, which by default just points inside the already existing state volume but can be overridden by the user with another path.
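The pattern described might look roughly like this (a hypothetical sketch with invented names, not the real production file):

```yaml
# Hypothetical sketch: a cache volume whose device defaults to a
# subdirectory of the state volume, but can be pointed elsewhere
# (e.g. a tmpfs path) by overriding CACHE_PATH in .env
volumes:
  state:
    driver: local
    driver_opts:
      type: none
      device: ${STATE_PATH}
      o: bind
  cache:
    driver: local
    driver_opts:
      type: none
      # .env default: CACHE_PATH=${STATE_PATH}/cache
      device: ${CACHE_PATH}
      o: bind
```

With the default in place, the cache volume's bind source only exists once the state volume has been populated, which is how the nesting arises.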

Also, as you compare this with a plain docker run ... command, please note that your compose file declares a volume, not just a bind mount, so it is not a strict equivalent. Do you really need a volume here? Can't you just define bind mounts in your compose file?

Yes, in the current setup the volume is necessary as it must be populated with a directory tree at first start.

I'm aware that there are ways to work around this, but since the behavior of this particular case is inconsistent and might change with every run, I thought I'd report it as a bug to save others the time of digging through it again.

ndeloof commented 6 months ago

Yes, in the current setup the volume is necessary as it must be populated with a directory tree at first start.

Right, but in your docker run .. reproduction example you don't use such a volume but a simple bind mount. So my question: can you get the same behavior using docker volume create ..., then use the created volumes to run a container?

ShadowLNC commented 5 months ago

I believe #11706 is related, as overlapping configs/volumes will also trigger a race condition.

benibr commented 5 months ago

Yes, in the current setup the volume is necessary as it must be populated with a directory tree at first start.

Right, but in your docker run .. reproduction example you don't use such a volume but a simple bind mount. So my question: can you get the same behavior using docker volume create ..., then use the created volumes to run a container?

I tried it, but it's not exactly the same. With plain docker I couldn't get it to succeed at all. It always fails, but sometimes it complains about not being able to chmod the subdir, and sometimes about not being able to mount the subdir because it doesn't exist.

docker: Error response from daemon: failed to chmod on /var/lib/docker/volumes/rc-subdir/_data: chmod /var/lib/docker/volumes/rc-subdir/_data: read-only file system.
docker: Error response from daemon: failed to populate volume: error while mounting volume '/var/lib/docker/volumes/rc-subdir/_data': failed to mount local volume: mount /tmp/parent/subdir:/var/lib/docker/volumes/rc-subdir/_data, flags: 0x1001: no such file or directory.
Script for plain docker

```bash
# parameters
NO_VARS=2550
NO_RUNS=99
NO_INC_SIZE=3
TEST_PATH=/tmp

cd $TEST_PATH

cat << EOF > Dockerfile
FROM debian:latest

# create dirs to populate
RUN mkdir -p /foo/bar /foo/baz /foo/subdir
RUN touch /foo/bar/iamhere

# create a larger container size
RUN for i in \$(seq 1 $NO_INC_SIZE); do \
        cp -rv /usr /foo/baz/\$i/ >/dev/null 2>&1; \
    done

CMD sleep .1
EOF

# build container
docker build -t race:condition .

# create a larger env file
rm -f .env
for i in $(seq 0 $NO_VARS); do
        echo "VAR$i=$i" >> .env
done
echo "TEST_PATH=$TEST_PATH" >> .env

# loop to reproduce
rm -f debug-docker.log
for no in $(seq 0 $NO_RUNS); do
        # prepare clean state
        docker rm -f docker-rc-test >/dev/null 2>&1
        mkdir -p parent >/dev/null 2>&1
        rm -rf parent/* >/dev/null 2>&1
        docker volume rm -f rc-parent >/dev/null 2>&1
        docker volume rm -f rc-subdir >/dev/null 2>&1
        docker volume create --driver local --opt type=none --opt device=$TEST_PATH/parent --opt o=bind rc-parent >/dev/null 2>&1
        docker volume create --driver local --opt type=none --opt device=$TEST_PATH/parent/subdir --opt o=bind,ro rc-subdir >/dev/null 2>&1
        # test & report
        docker run -d --name docker-rc-test --env-file .env -v rc-parent:/foo -v rc-subdir:/foo/bar --rm race:condition #>> debug-docker.log 2>&1 && echo "$no: success" || echo "$no: failed"
done
```
github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.