docker / compose

Define and run multi-container applications with Docker
https://docs.docker.com/compose/
Apache License 2.0

[BUG] Intermittent failure to remove old container on recreate #11151

Open lod opened 11 months ago

lod commented 11 months ago

Description

I ran the following command to recreate all containers:

    docker compose --project-directory /etc/bob/ up --detach --force-recreate --remove-orphans --wait

This recreated 26 different containers. The compose file definition had not changed, but the source image had been rebuilt and pulled onto the machine.

An extract from the running log:

     Container our-bob-1  Recreate
     ... [25 other containers] Recreate
    Error response from daemon: Error when allocating new name: Conflict. The container name "/our-bob-1" is already in use by container "136a920587be03151f4d750934c699b5482c3bd4b7c280f65df38f203dc95fa3". You have to remove (or rename) that container to be able to reuse that name.

The command ran for 0.89 seconds.
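For anyone triaging a similar failure, it may help to pull the conflicting name and container ID out of the daemon message so the leftover container can be examined. The following is only a sketch: the function names `conflict_container_name` and `conflict_container_id` are hypothetical helpers, not part of Docker or Compose, and they assume the message wording shown in the log extract above.

```shell
# Hypothetical helpers (not part of Docker or Compose). Each reads daemon
# output on stdin and prints the matching field of a
# "container name ... is already in use" error, or nothing if no line matches.

# Print the conflicting container name, e.g. /our-bob-1
conflict_container_name() {
    sed -n 's/.*The container name "\([^"]*\)" is already in use.*/\1/p'
}

# Print the ID of the container that currently holds the name.
conflict_container_id() {
    sed -n 's/.*is already in use by container "\([0-9a-f]*\)".*/\1/p'
}
```

The extracted ID could then be passed to `docker inspect` to see whether that container is the old instance that was never removed or a freshly created one.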

Rerunning the up command worked fine. Most of the time, across multiple very similar systems, the command works fine, so I can't produce a reproducible test case.

I suspect some sort of race condition between the deletion of the old container and the creation of the new one, but I do not have the capability to hunt down such a bug.
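Since rerunning the command succeeded, one possible stopgap is a wrapper that retries only when the output contains the name-conflict message. This is a sketch of a workaround, not a fix for the underlying race: `retry_on_name_conflict` and the `RETRY_DELAY` variable are hypothetical names introduced here, and the match string is taken from the error text quoted above.

```shell
# Hypothetical wrapper (not part of Docker or Compose).
# Usage: retry_on_name_conflict MAX_ATTEMPTS COMMAND [ARGS...]
# Reruns COMMAND only when it fails with the daemon's name-conflict error.
# RETRY_DELAY (seconds between attempts, default 1) is an assumed knob.
retry_on_name_conflict() {
    max_attempts=$1
    shift
    attempt=1
    while :; do
        # Capture stdout and stderr together so the error text can be checked.
        if output=$("$@" 2>&1); then
            rc=0
        else
            rc=$?
        fi
        if [ -n "$output" ]; then
            printf '%s\n' "$output"
        fi
        if [ "$rc" -eq 0 ]; then
            return 0
        fi
        # Retry only on the specific conflict message; fail fast otherwise.
        case $output in
            *"is already in use by container"*) ;;
            *) return "$rc" ;;
        esac
        if [ "$attempt" -ge "$max_attempts" ]; then
            return "$rc"
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-1}"
    done
}
```

It could be invoked as `retry_on_name_conflict 3 docker compose --project-directory /etc/bob/ up --detach --force-recreate --remove-orphans --wait`. Any failure other than the name conflict is returned immediately with the original exit status.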

Steps To Reproduce

  1. docker compose up --force-recreate

Compose Version

Docker Compose version v2.17.3

Docker Environment

Client:
 Version:    24.0.5
 Context:    default
 Debug Mode: false
 Plugins:
  compose: Docker Compose (Docker Inc.)
    Version:  v2.17.3
    Path:     /usr/lib/docker/cli-plugins/docker-compose

Server:
 Containers: 26
  Running: 26
  Paused: 0
  Stopped: 0
 Images: 62
 Server Version: 24.0.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: syslog
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.2.0-32-generic
 Operating System: Ubuntu 22.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 32
 Total Memory: 15.52GiB
 Name: bob
 ID: more-bob
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Anything else?

System CPU usage at the time was about 10%, total ram usage was high but 3/4 of it was disk cache.

ndeloof commented 11 months ago

Can you try to reproduce with the latest release? v2.17 is pretty old...

lod commented 11 months ago

Thanks, I'll pull us forward and see if it reoccurs.

Scanning the release notes, I saw "Fixed a race condition when --parallel is used with a large number of dependent services", but that seems to be related to https://github.com/docker/compose/pull/10544, which doesn't match the observed issue.

rvem commented 6 months ago

I'm experiencing a similar issue with compose 2.23.1 and docker 24.0.5. The error is transient and appears from time to time when compose is used with docker --host ssh://<...>

However, in my case, it fails to create an intermediate container:

$ docker --host "$DOCKER_HOST" compose --file docker-compose.yml --file docker-compose.override.yml up --detach --wait
 Container <blah>-1  Recreate
Error response from daemon: Conflict. The container name "/d9a6eaa59b96_<blah>-1" is already in use by container "2ab6e555da77ff7d6b03e2d9b6ea8097a79bf647ade06d4bf12793253137c3ce". You have to remove (or rename) that container to be able to reuse that name.

The compose file itself has only one service.

I would be happy to provide more details if you have any ideas on how to debug this.
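One way to gather more detail after a failed run might be to look for leftover containers whose names resemble the one in the conflict above. The filter below is a sketch under an assumption: it guesses, from that single error message, that such names consist of 12 hexadecimal characters, an underscore, and then the service container name. `filter_intermediate_names` is a hypothetical helper, not a Docker or Compose command.

```shell
# Hypothetical helper (not part of Docker or Compose). Reads container names
# on stdin, one per line, and prints those shaped like
# <12 hex chars>_<name>, with or without a leading slash.
# The pattern is an assumption inferred from the error message in this thread.
filter_intermediate_names() {
    grep -E '^/?[0-9a-f]{12}_.'
}
```

It could be fed from `docker ps --all --format '{{.Names}}'`, for example `docker ps --all --format '{{.Names}}' | filter_intermediate_names`, to list candidates left behind by an interrupted recreate.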

UPD: it looks like the issue occurs when the service image is updated; the subsequent docker compose up works just fine.

UPD 2: In some cases the error is a bit different:

$ docker --host "$DOCKER_HOST" compose --file docker-compose.yml --file docker-compose.override.yml up --detach --wait --force-recreate
 Container <blah>-1  Recreate
Error response from daemon: No such container: 50a914c300b3cc2b6676caea4d094a2ae8c16f04c5fb0dc461b8ddc3941e0361

DerGuntha commented 5 months ago

Just noticed this myself on version 2.27. With a docker instance of nginx-proxy, docker compose up would throw a "container name already in use" error. When I changed the container_name to NOT include a "-", docker compose up worked again and replaced the previous container:

    container_name: nginx-proxy --> container_name: nginxproxy

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 days ago

This issue has been automatically marked as not stale anymore due to the recent activity.