docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/

potential race condition in docker stack deploy? #86

Open hholst80 opened 7 years ago

hholst80 commented 7 years ago

Expected behavior

The stack is created with the net network first, before the services that attach to it.

Actual behavior

Nondeterministic behavior: sometimes creation of the client and server services is attempted before the network has been created, and the deploy fails.

deeplearning@deep6:~$ docker stack deploy --compose-file docker-compose.yml a3c
Creating service a3c_server
Error response from daemon: network a3c_net not found
deeplearning@deep6:~$ docker stack deploy --compose-file docker-compose.yml a3c
Creating service a3c_server
Error response from daemon: network a3c_net not found
deeplearning@deep6:~$ docker stack deploy --compose-file docker-compose.yml a3c
Creating service a3c_server
Error response from daemon: network a3c_net not found
deeplearning@deep6:~$ docker stack deploy --compose-file docker-compose.yml a3c
Creating network a3c_net
Creating service a3c_client
Creating service a3c_server
deeplearning@deep6:~$

Steps to reproduce the behavior

version: '3'

services:
  server:
    # image: alpine
    image: some-server.frostbite.com:5000/a3c-batch:latest
    command: python3 src/server.py --device=/gpu:0 --config=autotest/breakout --logdir=/tmp/logdir
    environment:
      - LD_LIBRARY_PATH=/usr/lib/nvidia-375
    volumes:
      - /efs:/efs
      - /usr/lib/x86_64-linux-gnu/libcuda.so.1:/usr/lib/x86_64-linux-gnu/libcuda.so.1
      - /usr/lib/x86_64-linux-gnu/libcuda.so.375.66:/usr/lib/x86_64-linux-gnu/libcuda.so.375.66
      - /usr/lib/nvidia-375:/usr/lib/nvidia-375
      - /dev/nvidia0:/dev/nvidia0
    networks:
      - net
    deploy:
      placement:
        constraints:
          - node.hostname == deep6
  client:
    # image: alpine
    image: some-server.frostbite.com:5000/a3c-batch:latest
    command: python3 src/client.py --workers=100 --forward-address=tcp://server:7000 --backprop-address=tcp://server:7001 --score-address=tcp://server:7002 --identity=client --config=autotest/breakout --logdir=/tmp/logdir
    volumes:
      - /efs:/efs
    networks:
      - net
    deploy:
      placement:
        constraints:
          - node.hostname == fb-hholst3

networks:
  net:
    driver: overlay
    driver_opts:
      secure: 'false'

Output of docker version:

Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:17:04 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:15:57 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 3
 Running: 1
 Paused: 0
 Stopped: 2
Images: 3
Server Version: 17.06.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 36
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: jehud6ny2sbbyyevdi2wc39pm
 Is Manager: true
 ClusterID: uvt1d9i500v38xpgfjylpktx3
 Managers: 1
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Root Rotation In Progress: false
 Node Address: 10.46.161.125
 Manager Addresses:
  10.46.161.125:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 apparmor
Kernel Version: 4.8.0-59-generic
Operating System: Ubuntu 16.10
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 62.79GiB
Name: deep6
ID: JESZ:HDOX:CWC4:OQNS:N2EY:PSUA:XW5F:2EOK:7F7O:KURB:R5XD:WQZW
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

dnephin commented 7 years ago

This issue is being tracked here: https://github.com/moby/moby/issues/29293

Usually this happens when you take a stack down and then deploy it again. Is that what happened here?

The client only creates a network if it does not already exist. After a network is removed, it seems it can be left in an inconsistent state where the API still reports that it exists, but a container cannot be attached to it.
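
One possible workaround, as a minimal sketch rather than a confirmed fix: after removing the stack, poll until docker network inspect no longer finds the network, and only then redeploy. The helper name wait_for_network_removal, the DOCKER override variable, and the attempt and interval defaults are all hypothetical and chosen for illustration. Because the inconsistent state is on the daemon side, this may not cover every case.

```shell
# Hypothetical helper: block until a network is no longer reported by the daemon.
# DOCKER may be overridden (for example in tests); it defaults to the real CLI.
wait_for_network_removal() {
    network="$1"          # for example a3c_net
    attempts="${2:-30}"   # assumed default: give up after 30 polls
    interval="${3:-1}"    # assumed default: 1 second between polls
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # `docker network inspect` exits non-zero when the network is not found.
        if ${DOCKER:-docker} network inspect "$network" >/dev/null 2>&1; then
            sleep "$interval"
        else
            return 0
        fi
        i=$((i + 1))
    done
    # The network was still reported after the last poll.
    return 1
}
```

A possible use would be to run docker stack rm a3c, then wait_for_network_removal a3c_net, and then run docker stack deploy only if the wait succeeded.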

hholst80 commented 7 years ago

I ran docker stack rm ... followed by docker stack deploy ...
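
In the original report the deploy succeeded on the fourth manual attempt. That manual retrying can be sketched as a bounded retry loop. This is an illustration only: the helper name deploy_with_retry, the DOCKER override variable, and the attempt and interval defaults are hypothetical, and retrying hides the underlying problem rather than fixing it.

```shell
# Hypothetical helper: retry `docker stack deploy` a bounded number of times.
# DOCKER may be overridden (for example in tests); it defaults to the real CLI.
deploy_with_retry() {
    compose_file="$1"     # for example docker-compose.yml
    stack="$2"            # for example a3c
    attempts="${3:-5}"    # assumed default: at most 5 deploy attempts
    interval="${4:-2}"    # assumed default: 2 seconds between attempts
    i=1
    while [ "$i" -le "$attempts" ]; do
        if ${DOCKER:-docker} stack deploy --compose-file "$compose_file" "$stack"; then
            return 0
        fi
        echo "deploy attempt $i failed" >&2
        sleep "$interval"
        i=$((i + 1))
    done
    # Every attempt failed.
    return 1
}
```

For the stack in this report, the call would look like deploy_with_retry docker-compose.yml a3c.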

hholst80 commented 7 years ago

I agree, it looks like this issue report is a duplicate of moby/moby#29293

dnephin commented 6 years ago

Closing as a duplicate.