docker / machine

Machine management for a container-centric world
https://docs.docker.com/machine/
Apache License 2.0
6.63k stars 1.97k forks

docker stack deploy created services but no container is created #4485

Open thiagolsfortunato opened 6 years ago

thiagolsfortunato commented 6 years ago

Expected behavior

docker stack deploy will create containers to run on the swarm nodes

Actual behavior

I'm using docker-machine with swarm mode; the service was deployed on the docker-machine manager node, but no container is created or running.

Steps to reproduce the behavior

docker-machine ssh master docker stack deploy --with-registry-auth -c docker-compose.yml vault

docker-compose.yml:

version: '3.2'
services:
  vault:
    image: $REGISTRY/vault
    networks:
      - network
    ports:
      - "8200:8200"
    environment:
      - VAULT_ADDR=http://127.0.0.1:8200
      - SKIP_SETCAP=1
      - PRIVATE_KEY_NAME=$PRIV_KEY_NAME
      - PUBLIC_KEY_NAME=$PUB_KEY_NAME
    volumes:
      - ./$KEY_DIR:/vault/fotospeed-keys
      - data:/vault/logs
      - data:/vault/files/:rw
      - data:/vault/keys
    entrypoint: /vault/script/start_server.sh
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints: [node.role == manager]
volumes:
  data:
networks:
  network:

Output of docker-machine ssh master docker service ls:

ID                  NAME                MODE                REPLICAS            IMAGE                                    PORTS
j8ongz6wi7w7        fotospeed_vault     replicated          0/1                 10.10.10.37:8120/fotospeed-vault:0.0.1   *:8200->8200/tcp

Output of docker-machine ssh master docker container ls:

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

Output of docker version:

Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:17:20 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.05.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.10.1
  Git commit:   f150324
  Built:        Wed May  9 22:20:42 2018
  OS/Arch:      linux/amd64
  Experimental: false

Output of docker info:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 18.05.0-ce
Storage Driver: aufs
 Root Dir: /mnt/sda1/var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 10
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: az6c9u62b5xcy5et74nypxgjq
 Is Manager: true
 ClusterID: yjloh8pi4glsbxhixj8u54035
 Managers: 1
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.0.0.100
 Manager Addresses:
  10.0.0.100:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.93-boot2docker
Operating System: Boot2Docker 18.05.0-ce (TCL 8.2.1); HEAD : b5d6989 - Thu May 10 16:35:28 UTC 2018
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 995.6MiB
Name: LR
ID: N2UD:YZGY:RWB3:KCO4:VL5Q:XMBX:DMVX:V3OV:TM2I:P7GE:ALF4:NQYX
Docker Root Dir: /mnt/sda1/var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: thiagolsfortunato
Registry: https://index.docker.io/v1/
Labels:
 provider=virtualbox
Experimental: false
Insecure Registries:
 10.10.10.37:8120
 127.0.0.0/8
Registry Mirrors:
 http://10.10.10.37:8120/
Live Restore Enabled: false
azizzoaib786 commented 6 years ago

I have the same issue.

thaJeztah commented 6 years ago

Are the $REGISTRY and $KEY_DIR environment variables set in your shell? (Do they show up if you type env?)

Does docker service ps fotospeed_vault --no-trunc show an error message indicating why it fails to start?
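Worth noting about the reproduction command above: variables exported only in the local shell are not visible inside a `docker-machine ssh` session, so `$REGISTRY` and friends may expand to empty strings on the remote side. A minimal fail-fast guard sketch (variable names and example values are taken from this thread; adjust to your setup):

```shell
# Example values; in a real setup these come from your shell or CI env.
export REGISTRY="${REGISTRY:-10.10.10.37:8120}"
export KEY_DIR="${KEY_DIR:-keys}"

# Guard idiom: abort with a clear message if a required variable is empty.
: "${REGISTRY:?REGISTRY is not set}"
: "${KEY_DIR:?KEY_DIR is not set}"

echo "deploying with REGISTRY=$REGISTRY KEY_DIR=$KEY_DIR"
```

Running this guard inside the ssh session (not just locally) before `docker stack deploy` confirms the values actually reach the compose-file substitution.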

thiagolsfortunato commented 6 years ago

@thaJeztah Yes, both environment variables were defined in my shell script.

Another service depends on Vault, but I found a new problem: Docker services do not support --privileged or --device, nor mounting /dev:/dev as a volume. I need to connect to a USB port, so because of this I am no longer using Docker Swarm.

vishalbhaliya-94 commented 6 years ago
1. Run this command to download the latest version of Docker Compose:

sudo curl -L https://github.com/docker/compose/releases/download/1.21.2/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose

2. Apply executable permissions to the binary:

sudo chmod +x /usr/local/bin/docker-compose

3. Optionally, install command completion for the bash and zsh shells.

4. Test the installation.

$ docker-compose --version
docker-compose version 1.21.2, build 1719ceb

Follow all the steps, and if Docker still does not start, run: sudo systemctl restart docker

krearthur commented 6 years ago

I have the same issue.

Me too.

I run docker-machine on Windows, but this behaviour is new; some days ago it worked. This might be the cause: I get an i/o timeout error even when simply running

docker container run helloworld
Unable to find image 'helloworld:latest' locally
docker: Error response from daemon: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 10.0.2.3:53: read udp 10.0.2.15:57171->10.0.2.3:53: i/o timeout.

I'm totally lost here. I will try stopping and restarting swarm mode.

cheechyuan commented 6 years ago

Are you running other nodes? The container might have been created on one of the other nodes.

krearthur commented 6 years ago

Just one node. Restarting the Docker daemon resolved the problem. Thanks anyway :)

ZhiqinYang commented 5 years ago

I solved it the same way (with a restart); I wonder what causes this!

marcusyoda commented 5 years ago

Follow all the steps, and if Docker still does not start, run: sudo systemctl restart docker

I solved it with a restart...

locvfx commented 5 years ago

I have the same issue! I deployed a new MongoDB service to the swarm but no container was created. I tried restarting Docker and updating docker-compose, but still couldn't figure out why no container was created!

But here is my solution: scale the service up with docker service scale mystack_mongodb=2

Check the containers with docker container ls, then scale it back to 1: docker service scale mystack_mongodb=1

Amazingly, the container was forced to be created!

locvfx commented 5 years ago

I found the reason. Guys, you need to tell Swarm where to deploy the new service.

deploy:
        mode: replicated
        replicas: 1
        placement:
          constraints: [node.role == manager]

Be sure to check the correct node in Docker Swarm. In my case I was checking the Swarm manager, but I hadn't told Docker to deploy on the manager, so the service was deployed on other nodes.
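For reference, the same idea in the long-form constraint syntax; the commented-out line shows an alternative of pinning the task to one specific node by hostname (the hostname here is hypothetical):

```yaml
deploy:
  mode: replicated
  replicas: 1
  placement:
    constraints:
      - node.role == manager
      # Alternative: pin to a single node by hostname (hypothetical name)
      # - node.hostname == swarm-worker-1
```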

akashagarwal7 commented 4 years ago

For me, the issue was using named volumes on CentOS (weirdly enough, the issue didn't arise on MacOS). So a volume defined in a compose file like:

volumes:
  - /backend/node_modules

would prevent the containers from being created.
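If anyone hits the same thing, one commonly suggested workaround (a sketch with hypothetical names, not confirmed as the fix for this exact case) is to declare the path as a top-level named volume instead of an anonymous one:

```yaml
services:
  backend:
    image: example/backend          # placeholder image
    volumes:
      - node-modules:/backend/node_modules

volumes:
  node-modules:
```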

Jack-Ji commented 3 years ago

Same issue happened on our server today. Solved it by restart docker service.

FYI, we're using ubuntu 18.04, and docker version is:

root@LMCloud-ecs001:~# docker version
Client:
 Version:           19.03.6
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        369ce74a3c
 Built:             Fri Feb 28 23:45:43 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.6
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       369ce74a3c
  Built:            Wed Feb 19 01:06:16 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.3-0ubuntu1~18.04.2
  GitCommit:
 runc:
  Version:          spec: 1.0.1-dev
  GitCommit:
 docker-init:
  Version:          0.18.0
  GitCommit:
misaon commented 3 years ago

Today same issue as @Jack-Ji. Temporarily solved it by restarting the Docker service.

root@swarm-manager-1:~# docker info
Client:
 Debug Mode: false

Server:
 Containers: 9
  Running: 9
  Paused: 0
  Stopped: 0
 Images: 10
 Server Version: 19.03.13
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: 4xx3i2a2ozal0xyrh6ez8d2ni
  Is Manager: true
  ClusterID: xvkyqqkbx87xjo6upy9lqt10d
  Managers: 1
  Nodes: 4
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: [masked]
  Manager Addresses:
   [masked]
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-52-generic
 Operating System: Ubuntu 20.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 1.941GiB
 Name: swarm-manager-1
 ID: [masked]
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

@thaJeztah We need to fix this... 😢

marcusyoda commented 3 years ago

For me, the issue was using named volumes on CentOS (weirdly enough, the issue didn't arise on MacOS). So a volume defined in a compose file like:

volumes:
  - /backend/node_modules

would prevent the containers from being created.

On CentOS you can run into trouble writing to disk, caused by SELinux!

Use the getenforce command to check whether SELinux is enabled on your machine. If the policy is set to Enforcing, disable it by issuing the commands below:

getenforce        # If it returns `Enforcing`, change it by running:
sudo setenforce 0
getenforce        # Should now return `Permissive`

To completely disable SELinux on CentOS, open the /etc/selinux/config file with a text editor and set SELINUX=disabled.

Also, make sure the mount point is already mounted when the Docker service starts!
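To make the change survive a reboot, the edit to /etc/selinux/config can be scripted. A sketch (try it against a copy of the file first, and run with sudo on the real one):

```shell
# Rewrite the SELINUX= line in an SELinux-style config file.
# Pass the file path explicitly; defaults to the standard CentOS location.
disable_selinux_in_config() {
  local cfg="${1:-/etc/selinux/config}"
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}
```

Note that setenforce 0 only changes the running policy; the config-file edit is what makes it persistent, and a reboot is needed for it to take effect.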

marcusyoda commented 3 years ago

Today same issue as @Jack-Ji. Temporarily solved it by restarting the Docker service.

@thaJeztah We need to fix this...

You can find some help in the logs: docker service logs SERVICE_ID

If you don't know your SERVICE_ID, run docker service ls and grab the service ID from the first column.

The most common problem I see is caused by services not meeting their startup requirements: tasks for the service are spawned and fail many times, until the throttle control increases the retry interval.

Example: an HAProxy container whose haproxy.config contains a rule that health-checks a backend service won't come up until the service it redirects to comes up.

backend hello-world
        mode http
        server wp-blog hello-world:80 check port 80

In my attempts, depends_on didn't solve the challenge, and I ended up solving the problem by changing the architecture!
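Since swarm ignores depends_on ordering at runtime, one common alternative (a sketch; the host, port, and timeout are assumptions, not from this thread) is to have the dependent container's entrypoint poll until the backend actually accepts connections:

```shell
# wait_for_tcp HOST PORT [TRIES] - poll once per second until HOST:PORT
# accepts a TCP connection, or give up after TRIES attempts (default 30).
# Uses bash's /dev/tcp, so no extra tools are needed in the image.
wait_for_tcp() {
  local host="$1" port="$2" tries="${3:-30}"
  while [ "$tries" -gt 0 ]; do
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0   # port is open
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1       # timed out
}
```

A hypothetical entrypoint could then run something like `wait_for_tcp hello-world 80 60 && exec haproxy -f /etc/haproxy/haproxy.cfg`.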

misaon commented 3 years ago

@maviniciuus @Jack-Ji Thanks so much for your interest in this issue. The problem I see is that when I deploy the stack (via a docker-compose.yml file) and then try $ docker stack ps <my-stack-name>, some services do not have a NODE assigned and their status is not "Running ... x hours" but "New ... x hours". When I look at the specific (worker) node, the container does not exist there. As soon as I restart Docker it works again for a while, but after a while the problem recurs.

It's a clean installation on DigitalOcean with Ubuntu 20 and Docker 19.03.13. Full docker info here: https://github.com/docker/machine/issues/4485#issuecomment-731167028

Example of stack file from production:

version: '3.8'
services:

    nginx-proxy:
        image: [masked]
        depends_on:
            - app
        networks:
            - app-network
            - traefik-public
        volumes:
            - app-public:/app/public:ro
        deploy:
            replicas: 1
            labels:
                - "traefik.enable=true"
                - "traefik.docker.network=traefik-public"
                - "traefik.constraint-label=traefik-public"
                - "traefik.http.routers.${DSD_APP_NAME}-proxy-https.rule=Host(${DSD_APP_HOSTS})"
                - "traefik.http.routers.${DSD_APP_NAME}-proxy-https.entrypoints=https"
                - "traefik.http.routers.${DSD_APP_NAME}-proxy-https.tls=true"
                - "traefik.http.routers.${DSD_APP_NAME}-proxy-https.middlewares=gzip-compress"
                - "traefik.http.services.${DSD_APP_NAME}-proxy.loadbalancer.server.port=80"
            placement:
                constraints:
                    - "node.labels.place==${DSD_WORKER_PLACE}"
            update_config:
                parallelism: 2
                delay: 10s
            restart_policy:
                condition: on-failure
                max_attempts: 5
            resources:
                limits:
                    cpus: '0.20'
                    memory: 64M
                reservations:
                    cpus: '0.10'
                    memory: 32M

    app:
        image: $DSD_APP_IMAGE
        depends_on:
            - minio
        networks:
            - app-network
            - global-app-service-network
        environment:
            - WARMUP_CACHE=true
            - ENABLE_XDEBUG=false
            - DB_NAME=$DSD_DB_NAME
            - AWS_S3_BUCKET=$DSD_AWS_S3_BUCKET
            - AWS_S3_ENDPOINT=$DSD_AWS_S3_ENDPOINT
            - AWS_S3_KEY=$DSD_AWS_S3_KEY
            - AWS_S3_SECRET=$DSD_AWS_S3_SECRET
            - ELASTIC_INDEX_SUFFIX=$DSD_ELASTIC_INDEX_SUFFIX
            - MAILER_URL=smtp://${DSD_SMTP_HOST}:1025
            - REDIS_INDEX_PREFIX=$DSD_REDIS_INDEX_PREFIX
            - SENTRY_DSN=$DSD_SENTRY_DSN
        env_file: $DSD_APP_HOSTS_ENV_FILE
        volumes:
            - app-public:/app/public
        deploy:
            replicas: 1
            placement:
                constraints:
                    - "node.labels.place==${DSD_WORKER_PLACE}"
            update_config:
                parallelism: 2
                delay: 10s
            restart_policy:
                condition: on-failure
                max_attempts: 10
            resources:
                limits:
                    cpus: '1.0'
                    memory: 512M
                reservations:
                    cpus: '0.10'
                    memory: 128M

    minio:
        image: minio/minio:RELEASE.2020-11-13T20-10-18Z
        networks:
            - app-network
            - traefik-public
        volumes:
            - s3-data:/data
        environment:
            - MINIO_ACCESS_KEY=$DSD_AWS_S3_KEY
            - MINIO_SECRET_KEY=$DSD_AWS_S3_SECRET
        command: "server /data"
        healthcheck:
            test: [ "CMD", "curl", "-f", "http://localhost:9000/minio/health/live" ]
            interval: 30s
            timeout: 20s
            retries: 3
        deploy:
            replicas: 1
            labels:
                - "traefik.enable=true"
                - "traefik.docker.network=traefik-public"
                - "traefik.constraint-label=traefik-public"
                - "traefik.http.routers.${DSD_APP_NAME}-minio-https.rule=Host(`[masked]"
                - "traefik.http.routers.${DSD_APP_NAME}-minio-https.entrypoints=https"
                - "traefik.http.routers.${DSD_APP_NAME}-minio-https.tls=true"
                - "traefik.http.routers.${DSD_APP_NAME}-minio-https.middlewares=gzip-compress"
                - "traefik.http.services.${DSD_APP_NAME}-minio.loadbalancer.server.port=9000"
            placement:
                constraints:
                    - "node.labels.place==${DSD_WORKER_PLACE}"
            update_config:
                parallelism: 2
                delay: 10s
            restart_policy:
                condition: on-failure
                max_attempts: 5
            resources:
                limits:
                    cpus: '0.10'
                    memory: 256M
                reservations:
                    cpus: '0.10'
                    memory: 64M

volumes:
    app-public:
    s3-data:

networks:
    app-network:
    traefik-public:
        external: true
    global-app-service-network:
        external: true
marcusyoda commented 3 years ago

@maviniciuus @Jack-Ji Thanks so much for your interest in this issue. The problem I see is that when I deploy the stack (via a docker-compose.yml file) and then try $ docker stack ps <my-stack-name>, some services do not have a NODE assigned and their status is not "Running ... x hours" but "New ... x hours". When I look at the specific (worker) node, the container does not exist there. As soon as I restart Docker it works again for a while, but after a while the problem recurs.

minio:
    image: minio/minio:RELEASE.2020-11-13T20-10-18Z
    ....
    healthcheck:
        test: [ "CMD", "curl", "-f", "http://localhost:9000/minio/health/live" ]
        interval: 30s
        timeout: 20s
        retries: 3
   ...

nginx-proxy depends_on app, app depends_on minio, and your minio has a healthcheck.

Try disabling the healthcheck, then check the exposed ports and make sure that when the health check runs the service is fully up and can respond on localhost on port 9000...

In the past I've had issues with healthchecks in cases where the container was ready but the process inside the container was not!
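One knob worth trying before disabling the healthcheck entirely: start_period (compose file format 3.4+) tells swarm to ignore failing probes while the process is still starting up. A sketch based on the minio block above (the 60s value is an assumption; tune it to your startup time):

```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
  interval: 30s
  timeout: 20s
  retries: 3
  start_period: 60s   # failures during the first 60s don't count as unhealthy
```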

misaon commented 3 years ago

@maviniciuus @Jack-Ji Thanks so much for your interest in this issue. The problem I see is that when I deploy the stack (via a docker-compose.yml file) and then try $ docker stack ps <my-stack-name>, some services do not have a NODE assigned and their status is not "Running ... x hours" but "New ... x hours". When I look at the specific (worker) node, the container does not exist there. As soon as I restart Docker it works again for a while, but after a while the problem recurs.

minio:
    image: minio/minio:RELEASE.2020-11-13T20-10-18Z
    ....
    healthcheck:
        test: [ "CMD", "curl", "-f", "http://localhost:9000/minio/health/live" ]
        interval: 30s
        timeout: 20s
        retries: 3
   ...

nginx-proxy depends_on app, app depends_on minio, and your minio has a healthcheck.

Try disabling the healthcheck, then check the exposed ports and make sure that when the health check runs the service is fully up and can respond on localhost on port 9000...

In the past I've had issues with healthchecks in cases where the container was ready but the process inside the container was not!

@maviniciuus thanks a lot for the response; I will try your solution. However, coincidentally, we only recently included MinIO in the stack. Previously the problems manifested themselves even without a health check, so I don't think that will solve this problem. Still, thank you for the tip on making the stack more stable. If this issue is not resolved, the only hope is to migrate to Docker v20.x and hope that the unexpected behavior disappears.

JudahMorrison commented 3 years ago

I have the same problem on Docker version 20.10.8. Any solution to this?