docker-archive / for-aws


Cloudstor: no suitable node (scheduling constraints not satisfied on 5 nodes; missing plugin on 3 nodes) #161

Closed westfood closed 6 years ago

westfood commented 6 years ago

Expected behavior

Workers run tasks backed by a Cloudstor EBS volume.

Actual behavior

The worker won't start the task and reports the error: no suitable node (scheduling constraints not satisfied on 5 nodes; missing plugin on 3 nodes).

Information

I updated the user-data for the manager and worker instances, so all instances were restarted, and that went fine. But two tasks that depend on a Cloudstor EBS volume did not start, and Docker won't start them.

Cloudstor is installed on the manager nodes, but it seems the worker nodes do not have it installed.

Maybe I'm missing something obvious, but Cloudstor worked before I restarted all the instances.
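To confirm which nodes lack the plugin, something like the following could be run on a manager node. This is a minimal sketch: `nodes_missing_plugin` is a hypothetical helper name, and it assumes each node reports its plugins under `.Description.Engine.Plugins` in `docker node inspect`, which is where the `"Name": "cloudstor:aws"` entry shows up.

```shell
#!/bin/sh
# Hypothetical helper: print the ID of every swarm node that does NOT
# advertise the given plugin. Must be run on a manager node.
nodes_missing_plugin() {
    plugin="$1"
    for id in $(docker node ls -q); do
        # Space-separated names of the engine plugins this node reports.
        plugins=$(docker node inspect "$id" \
            --format '{{range .Description.Engine.Plugins}}{{.Name}} {{end}}')
        case " $plugins " in
            *" $plugin "*) ;;          # plugin present, nothing to report
            *) echo "$id" ;;           # plugin missing on this node
        esac
    done
}

# Usage: nodes_missing_plugin cloudstor:aws
```

Any node ID it prints is a node on which tasks that need a `cloudstor:aws` volume cannot be scheduled.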

~ $ docker node ls
ID                            HOSTNAME                                      STATUS              AVAILABILITY        MANAGER STATUS
jj89ivc35illjhw20dgay626s     ip-172-31-9-95.us-west-2.compute.internal     Ready               Active
e2598km4xovp4u4o9b8xbo6cx     ip-172-31-14-43.us-west-2.compute.internal    Ready               Active              Reachable
yynflo704l3owzbaha0ychoni     ip-172-31-19-69.us-west-2.compute.internal    Ready               Active
j0w6wwi5vmf349xtqet6qloc2 *   ip-172-31-21-76.us-west-2.compute.internal    Ready               Active              Reachable
ifu3lf5ffp1rpe5osw2xapkwb     ip-172-31-22-170.us-west-2.compute.internal   Ready               Active              Reachable
3675b91mq7qwlh4l5w7trabii     ip-172-31-44-140.us-west-2.compute.internal   Ready               Active              Leader
1v4vsuudbx7pv8gwbch55ir9u     ip-172-31-44-215.us-west-2.compute.internal   Ready               Active
tr6s77bzij1c2cgr4ybfcgw5q     ip-172-31-46-237.us-west-2.compute.internal   Ready               Active              Reachable

~ $ docker plugin ls
ID                  NAME                DESCRIPTION                       ENABLED
852f9373e245        cloudstor:aws       cloud storage plugin for Docker   true

~ $ docker node inspect yynflo704l3owzbaha0ychoni |grep cloudstor:aws

~ $ docker node inspect self |grep cloudstor:aws
                        "Name": "cloudstor:aws"

~ $ docker volume ls
DRIVER              VOLUME NAME
local               33d6459e9366067d7baa0de01f4e20b2e44629fb7c692480555bc00e8723518d
local               5dc7087669fe53658c1ebb6cce62ac753e5f7b788c27ec3ba09374d10b52c554
local               6a289683712ba2a84b317deea4083917bbcacc35031cd152ba0279a55d8dbaaa
local               eee36058fd99c75f364ae29468230f50bdc1bcc0d2682b0d6e4013281162df2a
local               sshkey
cloudstor:aws       swarm_influx
cloudstor:aws       weblate_weblate

~ $ docker-diagnose
OK hostname=ip-172-31-21-76-us-west-2-compute-internal session=1530534939-iMd8L2czr3MZcPmSczD0erghQfbbZCT6
OK hostname=ip-172-31-44-140-us-west-2-compute-internal session=1530534939-iMd8L2czr3MZcPmSczD0erghQfbbZCT6
OK hostname=ip-172-31-22-170-us-west-2-compute-internal session=1530534939-iMd8L2czr3MZcPmSczD0erghQfbbZCT6
OK hostname=ip-172-31-46-237-us-west-2-compute-internal session=1530534939-iMd8L2czr3MZcPmSczD0erghQfbbZCT6
OK hostname=ip-172-31-14-43-us-west-2-compute-internal session=1530534939-iMd8L2czr3MZcPmSczD0erghQfbbZCT6
OK hostname=ip-172-31-9-95-us-west-2-compute-internal session=1530534939-iMd8L2czr3MZcPmSczD0erghQfbbZCT6
OK hostname=ip-172-31-19-69-us-west-2-compute-internal session=1530534939-iMd8L2czr3MZcPmSczD0erghQfbbZCT6
OK hostname=ip-172-31-44-215-us-west-2-compute-internal session=1530534939-iMd8L2czr3MZcPmSczD0erghQfbbZCT6
Done requesting diagnostics.
Your diagnostics session ID is 1530534939-iMd8L2czr3MZcPmSczD0erghQfbbZCT6
Please provide this session ID to the maintainer debugging your issue.

Steps to reproduce the behavior

  1. ...
  2. ...
westfood commented 6 years ago

I think I have an idea. I wanted to bind the worker instances to Elastic IPs, so I updated the user-data. I was not sure how the local IP is used on worker nodes, so I exported it again after the IP assignment.

"# guide-aws\n",
"docker run --label com.docker.editions.system --log-driver=json-file --log-opt max-size=50m --name=guide-aws --restart=always -d -e DYNAMODB_TABLE=$DYNAMODB_TABLE -e NODE_TYPE=$NODE_TYPE -e REGION=$AWS_REGION -e STACK_NAME=$STACK_NAME -e INSTANCE_NAME=$INSTANCE_NAME -e VPC_ID=$VPC_ID -e STACK_ID=\"$STACK_ID\" -e ACCOUNT_ID=$ACCOUNT_ID -e SWARM_QUEUE=\"$SWARM_QUEUE\" -e CLEANUP_QUEUE=\"$CLEANUP_QUEUE\" -e RUN_VACUUM=$RUN_VACUUM -e DOCKER_FOR_IAAS_VERSION=$DOCKER_FOR_IAAS_VERSION -e EDITION_ADDON=$EDITION_ADDON -e HAS_DDC=$HAS_DDC -e CHANNEL=$CHANNEL -v /var/run/docker.sock:/var/run/docker.sock docker4x/guide-aws:$DOCKER_FOR_IAAS_VERSION\n",
"\n",
"# AC Elastic IP association\n",
"export ELASTIC_ID=$(docker exec guide-aws aws ec2 describe-addresses --region us-west-2 --query 'Addresses[?AssociationId==null].{AllocationID:AllocationId}[0]' --output text)\n",
"export INSTANCE_ID=$(wget -qO- http://169.254.169.254/latest/meta-data/instance-id)\n",
"docker exec guide-aws aws ec2 associate-address --instance-id ${INSTANCE_ID} --allocation-id ${ELASTIC_ID} --no-allow-reassociation --region us-west-2\n",
"# AC - not sure if this address is used inside host, setting it again. Because Elastic IP could change Private IP.\n",
"export LOCAL_IP=$(wget -qO- http://169.254.169.254/latest/meta-data/local-ipv4)\n",
"\n",
"# cloudstor\n",
"docker plugin install --alias cloudstor:aws --grant-all-permissions docker4x/cloudstor:$DOCKER_FOR_IAAS_VERSION CLOUD_PLATFORM=AWS EFS_ID_REGULAR=$EFS_ID_REGULAR EFS_ID_MAXIO=$EFS_ID_MAXIO AWS_REGION=$AWS_REGION AWS_STACK_ID=$STACK_ID EFS_SUPPORTED=$ENABLE_EFS DEBUG=1\n",
"\n",

I checked the EC2 logs: the association went fine, but the cloudstor plugin could not be pulled:

{
    "AssociationId": "eipassoc-19bff0d4"
}
Error response from daemon: Get https://registry-1.docker.io/v2/docker4x/cloudstor/manifests/17.12.0-ce-aws1: net/http: TLS handshake timeout
 [ ok ]
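Because the install failed with a timeout and the boot script carried on regardless, one possible hardening is to retry the install a few times before giving up. The sketch below is an assumption, not part of the stock Docker for AWS user-data: `retry` is a hypothetical helper, and the attempt count and delay are arbitrary. It would not fix the underlying cause, only make a transient network failure less likely to leave a node without the plugin.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed, retrying in ${delay}s: $*" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}

# Usage (same arguments as the existing install line in the user-data):
# retry 5 10 docker plugin install --alias cloudstor:aws --grant-all-permissions \
#     docker4x/cloudstor:$DOCKER_FOR_IAAS_VERSION CLOUD_PLATFORM=AWS
```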
westfood commented 6 years ago

I experimented in a test environment. Once I removed the second export of LOCAL_IP, the one right after obtaining the Elastic IP, the cloudstor:aws plugin installation worked. I will try this in production.

LOCAL_IP is used on manager nodes for meta-aws, but I am not sure how the host uses this environment variable on worker nodes.
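To catch this kind of failure at boot rather than at scheduling time, the user-data could check that the plugin is actually installed and enabled on the local node after the install step. A minimal sketch, assuming `docker plugin ls` supports `--format` on the installed Docker version; `plugin_enabled` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the named plugin is installed AND
# enabled on the local Docker engine.
plugin_enabled() {
    # Each output line looks like "<name> <true|false>"; match the whole line.
    docker plugin ls --format '{{.Name}} {{.Enabled}}' | grep -qxF "$1 true"
}

# Usage in a boot script:
# plugin_enabled cloudstor:aws || echo "cloudstor:aws is not enabled on this node" >&2
```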

westfood commented 6 years ago

It worked in production. Will close for now.