instances not registering to cluster

danbjoseph commented 6 years ago

if i ssh into the 1 running instance of my auto scaling group, the setting from user-data.yml shows

[ec2-user@ip-xxx-xx-x-xxx ~]$ cat /etc/ecs/ecs.config
ECS_CLUSTER=odm

but

➜  ~ aws ecs describe-clusters --clusters "odm"
{
    "clusters": [
        {
            "status": "ACTIVE", 
            "statistics": [], 
            "clusterName": "odm", 
            "registeredContainerInstancesCount": 0, 
            "pendingTasksCount": 0, 
            "runningTasksCount": 0, 
            "activeServicesCount": 0, 
            "clusterArn": "arn:aws:ecs:us-east-1:xxxxxxxxxxxx:cluster/odm"
        }
    ], 
    "failures": []
}

the odm cluster in the Amazon ECS dashboard also notes 0 container instances
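
The same counter can be read from a script instead of by eye. This is a minimal sketch, assuming the AWS CLI is configured for the cluster's region; the function names `registered_count` and `cluster_has_instances` are made up for illustration.

```shell
# Hypothetical helper: print how many container instances a cluster reports.
# --query is the CLI's built-in JMESPath filter; it selects a single field.
registered_count() {
    aws ecs describe-clusters \
        --clusters "$1" \
        --query 'clusters[0].registeredContainerInstancesCount' \
        --output text
}

# Hypothetical helper: succeed only when at least one instance has registered.
# A non-numeric answer (for example an unknown cluster) counts as failure.
cluster_has_instances() {
    [ "$(registered_count "$1")" -gt 0 ] 2>/dev/null
}
```

Possible usage: `cluster_has_instances odm || echo "no container instances registered"`.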

matthewberryman commented 6 years ago

Did you launch the instances after creating the cluster with the CLI tool?

matthewberryman commented 6 years ago

Also, are you using the ECS AMI? Separately, I should note in the docs that the regions must all match, but I can see from your email that everything is in the same region (us-east-1).

danbjoseph commented 6 years ago

yes, i ran aws ecs create-cluster --cluster-name "odm" before launching the autoscaling cluster (step 4 of the readme)
this is the AMI i'm using

[screenshot: 2018-01-02 at 3:30:31 pm]

matthewberryman commented 6 years ago

Ok. Some other things to check:

1. Are your instances being assigned public IP addresses? If you've created a new VPC this may not be the case (see https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-ip-addressing.html#subnet-public-ip for instructions).
2. I assume the security group assigned to your instances doesn't block outbound connections (outbound traffic is allowed by default unless you've changed things).
3. Are the ec2 instances assigned a role that has a policy attached granting the permissions the ECS agent needs? See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/instance_IAM_role.html for the policy (on top of the policy to access your s3 bucket as described).
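
The public IP and the assigned instance profile can both be read from one `describe-instances` call. A rough sketch, assuming the instance ID is known; `instance_net_and_role` is a hypothetical name and the ID in the usage line is a placeholder.

```shell
# Hypothetical helper: print the public IP and the instance profile ARN
# of one EC2 instance, tab-separated. With --output text the CLI prints
# "None" for a field that is not set.
instance_net_and_role() {
    aws ec2 describe-instances \
        --instance-ids "$1" \
        --query 'Reservations[].Instances[].[PublicIpAddress,IamInstanceProfile.Arn]' \
        --output text
}
```

Possible usage: `instance_net_and_role i-0123456789abcdef0`. A `None` in the first column would point at the public IP question, and a `None` in the second at a missing role.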

matthewberryman commented 6 years ago

You may need to change your launch configuration (if the points in the previous comment don't hold) and relaunch. I'm not sure about making a note in the readme, as it's becoming bloated, and fixing #4 should solve a lot of these issues.

See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/logs.html for location of ECS agent logs (in the ec2 instance) that can help with further troubleshooting.
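
To skim those logs for failures only, something like the following may help. It is a sketch that assumes the agent logs sit in `/var/log/ecs` with names starting `ecs-agent.log`; the function name `ecs_agent_errors` is hypothetical.

```shell
# Hypothetical helper: print only the ERROR lines from the ECS agent logs.
# The directory defaults to /var/log/ecs; pass another directory to
# inspect logs that were copied off the instance.
ecs_agent_errors() {
    log_dir="${1:-/var/log/ecs}"
    grep -h '\[ERROR\]' "$log_dir"/ecs-agent.log* 2>/dev/null
}
```

Run on the instance as `ecs_agent_errors`, or as `ecs_agent_errors ./copied-logs` against a local copy.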

danbjoseph commented 6 years ago

so for 'Network' under '1. Configure Auto Scaling group details', if i put "Launch into EC2-Classic" then i get error messages about VPC security rules for a non-VPC instance when it tries to start the initial instance.

if i select "vpc-xxxxxxx (xxx.xx.x.x/xx)" then i get this message:

[screenshot: 2018-01-02 at 3:56:33 pm]

in my AWS VPC dashboard, it lists only the above one, and it is not default. if i select it, "create default VPC" is disabled in the action menu. if i create one, should i be able to set it as default? what settings?

[screenshot: 2018-01-02 at 4:01:17 pm]

are there implications for our wider AWS infrastructure?

matthewberryman commented 6 years ago

If you want to launch into EC2-classic then you could get rid of the conflicting VPC rules. Probably best to deselect the one you've created and then create a default one using the action menu, which will autoassign public IP addresses. Otherwise if you wanted to go with the separate one (to keep this separate from other services - VPC provides network isolation that can be useful security-wise) then the auto-assign feature is under VPC -> subnets.

danbjoseph commented 6 years ago

i don't know how to fix the following that happens when i try "Launch into EC2-classic"
this happens when i try and select the available vpc option
in my VPC Dashboard, I can't "Create Default VPC"

matthewberryman commented 6 years ago

Per https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/default-vpc.html seems you can't create a default VPC if your account is old enough that it supports ec2-classic - I guess you'd have to log a support ticket.

So, to launch into EC2-Classic, you would need to modify your launch configuration (copy the existing one, then on the next step go back and edit it) so that it uses EC2-Classic security groups instead of the VPC ones.

Or per https://github.com/OpenDroneMap/opendronemap-ecs/issues/14#issuecomment-354875842 for each of the subnets associated with your vpc-85d2e3e2 go and change the autoassign setting.
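
The auto-assign setting can also be inspected and changed from the CLI. A sketch under the assumption that the caller has permission to modify subnets; both function names are hypothetical.

```shell
# Hypothetical helper: list each subnet of a VPC with its current
# "auto-assign public IPv4 address" setting (True/False).
list_subnet_auto_assign() {
    aws ec2 describe-subnets \
        --filters "Name=vpc-id,Values=$1" \
        --query 'Subnets[].[SubnetId,MapPublicIpOnLaunch]' \
        --output text
}

# Hypothetical helper: turn the setting on for one subnet.
# Only instances launched after the change receive a public IP.
enable_public_ip_on_launch() {
    aws ec2 modify-subnet-attribute \
        --subnet-id "$1" \
        --map-public-ip-on-launch
}
```

Possible usage: run `list_subnet_auto_assign` with the VPC ID, then `enable_public_ip_on_launch` with each subnet ID that shows `False`.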

danbjoseph commented 6 years ago

If I change to an AWS region that shows having a default VPC for my account (e.g. us-east-2) for the cluster, will it matter that the S3 bucket with all the imagery has the region listed as us-east-1?

matthewberryman commented 6 years ago

No, provided you don't block outbound traffic from the ec2 instances, or specify a region in the s3 policy, but unless you've changed things neither would be the case.

danbjoseph commented 6 years ago

hmmm. repeated everything in us-east-2

the initial instance of my cluster shows a value for IPv4 Public IP. sent an email with a screen grab of the full description details in the dashboard.

still getting

➜  ~ aws ecs describe-clusters --clusters "odm"
{
    "clusters": [
        {
            "status": "ACTIVE",
            "clusterName": "odm",
            "registeredContainerInstancesCount": 0,
            "pendingTasksCount": 0,
            "runningTasksCount": 0,
            "activeServicesCount": 0,
            "clusterArn": "arn:aws:ecs:us-east-2:499923577862:cluster/odm"
        }
    ],
    "failures": []
}

on the initial instance, the config looks right

[ec2-user@ip-xxx-xx-x-xxx ~]$ cat /etc/ecs/ecs.config
ECS_CLUSTER=odm

IAM role assigned is as follows:
outbound security is for all traffic

danbjoseph commented 6 years ago

on the initial instance cat /var/log/ecs/ecs-init.log shows a repeating list of:

2018-01-02T22:56:07Z [INFO] Removing existing agent container ID: 513a789ee4aa4d60e6633cad788f3926e032a73857799d2b6ce7da0267024775
2018-01-02T22:56:07Z [INFO] Starting Amazon EC2 Container Service Agent
2018-01-02T22:56:09Z [INFO] Agent exited with code 1
2018-01-02T22:56:09Z [INFO] Container name: /ecs-agent
2018-01-02T22:56:09Z [INFO] Removing existing agent container ID: 4ad1e35a71665b45b3e8b3204510ec5a0f7e4977c1838d5e8d27fa2d99020d0c
2018-01-02T22:56:09Z [INFO] Starting Amazon EC2 Container Service Agent
2018-01-02T22:56:09Z [INFO] Agent exited with code 1
2018-01-02T22:56:09Z [INFO] Container name: /ecs-agent
2018-01-02T22:56:09Z [INFO] Removing existing agent container ID: b6efcda2313c6117465b5308e8dbee612d1741b4ed9f066d1aa6394a7e04043e
2018-01-02T22:56:10Z [INFO] Starting Amazon EC2 Container Service Agent
2018-01-02T22:56:11Z [INFO] Agent exited with code 1
2018-01-02T22:56:11Z [INFO] Container name: /ecs-agent
2018-01-02T22:56:11Z [INFO] Removing existing agent container ID: 736c52a591de3c780791f02cd8d39872375e0fbaee44eb4a5811c57d0d1c2a3e
2018-01-02T22:56:11Z [INFO] Starting Amazon EC2 Container Service Agent
2018-01-02T22:56:12Z [INFO] Agent exited with code 1
2018-01-02T22:56:12Z [INFO] Container name: /ecs-agent
2018-01-02T22:56:12Z [INFO] Removing existing agent container ID: bd2b9eadb65f306fa42498084fc90e0da9d05fcf97e55320892f3da904971b56
2018-01-02T22:56:12Z [INFO] Starting Amazon EC2 Container Service Agent

danbjoseph commented 6 years ago

and also:

[ec2-user@ip-172-31-9-201 ~]$ cat /var/log/ecs/ecs-agent.log.2018-01-02-22 
2018-01-02T22:18:43Z [INFO] Loading configuration
2018-01-02T22:18:43Z [INFO] Loading state! module="statemanager"
2018-01-02T22:18:43Z [INFO] Event stream ContainerChange start listening...
2018-01-02T22:18:43Z [INFO] Creating root ecs cgroup: /ecs
2018-01-02T22:18:43Z [INFO] Creating cgroup /ecs
2018-01-02T22:18:43Z [INFO] Registering Instance with ECS
2018-01-02T22:18:43Z [ERROR] Could not register: AccessDeniedException: User: arn:aws:sts::499923577862:assumed-role/odm-ecs/i-0fa57f09750803dd3 is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-2:499923577862:cluster/odm
    status code: 400, request id: e4d1bfc2-f00a-11e7-91cb-27f6e55749c8
2018-01-02T22:18:43Z [ERROR] Error registering: AccessDeniedException: User: arn:aws:sts::499923577862:assumed-role/odm-ecs/i-0fa57f09750803dd3 is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-2:499923577862:cluster/odm
    status code: 400, request id: e4d1bfc2-f00a-11e7-91cb-27f6e55749c8
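
The role name and the denied action are the two useful pieces of that error. A small sketch that pulls them out of agent log lines with `sed`; the function name `denied_role_and_action` is hypothetical, and the pattern assumes the message wording shown above.

```shell
# Hypothetical helper: read agent log lines on stdin and print, for each
# "Could not register" AccessDeniedException, the role name and the
# denied action separated by a space.
denied_role_and_action() {
    sed -n 's|.*Could not register: AccessDeniedException: User: .*:assumed-role/\([^/]*\)/[^ ]* is not authorized to perform: \([^ ]*\) on resource.*|\1 \2|p'
}
```

Fed the log above, for example with `denied_role_and_action < /var/log/ecs/ecs-agent.log.2018-01-02-22`, it prints `odm-ecs ecs:RegisterContainerInstance`, which names the role whose policies need checking.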

matthewberryman commented 6 years ago

Yeah, it's a permissions issue (point 3 of https://github.com/OpenDroneMap/opendronemap-ecs/issues/14#issuecomment-354870127). Back on the launch configuration screen, note the IAM role (labelled IAM Instance Profile):

Back on the IAM console, search for that role:

Then under permissions you should have your s3 policy (odm-ecs) attached, and also the system policy AmazonEC2ContainerServiceforEC2Role. If either is missing, click on attach, search for those policies, and attach.
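
The same check and fix can be done from the CLI. A sketch, assuming the caller has IAM permissions and that the role name passed in is the one shown under IAM Instance Profile; the function names are hypothetical.

```shell
# AWS-managed policy that allows the ECS agent to register the instance.
ECS_INSTANCE_POLICY_ARN='arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role'

# Hypothetical helper: list the names of the managed policies attached to a role.
attached_policies() {
    aws iam list-attached-role-policies \
        --role-name "$1" \
        --query 'AttachedPolicies[].PolicyName' \
        --output text
}

# Hypothetical helper: attach the ECS instance policy to a role.
attach_ecs_instance_policy() {
    aws iam attach-role-policy \
        --role-name "$1" \
        --policy-arn "$ECS_INSTANCE_POLICY_ARN"
}
```

Possible usage: run `attached_policies` with the role name first, and call `attach_ecs_instance_policy` only if AmazonEC2ContainerServiceforEC2Role is not in the output.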

matthewberryman commented 6 years ago

(just noting I updated previous comment, which you might not pick up on GH notification emails, unless you click on link and refresh, as last line should read "search for those policies" )

danbjoseph commented 6 years ago

despite now having those added IAM permissions, i'm still getting the same error logs

[ec2-user@ip-172-31-47-147 ~]$ cat /var/log/ecs/ecs-agent.log.2018-01-02-23 
2018-01-02T23:18:15Z [INFO] Loading configuration
2018-01-02T23:18:15Z [INFO] Loading state! module="statemanager"
2018-01-02T23:18:15Z [INFO] Event stream ContainerChange start listening...
2018-01-02T23:18:15Z [INFO] Creating root ecs cgroup: /ecs
2018-01-02T23:18:15Z [INFO] Creating cgroup /ecs
2018-01-02T23:18:15Z [INFO] Registering Instance with ECS
2018-01-02T23:18:15Z [ERROR] Could not register: AccessDeniedException: User: arn:aws:sts::499923577862:assumed-role/odm_ecsInstanceRole/i-0f1ff62b790711126 is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-2:499923577862:cluster/odm
    status code: 400, request id: 35d054ef-f013-11e7-a747-bbbdbef005a8
2018-01-02T23:18:15Z [ERROR] Error registering: AccessDeniedException: User: arn:aws:sts::499923577862:assumed-role/odm_ecsInstanceRole/i-0f1ff62b790711126 is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-2:499923577862:cluster/odm
    status code: 400, request id: 35d054ef-f013-11e7-a747-bbbdbef005a8

[screenshot: 2018-01-02 at 6:23:59 pm]

matthewberryman commented 6 years ago

I think you need to relaunch the cluster to get it to pick up the changes

matthewberryman commented 6 years ago

From the logs it looks like it's creating a separate instance role off of the main role at launch.

danbjoseph commented 6 years ago

deleted the auto scaling group, deleted the launch configuration, ran aws ecs delete-cluster --cluster odm, went through setup again. no dice.

matthewberryman commented 6 years ago

Ok I think we need to tee up a time when we can use Google Chrome Screen Sharing or something so I can step through and take a look at things.

danbjoseph commented 6 years ago

i think i may have been looking at a stackoverflow post describing a similar problem, and added the IAM role mentioned there instead of the one you noted in the comment. thanks for your help in figuring out i had the wrong one added.

matthewberryman commented 6 years ago

The naming convention used doesn't help—often they're too close to make sense of and it's only by reading the JSON (urgh) that I can figure out the intent. Glad you're up and running now. Closing this and will review the pull request making this policy issue clearer, and merge shortly.

ShahNewazKhan commented 5 years ago

> Ok. Some other things to check:
>
> 1. Are your instances being assigned public IP addresses? If you've created a new VPC this may not be the case (see https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-ip-addressing.html#subnet-public-ip for instructions).
> 2. I assume the security group assigned to your instances doesn't block outbound connections (the default unless you've changed things).
> 3. Are the ec2 instances assigned a role that has a policy attached that grants permissions? See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/instance_IAM_role.html for policy (on top of policy to access your s3 bucket as described).

Thanks, my issue was that the subnets in the VPC did not have an internet gateway (IGW) route set up in their route tables.
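
A quick way to test for that condition is to look for a gateway ID starting with `igw-` among the subnet's routes. This is a sketch with a hypothetical function name. It only sees route tables explicitly associated with the subnet; a subnet with no explicit association uses the VPC's main route table, which this filter does not return.

```shell
# Hypothetical helper: succeed if a route table explicitly associated with
# the given subnet has at least one route through an internet gateway.
has_igw_route() {
    aws ec2 describe-route-tables \
        --filters "Name=association.subnet-id,Values=$1" \
        --query 'RouteTables[].Routes[].GatewayId' \
        --output text | tr '\t' '\n' | grep -q '^igw-'
}
```

Possible usage: `has_igw_route "$subnet_id" || echo "no internet gateway route found"`, where `$subnet_id` holds the subnet to check.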
