[Closed] danbjoseph closed this issue 6 years ago.
Did you launch the cluster after setting up the cluster with the CLI tool?
Also, are you using the ECS AMI? Separately, I should note in the docs that the regions must all match, but I can see from your email that you're operating in the same region, us-east-1.
That is, did you run

```
aws ecs create-cluster --cluster-name "odm"
```

before launching the autoscaling cluster (step 4 of the readme)?
You may need to change your launch configuration (if the points in the previous comment don't hold) and relaunch. I'm not sure about adding a note to the readme, as it's becoming bloated, and once I fix #4 that should resolve a lot of these issues.
See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/logs.html for the location of the ECS agent logs (on the EC2 instance), which can help with further troubleshooting.
So, for 'Network' under '1. Configure Auto Scaling group details': if I put "Launch into EC2-Classic", then I get error messages about VPC security rules for a non-VPC instance when it tries to start the initial instance. If I select "vpc-xxxxxxx (xxx.xx.x.x/xx)", then I get this message. In my AWS VPC dashboard, it lists only the above one, and it is not the default. If I select it, "create default VPC" is disabled in the action menu. If I create one, should I be able to set it as default? With what settings? Are there implications for our wider AWS infrastructure?
If you want to launch into EC2-Classic, then you could get rid of the conflicting VPC rules. It's probably best to deselect the one you've created and then create a default one using the action menu, which will auto-assign public IP addresses. Otherwise, if you want to go with the separate one (to keep this separate from other services; VPC provides network isolation that can be useful security-wise), the auto-assign feature is under VPC -> Subnets.
Per https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/default-vpc.html, it seems you can't create a default VPC if your account is old enough that it supports EC2-Classic; I guess you'd have to log a support ticket.
So, to launch into EC2-Classic, you would need to modify your launch configuration (by copying the existing one and then, on the next step, going back and modifying it) to change the security groups to EC2-Classic security groups.
Or, per https://github.com/OpenDroneMap/opendronemap-ecs/issues/14#issuecomment-354875842, for each of the subnets associated with your vpc-85d2e3e2, go and change the auto-assign setting.
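As an alternative to clicking through the console, the subnet auto-assign setting can also be flipped with the AWS CLI; a rough sketch (the subnet ID below is a placeholder):

```shell
# List the subnets attached to the VPC mentioned above
aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=vpc-85d2e3e2" \
    --query "Subnets[].SubnetId"

# Enable auto-assignment of public IPs (repeat per subnet ID returned above)
aws ec2 modify-subnet-attribute \
    --subnet-id subnet-0123456789abcdef0 \
    --map-public-ip-on-launch
```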
If I change to an AWS region that shows having a default VPC for my account (e.g. us-east-2) for the cluster, will it matter that the S3 bucket with all the imagery has the region listed as us-east-1?
No, provided you don't block outbound traffic from the EC2 instances or specify a region in the S3 policy; unless you've changed things, neither would be the case.
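If you want to double-check both conditions, the bucket's region and its policy can be inspected from the CLI (the bucket name below is a placeholder):

```shell
# Confirm which region the imagery bucket lives in
# (an empty/null LocationConstraint means us-east-1)
aws s3api get-bucket-location --bucket your-imagery-bucket

# Inspect the bucket policy for any region-restricting conditions
aws s3api get-bucket-policy --bucket your-imagery-bucket
```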
Hmmm. Repeated everything in us-east-2:
```
➜ ~ aws ecs describe-clusters --clusters "odm"
{
    "clusters": [
        {
            "status": "ACTIVE",
            "clusterName": "odm",
            "registeredContainerInstancesCount": 0,
            "pendingTasksCount": 0,
            "runningTasksCount": 0,
            "activeServicesCount": 0,
            "clusterArn": "arn:aws:ecs:us-east-2:499923577862:cluster/odm"
        }
    ],
    "failures": []
}
```
```
[ec2-user@ip-xxx-xx-x-xxx ~]$ cat /etc/ecs/ecs.config
ECS_CLUSTER=odm
```
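(For reference, the usual way that line ends up in /etc/ecs/ecs.config on the ECS-optimized AMI is a user-data script in the launch configuration, along the lines of:)

```shell
#!/bin/bash
# Point the ECS agent on this instance at the "odm" cluster
echo "ECS_CLUSTER=odm" >> /etc/ecs/ecs.config
```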
On the initial instance, cat /var/log/ecs/ecs-init.log shows a repeating list of:
```
2018-01-02T22:56:07Z [INFO] Removing existing agent container ID: 513a789ee4aa4d60e6633cad788f3926e032a73857799d2b6ce7da0267024775
2018-01-02T22:56:07Z [INFO] Starting Amazon EC2 Container Service Agent
2018-01-02T22:56:09Z [INFO] Agent exited with code 1
2018-01-02T22:56:09Z [INFO] Container name: /ecs-agent
2018-01-02T22:56:09Z [INFO] Removing existing agent container ID: 4ad1e35a71665b45b3e8b3204510ec5a0f7e4977c1838d5e8d27fa2d99020d0c
2018-01-02T22:56:09Z [INFO] Starting Amazon EC2 Container Service Agent
2018-01-02T22:56:09Z [INFO] Agent exited with code 1
2018-01-02T22:56:09Z [INFO] Container name: /ecs-agent
2018-01-02T22:56:09Z [INFO] Removing existing agent container ID: b6efcda2313c6117465b5308e8dbee612d1741b4ed9f066d1aa6394a7e04043e
2018-01-02T22:56:10Z [INFO] Starting Amazon EC2 Container Service Agent
2018-01-02T22:56:11Z [INFO] Agent exited with code 1
2018-01-02T22:56:11Z [INFO] Container name: /ecs-agent
2018-01-02T22:56:11Z [INFO] Removing existing agent container ID: 736c52a591de3c780791f02cd8d39872375e0fbaee44eb4a5811c57d0d1c2a3e
2018-01-02T22:56:11Z [INFO] Starting Amazon EC2 Container Service Agent
2018-01-02T22:56:12Z [INFO] Agent exited with code 1
2018-01-02T22:56:12Z [INFO] Container name: /ecs-agent
2018-01-02T22:56:12Z [INFO] Removing existing agent container ID: bd2b9eadb65f306fa42498084fc90e0da9d05fcf97e55320892f3da904971b56
2018-01-02T22:56:12Z [INFO] Starting Amazon EC2 Container Service Agent
```
and also:
```
[ec2-user@ip-172-31-9-201 ~]$ cat /var/log/ecs/ecs-agent.log.2018-01-02-22
2018-01-02T22:18:43Z [INFO] Loading configuration
2018-01-02T22:18:43Z [INFO] Loading state! module="statemanager"
2018-01-02T22:18:43Z [INFO] Event stream ContainerChange start listening...
2018-01-02T22:18:43Z [INFO] Creating root ecs cgroup: /ecs
2018-01-02T22:18:43Z [INFO] Creating cgroup /ecs
2018-01-02T22:18:43Z [INFO] Registering Instance with ECS
2018-01-02T22:18:43Z [ERROR] Could not register: AccessDeniedException: User: arn:aws:sts::499923577862:assumed-role/odm-ecs/i-0fa57f09750803dd3 is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-2:499923577862:cluster/odm
	status code: 400, request id: e4d1bfc2-f00a-11e7-91cb-27f6e55749c8
2018-01-02T22:18:43Z [ERROR] Error registering: AccessDeniedException: User: arn:aws:sts::499923577862:assumed-role/odm-ecs/i-0fa57f09750803dd3 is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-2:499923577862:cluster/odm
	status code: 400, request id: e4d1bfc2-f00a-11e7-91cb-27f6e55749c8
```
Yeah, it's a permissions issue (https://github.com/OpenDroneMap/opendronemap-ecs/issues/14#issuecomment-354870127, point 3). Back on the launch configuration screen, note the IAM role (the field labelled "IAM Instance Profile"):
Back on the IAM console, search for that role:
Then, under permissions, you should have your S3 policy (odm-ecs) attached, and also the system policy AmazonEC2ContainerServiceforEC2Role. If either is missing, click on attach, search for those policies, and attach them.
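If you prefer the CLI, the same check and fix look roughly like this, assuming the role is the odm-ecs one from the error log above:

```shell
# Check which policies are currently attached to the instance role
aws iam list-attached-role-policies --role-name odm-ecs

# Attach the AWS-managed ECS instance policy if it's missing
aws iam attach-role-policy \
    --role-name odm-ecs \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
```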
(Just noting I updated the previous comment, which you might not pick up from GH notification emails unless you click the link and refresh; the last line should read "search for those policies".)
Despite now having those IAM permissions added, I'm still getting the same error logs:
```
[ec2-user@ip-172-31-47-147 ~]$ cat /var/log/ecs/ecs-agent.log.2018-01-02-23
2018-01-02T23:18:15Z [INFO] Loading configuration
2018-01-02T23:18:15Z [INFO] Loading state! module="statemanager"
2018-01-02T23:18:15Z [INFO] Event stream ContainerChange start listening...
2018-01-02T23:18:15Z [INFO] Creating root ecs cgroup: /ecs
2018-01-02T23:18:15Z [INFO] Creating cgroup /ecs
2018-01-02T23:18:15Z [INFO] Registering Instance with ECS
2018-01-02T23:18:15Z [ERROR] Could not register: AccessDeniedException: User: arn:aws:sts::499923577862:assumed-role/odm_ecsInstanceRole/i-0f1ff62b790711126 is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-2:499923577862:cluster/odm
	status code: 400, request id: 35d054ef-f013-11e7-a747-bbbdbef005a8
2018-01-02T23:18:15Z [ERROR] Error registering: AccessDeniedException: User: arn:aws:sts::499923577862:assumed-role/odm_ecsInstanceRole/i-0f1ff62b790711126 is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-2:499923577862:cluster/odm
	status code: 400, request id: 35d054ef-f013-11e7-a747-bbbdbef005a8
```
I think you need to relaunch the cluster to get it to pick up the changes.
From the logs it looks like it's creating a separate instance role off of the main role at launch.
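One way to confirm which role the instance actually assumed (and compare it against the ARN in the agent error) is from inside the instance, using the standard instance metadata endpoint:

```shell
# From inside the EC2 instance: name of the role the instance profile provides
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/

# The assumed-role ARN reported here should match the one in the error log
aws sts get-caller-identity
```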
Deleted the auto scaling group, deleted the launch configuration, ran aws ecs delete-cluster --cluster odm, and went through setup again. No dice.
Ok I think we need to tee up a time when we can use Google Chrome Screen Sharing or something so I can step through and take a look at things.
I think I may have been looking at a Stack Overflow post describing a similar problem and referenced the IAM role mentioned there instead of the one you noted in the comment. Thanks for your help in figuring out I had the wrong one added.
The naming convention used doesn't help; often they're too close to make sense of, and it's only by reading the JSON (urgh) that I can figure out the intent. Glad you're up and running now. Closing this, and will review the pull request making this policy issue clearer and merge shortly.
Ok. Some other things to check:
- Are your instances being assigned public IP addresses? If you've created a new VPC this may not be the case (see https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-ip-addressing.html#subnet-public-ip for instructions).
- I assume the security group assigned to your instances doesn't block outbound connections (the default unless you've changed things).
- Are the ec2 instances assigned a role that has a policy attached granting the needed permissions? See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/instance_IAM_role.html for the policy (on top of the policy to access your s3 bucket, as described).
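The first and third checks above can also be sketched from the CLI; the auto scaling group name odm-asg below is a placeholder for whatever yours is called:

```shell
# Public IP and IAM instance profile for the instances launched by the ASG
aws ec2 describe-instances \
    --filters "Name=tag:aws:autoscaling:groupName,Values=odm-asg" \
    --query "Reservations[].Instances[].[InstanceId,PublicIpAddress,IamInstanceProfile.Arn]"
```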
Thanks, my issue was that the subnets in the VPC did not have an IGW set up in the route tables.
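For anyone hitting the same missing-IGW issue, the fix can also be applied with the CLI (the igw-/rtb-/vpc- IDs below are placeholders):

```shell
# Create an internet gateway and attach it to the VPC
aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway --internet-gateway-id igw-0abc123 --vpc-id vpc-0abc123

# Add a default route to the IGW in the subnet's route table
aws ec2 create-route \
    --route-table-id rtb-0abc123 \
    --destination-cidr-block 0.0.0.0/0 \
    --gateway-id igw-0abc123
```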
If I ssh into the 1 running instance of my auto scaling group, the setting from user-data.yml shows up, but the odm cluster in the Amazon ECS dashboard still notes 0 container instances.
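A quick way to check whether registration eventually succeeds is to ask ECS directly:

```shell
# If the agent registered, this should list at least one container instance
aws ecs list-container-instances --cluster odm --region us-east-2
```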