oanhnn closed this issue 4 years ago.
How can I create an ECS cluster from EC2 instances managed by an Auto Scaling group (ASG)? I'd like a module like https://github.com/widdix/aws-cf-templates/blob/master/ecs/cluster.yaml

Currently, I am using the code below:
---
# Copyright 2018 widdix GmbH
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
AWSTemplateFormatVersion: '2010-09-09'
Description: 'cfn-modules: AWS Auto Scaling Group singleton (Amazon Linux 2)'
# cfn-modules:implements(ExposeName, ExposeSecurityGroupId)
Parameters:
  VpcModule:
    Description: 'Stack name of vpc module.'
    Type: String
  AlertingModule:
    Description: 'Optional but recommended stack name of alerting module.'
    Type: String
    Default: ''
  BastionModule:
    Description: 'Optional but recommended stack name of module implementing Bastion.'
    Type: String
    Default: ''
  AlbModule:
    Description: 'Optional but recommended stack name of module implementing Alb.'
    Type: String
    Default: ''
  KeyName:
    Description: 'Optional key name of the Linux user ec2-user to establish a SSH connection to the EC2 instance.'
    Type: String
    Default: ''
  IAMUserSSHAccess:
    Description: 'Synchronize public keys of IAM users to enable personalized SSH access (https://github.com/widdix/aws-ec2-ssh)?'
    Type: String
    Default: false
    AllowedValues: [true, false]
  SystemsManagerAccess:
    Description: 'Enable AWS Systems Manager agent and Session Manager.'
    Type: String
    Default: true
    AllowedValues: [true, false]
  InstanceType:
    Description: 'The instance type for the EC2 instance.'
    Type: String
    Default: 't3.medium'
  InstanceName:
    Description: 'The name for the EC2 instance (auto generated if not set).'
    Type: String
    Default: ''
  SubnetReach:
    Description: 'Subnet reach.'
    Type: String
    Default: Public
    AllowedValues:
    - Public
    - Private
  LogsRetentionInDays:
    Description: 'Specifies the number of days you want to retain log events.'
    Type: Number
    Default: 14
    AllowedValues: [1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 3653]
  UserData:
    Description: 'Optional Bash script executed on first instance launch.'
    Type: String
    Default: ''
  IngressTcpPort1:
    Description: 'Optional port allowing ingress TCP traffic.'
    Type: String
    Default: ''
  IngressTcpPort2:
    Description: 'Optional port allowing ingress TCP traffic.'
    Type: String
    Default: ''
  IngressTcpPort3:
    Description: 'Optional port allowing ingress TCP traffic.'
    Type: String
    Default: ''
  ClientSgModule1:
    Description: 'Optional stack name of client-sg module to mark traffic from EC2 instance.'
    Type: String
    Default: ''
  ClientSgModule2:
    Description: 'Optional stack name of client-sg module to mark traffic from EC2 instance.'
    Type: String
    Default: ''
  ClientSgModule3:
    Description: 'Optional stack name of client-sg module to mark traffic from EC2 instance.'
    Type: String
    Default: ''
  FileSystemModule1:
    Description: 'Optional stack name of efs-file-system module.'
    Type: String
    Default: ''
  FileSystemModule2:
    Description: 'Optional stack name of efs-file-system module.'
    Type: String
    Default: ''
  FileSystemModule3:
    Description: 'Optional stack name of efs-file-system module.'
    Type: String
    Default: ''
  MaxSize:
    Description: 'The maximum size of the Auto Scaling group.'
    Type: Number
    Default: 4
    ConstraintDescription: 'Must be >= 1'
    MinValue: 1
  MinSize:
    Description: 'The minimum size of the Auto Scaling group.'
    Type: Number
    Default: 2
    ConstraintDescription: 'Must be >= 1'
    MinValue: 1
  DrainingTimeoutInSeconds:
    Description: 'Maximum time in seconds an EC2 instance waits when terminating until all containers are moved to another EC2 instance (draining).'
    Type: Number
    Default: 600 # 10 minutes
    ConstraintDescription: 'Must be in the range [60-86400]'
    MinValue: 60
    MaxValue: 86400 # 24 hours
  StopContainerTimeoutInSeconds:
    Description: 'Time in seconds the ECS agent waits before killing a stopped container (see ECS_CONTAINER_STOP_TIMEOUT).'
    Type: Number
    Default: 300 # 5 minutes
    ConstraintDescription: 'Must be in the range [30-3600]'
    MinValue: 30
    MaxValue: 3600 # 1 hour
  ContainerMaxCPU:
    Description: 'The maximum number of cpu reservation per container that you plan to run on this cluster. A container instance has 1,024 CPU units for every CPU core.'
    Type: Number
    Default: 128
  ContainerMaxMemory:
    Description: 'The maximum number of memory reservation (in MB) per container that you plan to run on this cluster.'
    Type: Number
    Default: 128
  ContainerShortageThreshold:
    Description: 'Scale up if free cluster capacity <= containers (based on ContainerMaxCPU and ContainerMaxMemory settings)'
    Type: Number
    Default: 2
    MinValue: 0
    ConstraintDescription: 'Must be >= 0'
  ContainerExcessThreshold:
    Description: 'Scale down if free cluster capacity >= containers (based on ContainerMaxCPU and ContainerMaxMemory settings)'
    Type: Number
    Default: 10
    MinValue: 2
    ConstraintDescription: 'Must be >= 2'
  ManagedPolicyArns:
    Description: 'Optional comma-delimited list of IAM managed policy ARNs to attach to the instance''s IAM role'
    Type: String
    Default: ''
Mappings:
  RegionMap:
    'eu-north-1':
      ECSAMI: 'ami-0dddc4daca44e6e99'
    'ap-south-1':
      ECSAMI: 'ami-04322e867758d97a8'
    'eu-west-3':
      ECSAMI: 'ami-07273195833e4f20c'
    'eu-west-2':
      ECSAMI: 'ami-0204aa6a92a54561e'
    'eu-west-1':
      ECSAMI: 'ami-0c5abd45f676aab4f'
    'ap-northeast-2':
      ECSAMI: 'ami-08834c8c57e502d6d'
    'ap-northeast-1':
      ECSAMI: 'ami-0e52aad6ac7733a6a'
    'sa-east-1':
      ECSAMI: 'ami-00d851648873aaabc'
    'ca-central-1':
      ECSAMI: 'ami-0498c464ec4d2ba83'
    'ap-southeast-1':
      ECSAMI: 'ami-0047bfdb16f1f6781'
    'ap-southeast-2':
      ECSAMI: 'ami-09475847322e5566f'
    'eu-central-1':
      ECSAMI: 'ami-096a38c97b80cd8ec'
    'us-east-1':
      ECSAMI: 'ami-00cf4737e238866a3'
    'us-east-2':
      ECSAMI: 'ami-012ca23958772cf72'
    'us-west-1':
      ECSAMI: 'ami-06d87f0156b1d4407'
    'us-west-2':
      ECSAMI: 'ami-0a9f5be2a016dccad'
Conditions:
  HasAlertingModule: !Not [!Equals [!Ref AlertingModule, '']]
  HasBastionModule: !Not [!Equals [!Ref BastionModule, '']]
  HasNotBastionModule: !Not [!Condition HasBastionModule]
  HasFileSystemModule1: !Not [!Equals [!Ref FileSystemModule1, '']]
  HasFileSystemModule2: !Not [!Equals [!Ref FileSystemModule2, '']]
  HasFileSystemModule3: !Not [!Equals [!Ref FileSystemModule3, '']]
  HasAlbModule: !Not [!Equals [!Ref AlbModule, '']]
  HasKeyName: !Not [!Equals [!Ref KeyName, '']]
  HasIAMUserSSHAccess: !Equals [!Ref IAMUserSSHAccess, 'true']
  HasSystemsManagerAccess: !Equals [!Ref SystemsManagerAccess, 'true']
  HasInstanceName: !Not [!Equals [!Ref InstanceName, '']]
  HasSubnetReachPublic: !Equals [!Ref SubnetReach, Public]
  HasIngressTcpPort1: !Not [!Equals [!Ref IngressTcpPort1, '']]
  HasIngressTcpPort2: !Not [!Equals [!Ref IngressTcpPort2, '']]
  HasIngressTcpPort3: !Not [!Equals [!Ref IngressTcpPort3, '']]
  HasClientSgModule1: !Not [!Equals [!Ref ClientSgModule1, '']]
  HasClientSgModule2: !Not [!Equals [!Ref ClientSgModule2, '']]
  HasClientSgModule3: !Not [!Equals [!Ref ClientSgModule3, '']]
  HasManagedPolicyArns: !Not [!Equals [!Ref ManagedPolicyArns, '']]
Resources:
  Cluster:
    Type: 'AWS::ECS::Cluster'
    Properties: {}
  LogGroup:
    Type: 'AWS::Logs::LogGroup'
    Properties:
      RetentionInDays: !Ref LogsRetentionInDays
  SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Ref 'AWS::StackName'
      VpcId:
        'Fn::ImportValue': !Sub '${VpcModule}-Id'
  SecurityGroupIngressALB:
    Type: 'AWS::EC2::SecurityGroupIngress'
    Condition: HasAlbModule
    Properties:
      GroupId: !Ref SecurityGroup
      IpProtocol: tcp
      FromPort: 0
      ToPort: 65535
      SourceSecurityGroupId:
        'Fn::ImportValue': !Sub '${AlbModule}-SecurityGroupId'
  SecurityGroupIngressSSHBastion:
    Type: 'AWS::EC2::SecurityGroupIngress'
    Condition: HasBastionModule
    Properties:
      GroupId: !Ref SecurityGroup
      IpProtocol: tcp
      FromPort: 22
      ToPort: 22
      SourceSecurityGroupId:
        'Fn::ImportValue': !Sub '${BastionModule}-SecurityGroupId'
  SecurityGroupIngressSSHWorld:
    Type: 'AWS::EC2::SecurityGroupIngress'
    Condition: HasNotBastionModule
    Properties:
      GroupId: !Ref SecurityGroup
      IpProtocol: tcp
      FromPort: 22
      ToPort: 22
      CidrIp: '0.0.0.0/0'
  SecurityGroupIngressTcpPort1:
    Type: 'AWS::EC2::SecurityGroupIngress'
    Condition: HasIngressTcpPort1
    Properties:
      GroupId: !Ref SecurityGroup
      IpProtocol: tcp
      FromPort: !Ref IngressTcpPort1
      ToPort: !Ref IngressTcpPort1
      CidrIp: '0.0.0.0/0'
  SecurityGroupIngressTcpPort2:
    Type: 'AWS::EC2::SecurityGroupIngress'
    Condition: HasIngressTcpPort2
    Properties:
      GroupId: !Ref SecurityGroup
      IpProtocol: tcp
      FromPort: !Ref IngressTcpPort2
      ToPort: !Ref IngressTcpPort2
      CidrIp: '0.0.0.0/0'
  SecurityGroupIngressTcpPort3:
    Type: 'AWS::EC2::SecurityGroupIngress'
    Condition: HasIngressTcpPort3
    Properties:
      GroupId: !Ref SecurityGroup
      IpProtocol: tcp
      FromPort: !Ref IngressTcpPort3
      ToPort: !Ref IngressTcpPort3
      CidrIp: '0.0.0.0/0'
  InstanceProfile:
    Type: 'AWS::IAM::InstanceProfile'
    Properties:
      Roles:
      - !Ref Role
  Role:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service: 'ec2.amazonaws.com'
          Action: 'sts:AssumeRole'
      ManagedPolicyArns: !If [HasManagedPolicyArns, !Split [',', !Ref ManagedPolicyArns], !Ref 'AWS::NoValue']
      Policies:
      - !If
        - HasSystemsManagerAccess
        - PolicyName: ssm
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
            - Effect: Allow
              Action:
              - 'ssmmessages:*' # SSM Agent by https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-setting-up-messageAPIs.html
              - 'ssm:UpdateInstanceInformation' # SSM agent by https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-setting-up-messageAPIs.html
              - 'ec2messages:*' # SSM Session Manager by https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-setting-up-messageAPIs.html
              Resource: '*'
        - !Ref 'AWS::NoValue'
      - PolicyName: logs
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - 'logs:CreateLogGroup'
            - 'logs:CreateLogStream'
            - 'logs:PutLogEvents'
            - 'logs:DescribeLogStreams'
            Resource: !GetAtt 'LogGroup.Arn'
      - PolicyName: ecs
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - 'ecs:DiscoverPollEndpoint'
            Resource: '*'
          - Effect: Allow
            Action:
            - 'ecs:DeregisterContainerInstance'
            - 'ecs:RegisterContainerInstance'
            - 'ecs:SubmitContainerStateChange'
            - 'ecs:SubmitTaskStateChange'
            - 'ecs:ListContainerInstances'
            Resource: !Sub 'arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:cluster/${Cluster}'
          - Effect: Allow
            Action:
            - 'ecs:Poll'
            - 'ecs:StartTelemetrySession'
            - 'ecs:UpdateContainerInstancesState'
            - 'ecs:ListTasks'
            - 'ecs:DescribeContainerInstances'
            Resource: !Sub 'arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:container-instance/*'
            Condition:
              'StringEquals':
                'ecs:cluster': !Sub 'arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:cluster/${Cluster}'
      - PolicyName: ecr
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - 'ecr:GetAuthorizationToken'
            - 'ecr:BatchCheckLayerAvailability'
            - 'ecr:GetDownloadUrlForLayer'
            - 'ecr:BatchGetImage'
            Resource: '*'
      - PolicyName: autoscaling
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Sid: write
            Effect: Allow
            Action: 'autoscaling:CompleteLifecycleAction'
            Resource: '*'
      - PolicyName: sqs
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Sid: write
            Effect: Allow
            Action:
            - 'sqs:DeleteMessage'
            - 'sqs:ReceiveMessage'
            Resource: !GetAtt 'AutoScalingGroupLifecycleHookQueue.Arn'
  PolicySshAccess:
    Type: 'AWS::IAM::Policy'
    Condition: HasIAMUserSSHAccess
    Properties:
      Roles:
      - !Ref Role
      PolicyName: 'ssh-access'
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Action:
          - 'iam:ListUsers'
          - 'iam:GetGroup'
          Resource: '*'
        - Effect: Allow
          Action:
          - 'iam:ListSSHPublicKeys'
          - 'iam:GetSSHPublicKey'
          Resource: !Sub 'arn:${AWS::Partition}:iam::${AWS::AccountId}:user/*'
        - Effect: Allow
          Action: 'ec2:DescribeTags'
          Resource: '*'
  PolicyAssociateAddress:
    Type: 'AWS::IAM::Policy'
    Condition: HasSubnetReachPublic
    Properties:
      Roles:
      - !Ref Role
      PolicyName: 'ec2'
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Action: 'ec2:AssociateAddress'
          Resource: '*'
  LaunchConfiguration:
    Type: 'AWS::AutoScaling::LaunchConfiguration'
    Metadata:
      'AWS::CloudFormation::Init':
        configSets:
          default: !If [HasIAMUserSSHAccess, [awslogs, ssh-access, install], [awslogs, install]]
        awslogs:
          packages:
            yum:
              awslogs: []
          files:
            '/etc/awslogs/awscli.conf':
              content: !Sub |
                [default]
                region = ${AWS::Region}
                [plugins]
                cwlogs = cwlogs
              mode: '000644'
              owner: root
              group: root
            '/etc/awslogs/awslogs.conf':
              content: !Sub |
                [general]
                state_file = /var/lib/awslogs/agent-state
                [/var/log/amazon/ssm/amazon-ssm-agent.log]
                datetime_format = %Y-%m-%d %H:%M:%S
                file = /var/log/amazon/ssm/amazon-ssm-agent.log
                log_stream_name = {instance_id}/var/log/amazon/ssm/amazon-ssm-agent.log
                log_group_name = ${LogGroup}
                [/var/log/amazon/ssm/errors.log]
                datetime_format = %Y-%m-%d %H:%M:%S
                file = /var/log/amazon/ssm/errors.log
                log_stream_name = {instance_id}/var/log/amazon/ssm/errors.log
                log_group_name = ${LogGroup}
                [/var/log/audit/audit.log]
                file = /var/log/audit/audit.log
                log_stream_name = {instance_id}/var/log/audit/audit.log
                log_group_name = ${LogGroup}
                [/var/log/awslogs.log]
                datetime_format = %Y-%m-%d %H:%M:%S
                file = /var/log/awslogs.log
                log_stream_name = {instance_id}/var/log/awslogs.log
                log_group_name = ${LogGroup}
                [/var/log/boot.log]
                file = /var/log/boot.log
                log_stream_name = {instance_id}/var/log/boot.log
                log_group_name = ${LogGroup}
                [/var/log/cfn-hup.log]
                datetime_format = %Y-%m-%d %H:%M:%S
                file = /var/log/cfn-hup.log
                log_stream_name = {instance_id}/var/log/cfn-hup.log
                log_group_name = ${LogGroup}
                [/var/log/cfn-init-cmd.log]
                datetime_format = %Y-%m-%d %H:%M:%S
                file = /var/log/cfn-init-cmd.log
                log_stream_name = {instance_id}/var/log/cfn-init-cmd.log
                log_group_name = ${LogGroup}
                [/var/log/cfn-init.log]
                datetime_format = %Y-%m-%d %H:%M:%S
                file = /var/log/cfn-init.log
                log_stream_name = {instance_id}/var/log/cfn-init.log
                log_group_name = ${LogGroup}
                [/var/log/cfn-wire.log]
                datetime_format = %Y-%m-%d %H:%M:%S
                file = /var/log/cfn-wire.log
                log_stream_name = {instance_id}/var/log/cfn-wire.log
                log_group_name = ${LogGroup}
                [/var/log/cloud-init-output.log]
                file = /var/log/cloud-init-output.log
                log_stream_name = {instance_id}/var/log/cloud-init-output.log
                log_group_name = ${LogGroup}
                [/var/log/cloud-init.log]
                datetime_format = %b %d %H:%M:%S
                file = /var/log/cloud-init.log
                log_stream_name = {instance_id}/var/log/cloud-init.log
                log_group_name = ${LogGroup}
                [/var/log/cron]
                datetime_format = %b %d %H:%M:%S
                file = /var/log/cron
                log_stream_name = {instance_id}/var/log/cron
                log_group_name = ${LogGroup}
                [/var/log/dmesg]
                file = /var/log/dmesg
                log_stream_name = {instance_id}/var/log/dmesg
                log_group_name = ${LogGroup}
                [/var/log/grubby_prune_debug]
                file = /var/log/grubby_prune_debug
                log_stream_name = {instance_id}/var/log/grubby_prune_debug
                log_group_name = ${LogGroup}
                [/var/log/maillog]
                datetime_format = %b %d %H:%M:%S
                file = /var/log/maillog
                log_stream_name = {instance_id}/var/log/maillog
                log_group_name = ${LogGroup}
                [/var/log/messages]
                datetime_format = %b %d %H:%M:%S
                file = /var/log/messages
                log_stream_name = {instance_id}/var/log/messages
                log_group_name = ${LogGroup}
                [/var/log/secure]
                datetime_format = %b %d %H:%M:%S
                file = /var/log/secure
                log_stream_name = {instance_id}/var/log/secure
                log_group_name = ${LogGroup}
                [/var/log/yum.log]
                datetime_format = %b %d %H:%M:%S
                file = /var/log/yum.log
                log_stream_name = {instance_id}/var/log/yum.log
                log_group_name = ${LogGroup}
              mode: '000644'
              owner: root
              group: root
            '/etc/awslogs/config/ecs.conf':
              content: !Sub |
                [/var/log/ecs/ecs-init.log]
                file = /var/log/ecs/ecs-init.log
                log_group_name = /var/log/ecs/ecs-init.log
                log_stream_name = {instance_id}/var/log/ecs/ecs-init.log
                datetime_format = %Y-%m-%dT%H:%M:%SZ
                [/var/log/ecs/ecs-agent.log]
                file = /var/log/ecs/ecs-agent.log.*
                log_stream_name = {instance_id}/var/log/ecs/ecs-agent.log
                log_group_name = ${LogGroup}
                datetime_format = %Y-%m-%dT%H:%M:%SZ
              mode: '000644'
              owner: root
              group: root
          services:
            sysvinit:
              awslogsd:
                enabled: true
                ensureRunning: true
                packages:
                  yum:
                  - awslogs
                files:
                - '/etc/awslogs/awslogs.conf'
                - '/etc/awslogs/awscli.conf'
                - '/etc/awslogs/config/ecs.conf'
        ssh-access:
          packages:
            rpm:
              aws-ec2-ssh: 'https://s3-eu-west-1.amazonaws.com/widdix-aws-ec2-ssh-releases-eu-west-1/aws-ec2-ssh-1.9.2-1.el7.centos.noarch.rpm'
          commands:
            a_configure_sudo:
              command: 'sed -i ''s/SUDOERS_GROUPS=""/SUDOERS_GROUPS="##ALL##"/g'' /etc/aws-ec2-ssh.conf'
              test: 'grep -q ''SUDOERS_GROUPS=""'' /etc/aws-ec2-ssh.conf'
            b_enable:
              command: 'sed -i ''s/DONOTSYNC=1/DONOTSYNC=0/g'' /etc/aws-ec2-ssh.conf && /usr/bin/import_users.sh'
              test: 'grep -q ''DONOTSYNC=1'' /etc/aws-ec2-ssh.conf'
        install:
          packages:
            yum:
              amazon-ssm-agent: []
          files:
            '/etc/cfn/cfn-hup.conf':
              content: !Sub |
                [main]
                stack=${AWS::StackId}
                region=${AWS::Region}
                interval=1
              mode: '000400'
              owner: root
              group: root
            '/etc/cfn/hooks.d/cfn-auto-reloader.conf':
              content: !Sub |
                [cfn-auto-reloader-hook]
                triggers=post.update
                path=Resources.LaunchConfiguration.Metadata.AWS::CloudFormation::Init
                action=/opt/aws/bin/cfn-init --verbose --stack=${AWS::StackName} --region=${AWS::Region} --resource=LaunchConfiguration
                runas=root
          services:
            sysvinit:
              cfn-hup:
                enabled: true
                ensureRunning: true
                files:
                - '/etc/cfn/cfn-hup.conf'
                - '/etc/cfn/hooks.d/cfn-auto-reloader.conf'
              amazon-ssm-agent:
                enabled: !If [HasSystemsManagerAccess, true, false]
                ensureRunning: !If [HasSystemsManagerAccess, true, false]
                packages:
                  yum:
                  - amazon-ssm-agent
    Properties:
      AssociatePublicIpAddress: !If [HasSubnetReachPublic, true, false]
      IamInstanceProfile: !Ref InstanceProfile
      ImageId: !FindInMap [RegionMap, !Ref 'AWS::Region', ECSAMI]
      InstanceMonitoring: false
      InstanceType: !Ref InstanceType
      KeyName: !If [HasKeyName, !Ref KeyName, !Ref 'AWS::NoValue']
      SecurityGroups:
      - !Ref SecurityGroup
      - !If [HasClientSgModule1, {'Fn::ImportValue': !Sub '${ClientSgModule1}-SecurityGroupId'}, !Ref 'AWS::NoValue']
      - !If [HasClientSgModule2, {'Fn::ImportValue': !Sub '${ClientSgModule2}-SecurityGroupId'}, !Ref 'AWS::NoValue']
      - !If [HasClientSgModule3, {'Fn::ImportValue': !Sub '${ClientSgModule3}-SecurityGroupId'}, !Ref 'AWS::NoValue']
      UserData:
        'Fn::Base64': !Sub
        - |
          #!/bin/bash -ex
          trap '/opt/aws/bin/cfn-signal -e 1 --region ${Region} --stack ${StackName} --resource AutoScalingGroup' ERR
          echo "ECS_CLUSTER=${Cluster}" >> /etc/ecs/ecs.config
          echo "ECS_CONTAINER_STOP_TIMEOUT=${StopContainerTimeoutInSeconds}s" >> /etc/ecs/ecs.config
          yum install -y aws-cfn-bootstrap
          ${UserDataMountFileSystem1}
          ${UserDataMountFileSystem2}
          ${UserDataMountFileSystem3}
          mount -a
          /opt/aws/bin/cfn-init -v --region ${Region} --stack ${StackName} --resource LaunchConfiguration
          ${UserData}
          /opt/aws/bin/cfn-signal -e 0 --region ${Region} --stack ${StackName} --resource AutoScalingGroup
        - Region: !Ref 'AWS::Region'
          StackName: !Ref 'AWS::StackName'
          UserDataMountFileSystem1: !If [HasFileSystemModule1, !Join ['', ['yum install -y amazon-efs-utils && mkdir -p /mnt/efs1 && echo "', {'Fn::ImportValue': !Sub '${FileSystemModule1}-Id'}, ':/ /mnt/efs1 efs defaults,_netdev 0 0" >> /etc/fstab']], '']
          UserDataMountFileSystem2: !If [HasFileSystemModule2, !Join ['', ['yum install -y amazon-efs-utils && mkdir -p /mnt/efs2 && echo "', {'Fn::ImportValue': !Sub '${FileSystemModule2}-Id'}, ':/ /mnt/efs2 efs defaults,_netdev 0 0" >> /etc/fstab']], '']
          UserDataMountFileSystem3: !If [HasFileSystemModule3, !Join ['', ['yum install -y amazon-efs-utils && mkdir -p /mnt/efs3 && echo "', {'Fn::ImportValue': !Sub '${FileSystemModule3}-Id'}, ':/ /mnt/efs3 efs defaults,_netdev 0 0" >> /etc/fstab']], '']
          UserData: !Ref UserData
  AutoScalingGroup:
    Type: 'AWS::AutoScaling::AutoScalingGroup'
    Properties:
      LaunchConfigurationName: !Ref LaunchConfiguration
      MaxSize: !Ref MaxSize
      MinSize: !Ref MinSize
      Cooldown: '120'
      HealthCheckGracePeriod: 300
      # HealthCheckType: ELB
      # TargetGroupARNs:
      #   - !Ref DefaultTargetGroup
      NotificationConfigurations: !If
      - HasAlertingModule
      - - NotificationTypes:
          - 'autoscaling:EC2_INSTANCE_LAUNCH_ERROR'
          - 'autoscaling:EC2_INSTANCE_TERMINATE_ERROR'
          TopicARN:
            'Fn::ImportValue': !Sub '${AlertingModule}-Arn'
      - []
      Tags:
      - Key: Name
        Value: !If [HasInstanceName, !Ref InstanceName, !Sub '${AWS::StackName}-instance']
        PropagateAtLaunch: true
      VPCZoneIdentifier: !Split
      - ','
      - 'Fn::ImportValue': !Sub '${VpcModule}-SubnetIds${SubnetReach}'
    CreationPolicy:
      ResourceSignal:
        Count: 1
        Timeout: PT15M
    UpdatePolicy:
      AutoScalingRollingUpdate:
        PauseTime: PT15M
        SuspendProcesses:
        - HealthCheck
        - ReplaceUnhealthy
        - AZRebalance
        - AlarmNotification
        - ScheduledActions
        WaitOnResourceSignals: true
  CPUTooHighAlarm:
    Condition: HasAlertingModule
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmDescription: 'Average CPU utilization over last 10 minutes higher than 80%'
      Namespace: 'AWS/EC2'
      MetricName: CPUUtilization
      Statistic: Average
      Period: 600
      EvaluationPeriods: 1
      ComparisonOperator: GreaterThanThreshold
      Threshold: 80
      AlarmActions:
      - 'Fn::ImportValue': !Sub '${AlertingModule}-Arn'
      Dimensions:
      - Name: AutoScalingGroupName
        Value: !Ref AutoScalingGroup
  AutoScalingGroupLifecycleHookQueue:
    Type: 'AWS::SQS::Queue'
    Properties:
      QueueName: !Sub '${AWS::StackName}-lifecycle-hook'
      VisibilityTimeout: 60
      RedrivePolicy:
        deadLetterTargetArn: !GetAtt 'AutoScalingGroupLifecycleHookDeadLetterQueue.Arn'
        maxReceiveCount: 5
  AutoScalingGroupLifecycleHookQueueTooHighAlarm:
    Condition: HasAlertingModule
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmDescription: 'Queue contains messages older than 10 minutes, messages are not consumed'
      Namespace: 'AWS/SQS'
      MetricName: ApproximateAgeOfOldestMessage
      Statistic: Maximum
      Period: 60
      EvaluationPeriods: 1
      ComparisonOperator: GreaterThanThreshold
      Threshold: 600
      AlarmActions:
      - 'Fn::ImportValue': !Sub '${AlertingModule}-Arn'
      Dimensions:
      - Name: QueueName
        Value: !GetAtt 'AutoScalingGroupLifecycleHookQueue.QueueName'
  AutoScalingGroupLifecycleHookDeadLetterQueue:
    Type: 'AWS::SQS::Queue'
    Properties:
      QueueName: !Sub '${AWS::StackName}-lifecycle-hook-dlq'
  AutoScalingGroupLifecycleHookDeadLetterQueueTooHighAlarm:
    Condition: HasAlertingModule
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmDescription: 'Dead letter queue contains messages, message processing failed'
      Namespace: 'AWS/SQS'
      MetricName: ApproximateNumberOfMessagesVisible
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 1
      ComparisonOperator: GreaterThanThreshold
      Threshold: 0
      AlarmActions:
      - 'Fn::ImportValue': !Sub '${AlertingModule}-Arn'
      Dimensions:
      - Name: QueueName
        Value: !GetAtt 'AutoScalingGroupLifecycleHookDeadLetterQueue.QueueName'
  AutoScalingGroupLifecycleHookIAMRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service: 'autoscaling.amazonaws.com'
          Action: 'sts:AssumeRole'
      Policies:
      - PolicyName: sqs
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Sid: write
            Effect: Allow
            Action:
            - 'sqs:SendMessage'
            - 'sqs:GetQueueUrl'
            Resource: !GetAtt 'AutoScalingGroupLifecycleHookQueue.Arn'
  AutoScalingGroupTerminatingLifecycleHook:
    Type: 'AWS::AutoScaling::LifecycleHook'
    Properties:
      HeartbeatTimeout: 600
      DefaultResult: CONTINUE
      AutoScalingGroupName: !Ref AutoScalingGroup
      LifecycleTransition: 'autoscaling:EC2_INSTANCE_TERMINATING'
      NotificationTargetARN: !GetAtt 'AutoScalingGroupLifecycleHookQueue.Arn'
      RoleARN: !GetAtt 'AutoScalingGroupLifecycleHookIAMRole.Arn'
  ScaleUpPolicy:
    Type: 'AWS::AutoScaling::ScalingPolicy'
    Properties:
      AutoScalingGroupName: !Ref AutoScalingGroup
      PolicyType: StepScaling
      AdjustmentType: PercentChangeInCapacity
      MinAdjustmentMagnitude: 1
      StepAdjustments:
      - MetricIntervalUpperBound: 0.0
        ScalingAdjustment: 25
  ScaleDownPolicy:
    Type: 'AWS::AutoScaling::ScalingPolicy'
    Properties:
      AutoScalingGroupName: !Ref AutoScalingGroup
      PolicyType: StepScaling
      AdjustmentType: PercentChangeInCapacity
      MinAdjustmentMagnitude: 1
      StepAdjustments:
      - MetricIntervalLowerBound: 0.0
        ScalingAdjustment: -25
  ContainerInstancesShortageAlarm:
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmDescription: 'Cluster is running out of container instances'
      Namespace: !Ref 'AWS::StackName'
      Dimensions:
      - Name: ClusterName
        Value: !Ref Cluster
      MetricName: SchedulableContainers
      ComparisonOperator: LessThanOrEqualToThreshold
      Statistic: Minimum # special rule because we scale on reservations and not utilization
      Period: 60
      EvaluationPeriods: 1
      Threshold: !Ref ContainerShortageThreshold
      AlarmActions:
      - !Ref ScaleUpPolicy
  ContainerInstancesExcessAlarm:
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmDescription: 'Cluster is wasting container instances'
      Namespace: !Ref 'AWS::StackName'
      Dimensions:
      - Name: ClusterName
        Value: !Ref Cluster
      MetricName: SchedulableContainers
      ComparisonOperator: GreaterThanOrEqualToThreshold
      Statistic: Maximum # special rule because we scale on reservations and not utilization
      Period: 60
      EvaluationPeriods: 15
      DatapointsToAlarm: 15
      Threshold: !Ref ContainerExcessThreshold
      AlarmActions:
      - !Ref ScaleDownPolicy
  CPUReservationTooHighAlarm:
    Condition: HasAlertingModule
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmDescription: 'Average CPU reservation over last 10 minutes higher than 90%'
      Namespace: 'AWS/ECS'
      MetricName: CPUReservation
      Statistic: Average # special rule because we scale on reservations and not utilization
      Period: 600
      EvaluationPeriods: 1
      ComparisonOperator: GreaterThanThreshold
      Threshold: 90
      AlarmActions:
      - 'Fn::ImportValue': !Sub '${AlertingModule}-Arn'
      Dimensions:
      - Name: ClusterName
        Value: !Ref Cluster
  CPUUtilizationTooHighAlarm:
    Condition: HasAlertingModule
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmDescription: 'Average CPU utilization over last 10 minutes higher than 80%'
      Namespace: 'AWS/ECS'
      MetricName: CPUUtilization
      Statistic: Average
      Period: 600
      EvaluationPeriods: 1
      ComparisonOperator: GreaterThanThreshold
      Threshold: 80
      AlarmActions:
      - 'Fn::ImportValue': !Sub '${AlertingModule}-Arn'
      Dimensions:
      - Name: ClusterName
        Value: !Ref Cluster
  MemoryReservationTooHighAlarm:
    Condition: HasAlertingModule
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmDescription: 'Average memory reservation over last 10 minutes higher than 90%'
      Namespace: 'AWS/ECS'
      MetricName: MemoryReservation
      Statistic: Average # special rule because we scale on reservations and not utilization
      Period: 600
      EvaluationPeriods: 1
      ComparisonOperator: GreaterThanThreshold
      Threshold: 90
      AlarmActions:
      - 'Fn::ImportValue': !Sub '${AlertingModule}-Arn'
      Dimensions:
      - Name: ClusterName
        Value: !Ref Cluster
  MemoryUtilizationTooHighAlarm:
    Condition: HasAlertingModule
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmDescription: 'Average memory utilization over last 10 minutes higher than 80%'
      Namespace: 'AWS/ECS'
      MetricName: MemoryUtilization
      Statistic: Average
      Period: 600
      EvaluationPeriods: 1
      ComparisonOperator: GreaterThanThreshold
      Threshold: 80
      AlarmActions:
      - 'Fn::ImportValue': !Sub '${AlertingModule}-Arn'
      Dimensions:
      - Name: ClusterName
        Value: !Ref Cluster
  # scaling based on SchedulableContainers is described in detail here: http://garbe.io/blog/2017/04/12/a-better-solution-to-ecs-autoscaling/
  SchedulableContainersCron:
    DependsOn:
    - SchedulableContainersLambdaPolicy
    Type: 'AWS::Events::Rule'
    Properties:
      ScheduleExpression: 'rate(1 minute)'
      State: ENABLED
      Targets:
      - Arn: !GetAtt 'SchedulableContainersLambdaV2.Arn'
        Id: lambda
  SchedulableContainersCronFailedInvocationsTooHighAlarm:
    Condition: HasAlertingModule
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmDescription: 'Invocations failed permanently'
      Namespace: 'AWS/Events'
      MetricName: FailedInvocations
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 1
      ComparisonOperator: GreaterThanThreshold
      Threshold: 0
      AlarmActions:
      - 'Fn::ImportValue': !Sub '${AlertingModule}-Arn'
      Dimensions:
      - Name: RuleName
        Value: !Ref SchedulableContainersCron
  SchedulableContainersLambdaRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service: 'lambda.amazonaws.com'
          Action: 'sts:AssumeRole'
      Policies:
      - PolicyName: ecs
        PolicyDocument:
          Statement:
          - Effect: Allow
            Action: 'ecs:ListContainerInstances'
            Resource: !Sub 'arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:cluster/${Cluster}'
          - Effect: Allow
            Action: 'ecs:DescribeContainerInstances'
            Resource: '*'
            Condition:
              ArnEquals:
                'ecs:cluster': !Sub 'arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:cluster/${Cluster}'
      - PolicyName: cloudwatch
        PolicyDocument:
          Statement:
          - Effect: Allow
            Action: 'cloudwatch:PutMetricData'
            Resource: '*'
  SchedulableContainersLambdaPolicy:
    Type: 'AWS::IAM::Policy'
    Properties:
      Roles:
      - !Ref SchedulableContainersLambdaRole
      PolicyName: lambda
      PolicyDocument:
        Statement:
        - Effect: Allow
          Action:
          - 'logs:CreateLogStream'
          - 'logs:PutLogEvents'
          Resource: !GetAtt 'SchedulableContainersLogGroup.Arn'
  SchedulableContainersLambdaPermission2:
    Type: 'AWS::Lambda::Permission'
    Properties:
      Action: 'lambda:InvokeFunction'
      FunctionName: !Ref SchedulableContainersLambdaV2
      Principal: 'events.amazonaws.com'
      SourceArn: !GetAtt 'SchedulableContainersCron.Arn'
  SchedulableContainersLambdaV2:
    Type: 'AWS::Lambda::Function'
    Properties:
      Code:
        ZipFile: !Sub |
          'use strict';
          const AWS = require('aws-sdk');
          const ecs = new AWS.ECS({apiVersion: '2014-11-13'});
          const cloudwatch = new AWS.CloudWatch({apiVersion: '2010-08-01'});
          const CONTAINER_MAX_CPU = ${ContainerMaxCPU};
          const CONTAINER_MAX_MEMORY = ${ContainerMaxMemory};
          const CLUSTER = '${Cluster}';
          const NAMESPACE = '${AWS::StackName}';
          function list(nextToken) {
            return ecs.listContainerInstances({
              cluster: CLUSTER,
              maxResults: 1,
              nextToken: nextToken,
              status: 'ACTIVE'
            }).promise();
          }
          function describe(containerInstanceArns) {
            return ecs.describeContainerInstances({
              cluster: CLUSTER,
              containerInstances: containerInstanceArns
            }).promise();
          }
          function compute(totalSchedulableContainers, nextToken) {
            return list(nextToken)
              .then((list) => {
                return describe(list.containerInstanceArns)
                  .then((data) => {
                    const localSchedulableContainers = data.containerInstances
                      .map((instance) => ({
                        cpu: instance.remainingResources.find((resource) => resource.name === 'CPU').integerValue,
                        memory: instance.remainingResources.find((resource) => resource.name === 'MEMORY').integerValue
                      }))
                      .map((remaining) => Math.min(Math.floor(remaining.cpu/CONTAINER_MAX_CPU), Math.floor(remaining.memory/CONTAINER_MAX_MEMORY)))
                      .reduce((acc, containers) => acc + containers, 0);
                    console.log(`localSchedulableContainers ${!localSchedulableContainers}`);
                    if (list.nextToken !== null && list.nextToken !== undefined) {
                      return compute(localSchedulableContainers + totalSchedulableContainers, list.nextToken);
                    } else {
                      return localSchedulableContainers + totalSchedulableContainers;
                    }
                  });
              });
          }
          exports.handler = (event, context, cb) => {
            console.log(`Invoke: ${!JSON.stringify(event)}`);
            compute(0, undefined)
              .then((schedulableContainers) => {
                console.log(`schedulableContainers: ${!schedulableContainers}`);
                return cloudwatch.putMetricData({
                  MetricData: [{
                    MetricName: 'SchedulableContainers',
                    Dimensions: [{
                      Name: 'ClusterName',
                      Value: CLUSTER
                    }],
                    Value: schedulableContainers,
                    Unit: 'Count'
                  }],
                  Namespace: NAMESPACE
                }).promise();
              })
              .then(() => cb())
              .catch(cb);
          };
      Handler: 'index.handler'
      MemorySize: 128
      Role: !GetAtt 'SchedulableContainersLambdaRole.Arn'
      Runtime: 'nodejs8.10'
      Timeout: 60
  SchedulableContainersLogGroup:
    Type: 'AWS::Logs::LogGroup'
    Properties:
      LogGroupName: !Sub '/aws/lambda/${SchedulableContainersLambdaV2}'
      RetentionInDays: !Ref LogsRetentionInDays
  SchedulableContainersLambdaErrorsTooHighAlarm:
    Condition: HasAlertingModule
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmDescription: 'Invocations failed due to errors in the function'
      Namespace: 'AWS/Lambda'
      MetricName: Errors
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 1
      ComparisonOperator: GreaterThanThreshold
      Threshold: 0
      AlarmActions:
      - 'Fn::ImportValue': !Sub '${AlertingModule}-Arn'
      Dimensions:
      - Name: FunctionName
        Value: !Ref SchedulableContainersLambdaV2
  SchedulableContainersLambdaThrottlesTooHighAlarm:
    Condition: HasAlertingModule
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmDescription: 'Invocation attempts that were throttled due to invocation rates exceeding the concurrent limits'
      Namespace: 'AWS/Lambda'
      MetricName: Throttles
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 1
      ComparisonOperator: GreaterThanThreshold
      Threshold: 0
      AlarmActions:
      - 'Fn::ImportValue': !Sub '${AlertingModule}-Arn'
      Dimensions:
      - Name: FunctionName
        Value: !Ref SchedulableContainersLambdaV2
  DrainInstanceLambdaRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service: 'lambda.amazonaws.com'
          Action: 'sts:AssumeRole'
      Policies:
      - PolicyName: draininstance
        PolicyDocument:
          Statement:
          - Effect: Allow
            Action:
            - 'sqs:DeleteMessage'
            - 'sqs:ReceiveMessage'
            - 'sqs:SendMessage'
            - 'sqs:GetQueueAttributes'
            Resource: !GetAtt 'AutoScalingGroupLifecycleHookQueue.Arn'
          - Effect: Allow
            Action:
            - 'ecs:ListContainerInstances'
            Resource: !GetAtt 'Cluster.Arn'
          - Effect: Allow
            Action:
            - 'ecs:updateContainerInstancesState'
            - 'ecs:listTasks'
            Resource: '*'
            Condition:
              StringEquals:
                'ecs:cluster': !GetAtt 'Cluster.Arn'
          - Effect: Allow
            Action:
            - 'autoscaling:CompleteLifecycleAction'
            - 'autoscaling:RecordLifecycleActionHeartbeat'
            Resource: !Sub 'arn:${AWS::Partition}:autoscaling:${AWS::Region}:${AWS::AccountId}:autoScalingGroup:*:autoScalingGroupName/${AutoScalingGroup}'
  DrainInstanceLambdaPolicy:
    Type: 'AWS::IAM::Policy'
    Properties:
      Roles:
      - !Ref DrainInstanceLambdaRole
      PolicyName: lambda
      PolicyDocument:
        Statement:
        - Effect: Allow
          Action:
          - 'logs:CreateLogStream'
          - 'logs:PutLogEvents'
          Resource: !GetAtt 'DrainInstanceLogGroup.Arn'
  DrainInstanceEventSourceMapping:
    DependsOn:
    - DrainInstanceLambdaPolicy
    - DrainInstanceLogGroup
    Type: 'AWS::Lambda::EventSourceMapping'
    Properties:
      BatchSize: 1
      Enabled: true
      EventSourceArn: !GetAtt 'AutoScalingGroupLifecycleHookQueue.Arn'
      FunctionName: !GetAtt DrainInstanceLambda.Arn
  DrainInstanceLambda:
    Type: 'AWS::Lambda::Function'
    Properties:
      Code:
        ZipFile: |
          'use strict';
          const AWS = require('aws-sdk');
          const ecs = new AWS.ECS({apiVersion: '2014-11-13'});
          const sqs = new AWS.SQS({apiVersion: '2012-11-05'});
          const asg = new AWS.AutoScaling({apiVersion: '2011-01-01'});
          const cluster = process.env.CLUSTER;
          const queueUrl = process.env.QUEUE_URL;
          const drainingTimeout = process.env.DRAINING_TIMEOUT;
          async function getContainerInstanceArn(ec2InstanceId) {
            console.log(`getContainerInstanceArn(${[...arguments].join(', ')})`);
            const listResult = await ecs.listContainerInstances({cluster: cluster, filter: `ec2InstanceId == '${ec2InstanceId}'`}).promise();
            return listResult.containerInstanceArns[0];
          }
          async function drainInstance(ciArn) {
            console.log(`drainInstance(${[...arguments].join(', ')})`);
            await ecs.updateContainerInstancesState({cluster: cluster, containerInstances: [ciArn], status: 'DRAINING'}).promise();
          }
          async function wait(ciArn, asgName, lchName, lcaToken, terminateTime) {
            console.log(`wait(${[...arguments].join(', ')})`);
            const payload = {
              Service: 'DrainInstance',
              Event: 'custom:DRAIN_WAIT',
              ContainerInstanceArn: ciArn,
              AutoScalingGroupName: asgName,
              LifecycleHookName: lchName,
              LifecycleActionToken: lcaToken,
              TerminateTime: terminateTime
            };
            await sqs.sendMessage({
              QueueUrl: queueUrl,
              DelaySeconds: 60,
              MessageBody: JSON.stringify(payload)
            }).promise();
          }
          async function countTasks(ciArn) {
            console.log(`countTasks(${[...arguments].join(', ')})`);
            const listResult = await ecs.listTasks({cluster: cluster, containerInstance: ciArn}).promise();
            return listResult.taskArns.length;
          }
          async function terminateInstance(asgName, lchName, lcaToken) {
            console.log(`terminateInstance(${[...arguments].join(', ')})`);
            await asg.completeLifecycleAction({
              AutoScalingGroupName: asgName,
              LifecycleHookName: lchName,
              LifecycleActionToken: lcaToken,
              LifecycleActionResult: 'CONTINUE'
            }).promise();
          }
          async function heartbeat(asgName, lchName, lcaToken) {
            console.log(`heartbeat(${[...arguments].join(', ')})`);
            await asg.recordLifecycleActionHeartbeat({
              AutoScalingGroupName: asgName,
              LifecycleHookName: lchName,
              LifecycleActionToken: lcaToken
            }).promise();
          }
          exports.handler = async function(event, context) {
            console.log(`Invoke: ${JSON.stringify(event)}`);
            const body = JSON.parse(event.Records[0].body); // batch size is 1
            if (body.Service === 'AWS Auto Scaling' && body.Event === 'autoscaling:TEST_NOTIFICATION') {
              console.log('Ignore autoscaling:TEST_NOTIFICATION')
            } else if (body.Service === 'AWS Auto Scaling' && body.LifecycleTransition === 'autoscaling:EC2_INSTANCE_TERMINATING') {
              const lcaToken = body.LifecycleActionToken;
              const ciArn = await getContainerInstanceArn(body.EC2InstanceId);
              await drainInstance(ciArn);
              await wait(ciArn, body.AutoScalingGroupName, body.LifecycleHookName, body.LifecycleActionToken, body.Time);
            } else if (body.Service === 'DrainInstance' && body.Event === 'custom:DRAIN_WAIT') {
              const taskCount = await countTasks(body.ContainerInstanceArn);
              if (taskCount === 0) {
                await terminateInstance(body.AutoScalingGroupName, body.LifecycleHookName, body.LifecycleActionToken);
              } else {
                const actionDuration = (Date.now() - new Date(body.TerminateTime).getTime()) / 1000;
                if (actionDuration < drainingTimeout) {
                  await heartbeat(body.AutoScalingGroupName, body.LifecycleHookName, body.LifecycleActionToken);
                  await wait(body.ContainerInstanceArn, body.AutoScalingGroupName, body.LifecycleHookName, body.LifecycleActionToken, body.TerminateTime);
                } else {
                  console.log('Timeout for instance termination reached.');
                  await terminateInstance(body.AutoScalingGroupName, body.LifecycleHookName, body.LifecycleActionToken);
                }
              }
            } else {
              console.log('Ignore unexpected event');
            }
          };
      Handler: 'index.handler'
      MemorySize: 128
      Role:
!GetAtt 'DrainInstanceLambdaRole.Arn' Runtime: 'nodejs8.10' Timeout: 30 Environment: Variables: CLUSTER: !Ref Cluster QUEUE_URL: !Ref AutoScalingGroupLifecycleHookQueue DRAINING_TIMEOUT: !Ref DrainingTimeoutInSeconds ReservedConcurrentExecutions: 1 DrainInstanceLogGroup: Type: 'AWS::Logs::LogGroup' Properties: LogGroupName: !Sub '/aws/lambda/${DrainInstanceLambda}' RetentionInDays: !Ref LogsRetentionInDays DrainInstanceLambdaErrorsTooHighAlarm: Condition: HasAlertingModule Type: 'AWS::CloudWatch::Alarm' Properties: AlarmDescription: 'Invocations failed due to errors in the function' Namespace: 'AWS/Lambda' MetricName: Errors Statistic: Sum Period: 60 EvaluationPeriods: 1 ComparisonOperator: GreaterThanThreshold Threshold: 0 AlarmActions: - 'Fn::ImportValue': !Sub '${AlertingModule}-Arn' Dimensions: - Name: FunctionName Value: !Ref DrainInstanceLambda Outputs: ModuleId: Value: 'ecs-cluster-ec2' ModuleVersion: Value: '1.0.0' StackName: Value: !Ref 'AWS::StackName' Arn: Value: !GetAtt 'Cluster.Arn' Export: Name: !Sub '${AWS::StackName}-Arn' Name: Value: !Ref Cluster Export: Name: !Sub '${AWS::StackName}-Name' SecurityGroupId: Description: 'The Security Group Id of ECS cluster instances.' Value: !Ref SecurityGroup Export: Name: !Sub '${AWS::StackName}-SecurityGroupId' LogGroup: Description: 'Log group of ECS cluster.' Value: !Ref LogGroup Export: Name: !Sub '${AWS::StackName}-LogGroup'
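The DrainInstance Lambda in the template handles each delayed `custom:DRAIN_WAIT` message by deciding whether to let the instance terminate or to keep waiting. As a minimal sketch, that decision can be pulled out of the handler into a pure function and checked without calling ECS, SQS or Auto Scaling. `decideDrainAction`, its parameter names and its return labels are hypothetical and are not part of the widdix template. `terminateTime` corresponds to the `TerminateTime` field that the template copies from the lifecycle message's `Time`, and `drainingTimeoutSeconds` corresponds to the `DRAINING_TIMEOUT` environment variable.

```javascript
'use strict';

// Hypothetical helper (not part of the widdix template) that mirrors the
// decision the DrainInstance Lambda makes for each custom:DRAIN_WAIT message.
// Returns one of:
//   'complete'           -> no tasks left: call completeLifecycleAction
//   'heartbeat-and-wait' -> tasks still running and time remains: record a
//                           lifecycle heartbeat and requeue the message
//   'timeout-complete'   -> tasks still running but the draining timeout has
//                           passed: complete the lifecycle action anyway
function decideDrainAction({ taskCount, terminateTime, nowMs, drainingTimeoutSeconds }) {
  if (taskCount === 0) {
    return 'complete';
  }
  // Environment variables are strings, so convert explicitly instead of
  // relying on the implicit coercion that `<` performs in the template.
  const timeout = Number(drainingTimeoutSeconds);
  const elapsedSeconds = (nowMs - new Date(terminateTime).getTime()) / 1000;
  return elapsedSeconds < timeout ? 'heartbeat-and-wait' : 'timeout-complete';
}

// Example: lifecycle hook fired at 12:00:00, now 12:02:00, timeout 300 s.
const start = '2020-01-01T12:00:00Z';
const now = Date.parse('2020-01-01T12:02:00Z');
console.log(decideDrainAction({ taskCount: 3, terminateTime: start, nowMs: now, drainingTimeoutSeconds: '300' })); // heartbeat-and-wait
console.log(decideDrainAction({ taskCount: 0, terminateTime: start, nowMs: now, drainingTimeoutSeconds: '300' })); // complete
console.log(decideDrainAction({ taskCount: 3, terminateTime: start, nowMs: now + 200 * 1000, drainingTimeoutSeconds: '300' })); // timeout-complete
```

Passing `nowMs` in, rather than calling `Date.now()` inside the function, is only a testing convenience. In the handler you would pass `Date.now()`. If `DRAINING_TIMEOUT` is missing, `Number(undefined)` is `NaN`, so the comparison is false. In that case the sketch, like the original handler, falls through to completing the lifecycle action.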
I would like to donate to the project to fund this feature. What can I do?
Thanks for raising this feature request. Would you please contact us at hello@widdix.net to discuss how to sponsor this feature?