The CloudWatchAutoAlarms AWS Lambda function enables you to quickly and automatically create a standard set of CloudWatch alarms for your Amazon EC2 instances, Amazon RDS resources, or AWS Lambda functions using tags. It helps prevent errors that can occur when alarms are created manually, reduces the time required to deploy alarms, and lowers the skill level needed to create and manage them. It is especially useful during a large migration to AWS, where many resources may be migrated into your AWS account at once.
The solution supports multiple accounts with AWS Organizations and multiple regions.
The default configuration creates alarms for the following Amazon EC2 metrics for Windows, Amazon Linux, Red Hat, Ubuntu, or SUSE EC2 instances:
The default configuration creates alarms for the following Amazon RDS metrics:
Alarms are created for RDS clusters as well as RDS database instances.
The default configuration also creates alarms for the following AWS Lambda metrics:
You can change or add alarms by updating the default_alarms dictionary in cw_auto_alarms.py.
The created alarms can be configured to notify an Amazon SNS topic that you specify using the DEFAULT_ALARM_SNS_TOPIC_ARN environment variable. See the Setup section for details.
The Amazon CloudWatch alarms are created when an EC2 instance with the tag key Create_Auto_Alarms enters the running state and they are deleted when the instance is terminated. Alarms can be created when an instance is first launched or afterwards by stopping and starting the instance.
The alarms are created and configured based on EC2 tags which include the metric name, comparison, period, statistic, and threshold.
The tag name syntax for AWS provided metrics is:
AutoAlarm-\<Namespace>-\<MetricName>-\<ComparisonOperator>-\<Period>-\<EvaluationPeriods>-\<Statistic>-\<Description>
Where:
The tag value is used to specify the threshold. You can also create alarms for custom Amazon CloudWatch metrics.
For example, one of the preconfigured default alarms included in the default_alarms dictionary is AutoAlarm-AWS/EC2-CPUUtilization-GreaterThanThreshold-5m-1-Average-Created_by_CloudWatchAutoAlarms. When an instance with the tag key Create_Auto_Alarms enters the running state, an alarm for the AWS provided CPUUtilization CloudWatch EC2 metric will be created. Additional alarms will also be created for the EC2 instance based on its platform and the alarms defined in the default_alarms Python dictionary in cw_auto_alarms.py.
Alarms can be updated by changing the tag key or value and stopping and starting the instance.
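To make the tag syntax concrete, the following is a minimal sketch of how a tag key without dimensions could be split into its fields. It is not the code in cw_auto_alarms.py: `AlarmSpec`, `parse_alarm_tag`, and `period_to_seconds` are hypothetical names, and the handling of period suffixes other than `m` is an assumption.

```python
from dataclasses import dataclass

# Assumed unit suffixes for the period field; the text above only shows "5m".
_PERIOD_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


@dataclass
class AlarmSpec:
    namespace: str
    metric_name: str
    comparison_operator: str
    period_seconds: int
    evaluation_periods: int
    statistic: str
    description: str
    threshold: float


def period_to_seconds(period: str) -> int:
    """Convert a period such as '5m' to seconds (hypothetical helper)."""
    unit = period[-1].lower()
    if unit in _PERIOD_UNITS:
        return int(period[:-1]) * _PERIOD_UNITS[unit]
    return int(period)  # bare number: assume it is already in seconds


def parse_alarm_tag(key: str, value: str, identifier: str = "AutoAlarm",
                    separator: str = "-") -> AlarmSpec:
    """Parse a tag key for an AWS provided metric (no dimension fields)."""
    # maxsplit=7 keeps any separator characters inside the description intact.
    parts = key.split(separator, 7)
    if len(parts) != 8 or parts[0] != identifier:
        raise ValueError(f"not an alarm tag for AWS provided metrics: {key!r}")
    _, namespace, metric, comparison, period, evaluations, statistic, description = parts
    return AlarmSpec(
        namespace=namespace,
        metric_name=metric,
        comparison_operator=comparison,
        period_seconds=period_to_seconds(period),
        evaluation_periods=int(evaluations),
        statistic=statistic,
        description=description,
        threshold=float(value),  # the tag value carries the threshold
    )
```

For the default CPU alarm key shown above and an illustrative tag value of `75`, this yields a spec for `AWS/EC2` `CPUUtilization` with a 300 second period and a threshold of 75.0.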
There are a number of settings that can be customized by updating the CloudWatchAutoAlarms Lambda function environment variables defined in the CloudWatchAutoAlarms.yaml CloudFormation template. The settings will only affect new alarms that you create so you should customize these values to meet your requirements before you deploy the Lambda function. The following list provides a description of the setting along with the environment variable name and default value:
You can update the thresholds for the default alarms by updating the following environment variables:
For Anomaly Detection Alarms:
For Amazon EC2:
For AWS RDS:
For AWS Lambda:
AWS Organizations support is required for multi-account with a single AWS Lambda Function deployment.
Deployment is supported for:
Follow the instructions below. Specify the AWS Organization related parameters for multi-account support with AWS Organizations.
The CloudWatchAutoAlarms-Config-SNS.yaml stack creates an SNS topic for alarm notifications configured by the CloudWatchAutoAlarms Lambda function. It supports single-account deployments as well as multi-account deployments with AWS Organizations. If you want SNS notifications for the created alarms, deploy the template in each region where you will use the solution.
o-xxxxx).
Follow these steps for each region you want to support.
Log in to the AWS Management Console
Navigate to the CloudFormation Console
Create a New Stack
Upload the CloudFormation Template
CloudWatchAutoAlarms-SNS.yaml file.
Specify Stack Details
cloudwatch-auto-alarms-sns.
o-xxxxx). Leave blank for single-account deployment.
Configure Stack Options
Review the Configuration
Deploy the Stack
Verify the Deployment
TargetOrganizationId parameter, the stack sets up permissions for CloudWatch to publish alarms across all accounts in the specified AWS Organization.
cloudwatch-auto-alarms-s3.
I acknowledge that AWS CloudFormation might create IAM resources.
LambdaDeploymentBucketName to confirm the S3 bucket name. The bucket will follow the naming pattern:
cloudwatch-auto-alarms-<AWS_Account_ID>-<AWS_Region>
Take note of the deployment bucket name; it will be used in the next steps.
This bucket is now ready to store the Lambda deployment package for the CloudWatchAutoAlarms Lambda function. You will upload the deployment package to it in the next step.
cw_auto_alarms.py in your project directory. This file contains the logic for creating and managing CloudWatch alarms.
src folder of the project.
zip -j amazon-cloudwatch-auto-alarms.zip src/*
The -j flag ensures that only the files are zipped, without preserving the folder structure.
amazon-cloudwatch-auto-alarms.zip in the current directory.
cloudwatch-auto-alarms-<AWS_Account_ID>-<AWS_Region>
cloudwatch-auto-alarms-s3 CloudFormation stack.
amazon-cloudwatch-auto-alarms.zip file from your local machine.
amazon-cloudwatch-auto-alarms.zip file appears in the bucket.
You will use the bucket name and file path (key) as parameters when deploying the cloudwatch-auto-alarms stack. These values are:
cloudwatch-auto-alarms-123456789012-us-east-1).
amazon-cloudwatch-auto-alarms.zip).
Now that the deployment package has been uploaded, proceed to the next step to deploy the CloudWatchAutoAlarms stack using the uploaded Lambda deployment package.
amazon-cloudwatch-auto-alarms.zip).
Log In to the AWS Management Console
Go to CloudFormation
Create a New Stack
Upload the Template
CloudWatchAutoAlarms.yaml).
Specify Stack Details
cloudwatch-auto-alarms).
ENABLED to activate CloudWatch Events immediately or DISABLED to activate them later.
amazon-cloudwatch-auto-alarms.zip).
true to enable SNS notifications for alarms or false to disable them. If you select true, you must enter the SNSTopicName and SNSTopicAccount information. The topic must exist in each region you will support.
CloudWatchAutoAlarmsSNSTopic).
AutoAlarm).
Configure Stack Options
Review and Deploy
Monitor Deployment
Verify Deployment
CloudWatchAutoAlarms is created.
TargetRegions parameter specifies the desired regions.
CloudWatchAutoAlarms will automatically create alarms based on configured parameters.
To enable multi-account support with AWS Organizations, several CloudFormation templates must be deployed to set up cross-account event routing, IAM roles, and AWS Organizations integration. These steps ensure the CloudWatchAutoAlarms AWS Lambda function can manage alarms across multiple AWS accounts within your organization.
StackSet Name: Enter a descriptive name, such as cloudwatch-auto-alarms-events.
Parameters:
arn:aws:lambda:<region>:<account-id>:function:CloudWatchAutoAlarms
where region and account-id are the AWS region and AWS account ID in which you deployed the CloudWatchAutoAlarms AWS Lambda function.
arn:aws:events:<region>:<account-id>:event-bus/default
where region and account-id are the AWS region and AWS account ID in which you deployed the CloudWatchAutoAlarms AWS Lambda function.
Click Next.
Accounts:
Regions:
Click Next.
Execution Role:
Automatic Deployment:
Click Next.
This step involves deploying a CloudFormation template to create a cross-account IAM role. This role is used by the CloudWatchAutoAlarms AWS Lambda Function to manage alarms across accounts within AWS Organizations. The deployment is restricted to one AWS region per target account because IAM roles are global resources.
Click the Create StackSet button.
Under Choose a template, select Upload a template file.
Click Choose file, and upload the provided CloudFormation template file:
Click Next to proceed.
Enter a meaningful StackSet name, such as cloudwatch-auto-alarms-crossaccountrole.
Provide values for the template parameters:
AutoAlarm).
Click Next.
I acknowledge that AWS CloudFormation might create IAM resources with custom names.
CloudWatchAutoAlarmCrossAccountRole.
This role is required to enable the CloudWatchAutoAlarms AWS Lambda function to retrieve the list of AWS accounts under your AWS Organization. The role must be deployed in the AWS Organizations Management Account.
CloudWatchAutoAlarms-ManagementAccountRole.yaml file.
cloudwatch-auto-alarms-management-role.
Project, Value: CloudWatchAutoAlarms
Environment, Value: Production
I acknowledge that AWS CloudFormation might create IAM resources.
CloudWatchAutoAlarmManagementAccountRole and is a global IAM resource, meaning it does not need to be deployed to multiple regions.
To create the default alarm set for an Amazon EC2 instance or AWS Lambda function, tag the Amazon EC2 instance or AWS Lambda function with the activation tag key defined by the ALARM_TAG environment variable. The default activation tag key is Create_Auto_Alarms.
For Amazon EC2 instances, you must add this tag during instance launch or you can add this tag at any time to an instance and then stop and start the instance in order to create the default alarm set as well as any custom, instance specific alarms.
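The activation check described above can be sketched as follows. This is an illustration only: `alarms_enabled` is a hypothetical helper, and it assumes tags arrive in the `[{"Key": ..., "Value": ...}]` shape that the EC2 API uses.

```python
import os


def alarms_enabled(tags, environ=os.environ):
    """Return True if the resource carries the activation tag key.

    The key is read from the ALARM_TAG environment variable and falls back
    to the documented default, Create_Auto_Alarms. Only the presence of the
    key is checked; the tag value is ignored in this sketch.
    """
    activation_key = environ.get("ALARM_TAG", "Create_Auto_Alarms")
    return any(tag.get("Key") == activation_key for tag in tags)
```

Passing `environ` explicitly makes the check easy to exercise without changing the process environment.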
You can also manually invoke the CloudWatchAutoAlarms Lambda function with the following event payload to create or update EC2 alarms without having to stop and start your EC2 instances:
{
"action": "scan"
}
You can do this with a test execution of the CloudWatchAutoAlarms AWS Lambda function. Open the AWS Lambda Management Console and perform a test invocation from the Test tab with the payload provided here.
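The same invocation can also be scripted. The sketch below assumes a boto3 Lambda client is passed in and that the function is named CloudWatchAutoAlarms in your account; `invoke_scan` is a hypothetical helper, not part of the solution.

```python
import json


def invoke_scan(lambda_client, function_name="CloudWatchAutoAlarms"):
    """Asynchronously invoke the function with the scan payload.

    lambda_client is expected to behave like boto3.client("lambda").
    The function name is an assumption; adjust it to your deployment.
    """
    return lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous: do not wait for the scan to finish
        Payload=json.dumps({"action": "scan"}).encode("utf-8"),
    )
```

With boto3 installed and credentials configured, this could be called as `invoke_scan(boto3.client("lambda"))`.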
The CloudWatchAutoAlarms.yaml template includes two CloudWatch event rules. One invokes the Lambda function on running and terminated instance states. The other invokes the Lambda function on a daily schedule. The daily scheduled event will update any existing alarms and also create any alarms with wildcard tags.
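As a rough illustration of how a handler might tell these invocations apart, the sketch below classifies an incoming event. It assumes the standard EC2 state-change event shape (`source` of `aws.ec2` with `detail.state`) and assumes that the scheduled rule passes the same `{"action": "scan"}` payload used for manual invocations; `classify_event` and its return values are hypothetical.

```python
def classify_event(event):
    """Map an incoming event to the kind of work it implies (sketch only)."""
    # Manual or scheduled scan request (assumed payload shape).
    if event.get("action") == "scan":
        return "scan"
    # EC2 instance state-change notification.
    if event.get("source") == "aws.ec2":
        state = event.get("detail", {}).get("state")
        if state == "running":
            return "create_alarms"
        if state == "terminated":
            return "delete_alarms"
    return "ignore"
```

Any other instance state, such as stopping or pending, falls through to `ignore` in this sketch.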
EC2 instances must have the CloudWatch agent installed and configured with the basic, standard, or advanced predefined metric sets in order for the default alarms for custom CloudWatch metrics to work. Scripts named userdata_linux_basic.sh, userdata_linux_standard.sh, and userdata_linux_advanced.sh are provided to install and configure the CloudWatch agent on Linux based EC2 instances with their respective predefined metric sets.
For Amazon RDS, you can add this tag to an RDS database cluster or database instance at any time in order to create the default alarm set as well as any custom alarms that have been specified as tags on the cluster or instance.
For AWS Lambda, you can add this tag to an AWS Lambda function at any time in order to create the default alarm set as well as any custom, function specific alarms.
You can define an Amazon Simple Notification Service (Amazon SNS) topic that the Lambda function will specify as the notification target for created alarms. The deployment instructions include an SNS topic that you can deploy and use with the solution. You should deploy the SNS topic to each region that you want to support with this solution. Amazon CloudWatch Alarms can't send notifications to SNS topics located in different regions.
The solution also enables you to specify a unique SNS topic per AWS resource by setting a tag with key notify and the value set to the SNS topic ARN that should be targeted for alarms for that specific resource. For any resources that don't have the notify tag set, the default SNS topic ARN will be used.
You can apply a tagging strategy that includes the notify tag to route notifications for specific groups of resources. For example, consider a tag with key Team and value Windows. You could align resources tagged with this key/value pair with the SNS topic for Windows support (e.g. notify: arn:aws:sns:us-east-1:123456789012:WindowsSupport).
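The fallback from a per-resource notify tag to the default topic can be sketched as below. Both helpers are hypothetical; `topic_region` is included only because alarms cannot notify topics in another region, so a caller may want to compare it with the alarm's region.

```python
def resolve_sns_topic(tags, default_topic_arn=None, notify_key="notify"):
    """Return the resource's notify tag value, or the default topic ARN."""
    for tag in tags:
        if tag.get("Key") == notify_key and tag.get("Value"):
            return tag["Value"]
    return default_topic_arn


def topic_region(topic_arn):
    """Extract the region from arn:aws:sns:<region>:<account-id>:<topic-name>."""
    return topic_arn.split(":")[3]
```

In a deployment, `default_topic_arn` would typically come from the DEFAULT_ALARM_SNS_TOPIC_ARN environment variable.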
You can add, remove, and customize alarms in the default alarm set. The default alarms are defined in the default_alarms Python dictionary in cw_auto_alarms.py.
In order to create an alarm, you must uniquely identify the metric that you want to alarm on. Standard Amazon EC2 metrics include the InstanceId dimension to uniquely identify each standard metric associated with an EC2 instance. If you want to add an alarm based upon a standard EC2 instance metric, then you can use the tag name syntax:
AutoAlarm-AWS/EC2-\<MetricName>-\<ComparisonOperator>-\<Period>-\<EvaluationPeriods>-\<Statistic>-\<Description>
This syntax doesn't include any dimension names because the InstanceId dimension is used for metrics in the AWS/EC2 namespace. These AWS provided EC2 metrics are common across all platforms for EC2.
Similarly, AWS Lambda metrics include the FunctionName dimension to uniquely identify each standard metric associated with an AWS Lambda function. If you want to add an alarm based upon a standard AWS Lambda metric, then you can use the tag name syntax:
AutoAlarm-AWS/Lambda-\<MetricName>-\<ComparisonOperator>-\<Period>-\<EvaluationPeriods>-\<Statistic>-\<Description>
You can add any standard Amazon CloudWatch metric for Amazon EC2 or AWS Lambda into the default_alarms dictionary under the AWS/EC2 or AWS/Lambda dictionary key using this tag syntax.
The solution allows you to specify a wildcard for a dimension value in order to create CloudWatch alarms for all dimension values. This is particularly useful for creating alarms for all partitions and drives on a system or where the value of a dimension is not known or can vary across EC2 instances.
For example, the CloudWatch agent publishes the disk_used_percent metric for disks attached to a Linux EC2 instance. The dimensions for this metric for Amazon Linux are device, fstype, and path.
The alarm tag for this metric is hardcoded in the default_alarms Python dictionary in cw_auto_alarms.py to create an alarm for the root volume, whose default dimensions and values are:
- device: nvme0n1p1
- fstype: xfs
- path: /
This is equivalent to the following default tag in the solution:
AutoAlarm-CWAgent-disk_used_percent-device-nvme0n1p1-fstype-xfs-path-/-GreaterThanThreshold-5m-1-Average-Created_by_CloudWatchAutoAlarms
If you want to alarm on all disks attached to an EC2 instance then you must specify the device name, file system type, and path dimension values for each disk, which will vary. Each EC2 instance may also have a different number of disks and different dimension values.
The solution addresses this requirement by allowing you to specify a wildcard for the dimension value. For example, the alarm tag for disk_used_percent for Amazon Linux specified in the default_alarms dictionary would change to:
{
    # '*' marks the device and path dimension values as wildcards
    'Key': alarm_separator.join(
        [alarm_identifier, cw_namespace, 'disk_used_percent', 'device', '*', 'fstype', 'xfs', 'path',
         '*', 'GreaterThanThreshold', default_period, default_evaluation_periods, default_statistic,
         'Created_by_CloudWatchAutoAlarms']),
    'Value': alarm_disk_used_percent_threshold
},
This yields the equivalent alarm tag:
AutoAlarm-CWAgent-disk_used_percent-device-*-fstype-xfs-path-*-GreaterThanThreshold-5m-1-Average-Created_by_CloudWatchAutoAlarms
In this example, we have specified a wildcard for the device and path dimensions. Using this example, the solution will query CloudWatch metrics and create an alarm for each unique combination of device and path dimension values for each Amazon Linux instance.
If your EC2 instance had two disks with the following dimensions:
Disk 1
Disk 2
Then two alarms would be created using a * wildcard for the device and path dimensions:
In order to identify the dimension values, the solution queries CloudWatch metrics to identify all metrics that match the fixed dimension values for the metric name specified. It then iterates through the dimensions whose values are specified as a wildcard to identify the specific dimension values required for the alarm.
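The matching step can be illustrated with a small, self-contained sketch. It is not the solution's implementation: `expand_wildcards` is a hypothetical function that takes metric descriptions in the shape returned under the `Metrics` key of the CloudWatch `list_metrics` API and a mapping of requested dimension names to fixed values or `*`.

```python
def expand_wildcards(metrics, requested):
    """Resolve '*' dimension values against the metrics CloudWatch knows about.

    metrics:   list of {"Dimensions": [{"Name": ..., "Value": ...}, ...]}
    requested: dict of dimension name -> fixed value or "*"
    Returns one dict of concrete dimension values per distinct match.
    """
    fixed = {name: value for name, value in requested.items() if value != "*"}
    wildcard_names = [name for name, value in requested.items() if value == "*"]
    resolved_sets = []
    seen = set()
    for metric in metrics:
        dims = {d["Name"]: d["Value"] for d in metric.get("Dimensions", [])}
        # Skip metrics whose fixed dimensions do not match the request.
        if any(dims.get(name) != value for name, value in fixed.items()):
            continue
        # Skip metrics that lack one of the wildcard dimensions entirely.
        if any(name not in dims for name in wildcard_names):
            continue
        resolved = {name: dims[name] for name in requested}
        key = tuple(resolved.items())
        if key not in seen:  # avoid duplicate alarms for the same dimension set
            seen.add(key)
            resolved_sets.append(resolved)
    return resolved_sets
```

Each returned dictionary corresponds to one alarm. Dimensions that are not part of the request, such as InstanceId, are ignored by this sketch.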
Because the solution relies on the available metrics in CloudWatch, it will only work after the CloudWatch agent has published and sent metrics to the CloudWatch service. Since the solution is designed to run on instance launch, these metrics will not be available on first start since the CloudWatch service will not have received them yet.
To resolve this, you should run the solution on a schedule using the scan payload:
{
"action": "scan"
}
This will provide sufficient time for the CloudWatch agent to publish metrics for new instances. You can schedule the frequency of execution based on the acceptable timeframe for which wildcard based alarms for new instances are not yet created.
CloudWatch Anomaly Detection Alarms are supported using the comparison operators LessThanLowerOrGreaterThanUpperThreshold, LessThanLowerThreshold, or GreaterThanUpperThreshold.
When you specify one of these comparison operators, the solution creates an anomaly detection alarm and uses the tag value as the threshold. Refer to the CloudWatch documentation for more details on the threshold and anomaly detection.
CloudWatch Anomaly detection uses machine learning models based on the metric, dimensions, and statistic chosen. If you create an alarm without a current model, CloudWatch Alarms creates a new model using these parameters from your alarm configuration.
For new models, it can take up to 3 hours for the actual anomaly detection band to appear in your graph. It can take up to two weeks for the new model to train, so the anomaly detection band shows more accurate expected values. Refer to the documentation for more details.
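For orientation, an anomaly detection alarm is defined in the CloudWatch `PutMetricAlarm` API by pairing the metric with an `ANOMALY_DETECTION_BAND` expression and pointing `ThresholdMetricId` at that expression. The sketch below builds such a request as a plain dictionary. `build_anomaly_alarm_kwargs` is a hypothetical helper, and passing the tag value straight through as the band width is an assumption about how the threshold is applied.

```python
ANOMALY_OPERATORS = {
    "LessThanLowerOrGreaterThanUpperThreshold",
    "LessThanLowerThreshold",
    "GreaterThanUpperThreshold",
}


def build_anomaly_alarm_kwargs(alarm_name, namespace, metric_name, dimensions,
                               comparison_operator, period_seconds,
                               evaluation_periods, statistic, band_width):
    """Build keyword arguments for a CloudWatch put_metric_alarm call."""
    if comparison_operator not in ANOMALY_OPERATORS:
        raise ValueError(f"not an anomaly detection operator: {comparison_operator}")
    return {
        "AlarmName": alarm_name,
        "ComparisonOperator": comparison_operator,
        "EvaluationPeriods": evaluation_periods,
        "ThresholdMetricId": "ad1",  # compare the metric against the band below
        "Metrics": [
            {
                "Id": "m1",
                "ReturnData": True,
                "MetricStat": {
                    "Metric": {
                        "Namespace": namespace,
                        "MetricName": metric_name,
                        "Dimensions": dimensions,
                    },
                    "Period": period_seconds,
                    "Stat": statistic,
                },
            },
            {
                "Id": "ad1",
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
                "Label": f"{metric_name} (expected)",
                "ReturnData": True,
            },
        ],
    }
```

The resulting dictionary could be passed to a boto3 CloudWatch client as `client.put_metric_alarm(**kwargs)`.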
The solution includes commented out code for creating a CloudWatch Anomaly Detection Alarm for CPU Utilization in the default_alarms dictionary:
# This is an example alarm using anomaly detection
# {
# 'Key': alarm_separator.join(
# [alarm_identifier, 'AWS/EC2', 'CPUUtilization', 'GreaterThanUpperThreshold', default_period,
# default_evaluation_periods, default_statistic, 'Created_by_CloudWatchAutoAlarms']),
# 'Value': alarm_cpu_high_anomaly_detection_default_threshold
# }
You can uncomment and update this code to test out anomaly detection support.
The solution implements the environment variable ALARM_DEFAULT_ANOMALY_THRESHOLD as an example threshold you can use for your anomaly detection alarms.
Metrics captured by the Amazon CloudWatch agent are considered custom metrics. These metrics are created in the CWAgent namespace by default. Custom metrics may have any number of dimensions in order to uniquely identify a metric. Additionally, the metric dimensions may be named differently based upon the underlying platform for the EC2 instance.
For example, the metric used to measure disk space utilization is named disk_used_percent in Linux and LogicalDisk % Free Space in Windows. The dimensions are also different. In Linux, you must also include the device, fstype, and path dimensions in order to uniquely identify a disk. In Windows, you must include the objectname and instance dimensions.
Consequently, it is more difficult to automatically create alarms across different platforms for custom CloudWatch EC2 instance metrics.
The disk_used_percent metric for Linux has the additional dimensions device, fstype, and path. For metrics with custom dimensions, you can include the dimension name and value in the tag key syntax:
AutoAlarm-\<Namespace>-\<MetricName>-\<DimensionName-DimensionValue...>-\<ComparisonOperator>-\<Period>-\<EvaluationPeriods>-\<Statistic>-\<Description>
For example, the tag name used to create an alarm for the average disk_used_percent over a 5 minute period for the root partition on an Amazon Linux instance in the CWAgent namespace is:
AutoAlarm-CWAgent-disk_used_percent-device-xvda1-fstype-xfs-path-/-GreaterThanThreshold-5m-1-Average-exampleDescription
Here the device dimension has a value of xvda1, the fstype dimension has a value of xfs, and the path dimension has a value of /.
This syntax and approach allow you to collectively support metrics with different numbers of dimensions and names. Using this syntax, you can add alarms for metrics with custom dimensions to the appropriate platform in the default_alarms dictionary in cw_auto_alarms.py.
You should also make sure that the CLOUDWATCH_APPEND_DIMENSIONS environment variable is set correctly in order to ensure that created alarms include these dimensions. The Lambda function will dynamically look up the values for these dimensions at runtime.
If your dimension names use the default separator character '-', then you can update the alarm_separator variable in cw_auto_alarms.py with an alternative separator character such as '~'.
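The tag key syntax with dimensions, and the reason a different separator may be needed, can be sketched as follows. `build_alarm_tag_key` is a hypothetical helper written for illustration; it mirrors the join-based construction used in the default_alarms examples but is not taken from cw_auto_alarms.py.

```python
def build_alarm_tag_key(namespace, metric_name, comparison_operator, period,
                        evaluation_periods, statistic, description,
                        dimensions=None, identifier="AutoAlarm", separator="-"):
    """Assemble a tag key, placing name/value pairs for dimensions after the metric."""
    parts = [identifier, namespace, metric_name]
    for name, value in (dimensions or {}).items():
        # A separator inside a dimension name or value would make the key ambiguous.
        if separator in name or separator in value:
            raise ValueError(
                f"dimension {name!r}={value!r} contains the separator {separator!r}"
            )
        parts.extend([name, value])
    parts.extend([comparison_operator, period, str(evaluation_periods),
                  statistic, description])
    return separator.join(parts)
```

Called with the CWAgent namespace, the disk_used_percent metric, and the dimensions device=xvda1, fstype=xfs, path=/, this reproduces the example tag name above. A dimension value that contains `-` is rejected with the default separator and accepted when `separator="~"` is passed.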
You can create alarms that are specific to an individual EC2 instance by adding a tag to the instance using the tag key syntax described in changing the default alarm set. Simply add a tag to the instance on launch or restart the instance after you have added the tag. You can also update the thresholds for created alarms by updating the tag values, causing the alarm to be updated when the instance is stopped and started.
For example, to add an alarm for the Amazon EC2 StatusCheckFailed CloudWatch metric for an existing EC2 instance:
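The tag can be added in the console or programmatically. As a minimal sketch, assuming a boto3 EC2 client is passed in, the helper below applies an alarm tag to an instance. `tag_instance_with_alarm` is hypothetical, and the tag key and threshold in the usage note are illustrative values, not defaults from the solution.

```python
def tag_instance_with_alarm(ec2_client, instance_id, tag_key, threshold):
    """Attach an alarm tag to an instance; the tag value carries the threshold.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    return ec2_client.create_tags(
        Resources=[instance_id],
        Tags=[{"Key": tag_key, "Value": str(threshold)}],
    )
```

For instance, `tag_instance_with_alarm(client, "i-0123456789abcdef0", "AutoAlarm-AWS/EC2-StatusCheckFailed-GreaterThanThreshold-5m-1-Average-TestAlarm", 1)` would add the tag. The alarm itself is created the next time the instance is stopped and started or a scan is run.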
You can create alarms that are specific to an individual AWS Lambda function by adding a tag to the function using the tag key syntax described in changing the default alarm set.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.