aws-samples / amazon-cloudwatch-auto-alarms

Automatically create and configure Amazon CloudWatch alarms for EC2 instances, RDS, and AWS Lambda using tags for standard and custom CloudWatch Metrics.
MIT No Attribution

Errors when creating alarms #18

Closed traveltek-tmalek closed 2 years ago

traveltek-tmalek commented 2 years ago

Hi, can you please help me? I have followed the documentation closely, yet I run into two errors. One is this:

[ERROR] 2022-07-17T13:41:04.941Z 5e79d7d4-a54c-4fc0-8176-8acb4d227ad6 Error deleting alarms for i-07b880f05f4b10d42!: An error occurred (AccessDenied) when calling the DeleteAlarms operation: User: arn:aws:sts::502937263541:assumed-role/amazon-cloudwatch-auto-al-CloudWatchAutoAlarmLambd-HOTJT0RIB2HM/CloudWatchAutoAlarms is not authorized to perform: cloudwatch:DeleteAlarms on resource: arn:aws:cloudwatch:ca-central-1:502937263541:alarm: because no identity-based policy allows the cloudwatch:DeleteAlarms action

But that is the more minor issue, because alarms don't get created to begin with. The main problem is this:

[ERROR] KeyError: 'Description'
Traceback (most recent call last):
  File "/var/task/cw_auto_alarms.py", line 145, in lambda_handler
    process_alarm_tags(instance_id, instance_info, default_alarms, metric_dimensions_map, sns_topic_arn,
  File "/var/task/actions.py", line 191, in process_alarm_tags
    platform = determine_platform(ImageId)
  File "/var/task/actions.py", line 232, in determine_platform
    if 'ubuntu' in image_info['Images'][0]['Description'].lower() or 'ubuntu' in image_info['Images'][0][

It seems there are errors when trying to create the alarms. This happens each time the Lambda function tries to create one.

I'm also seeing entries such as this one in the logs:

[ERROR] 2022-07-17T14:58:51.906Z 2acc20db-165e-4cc9-a729-9f47454bb6aa Failure describing image ami-0ab0f3079b6bb9ec1: 'Description'

Any ideas what is going on here?

Thanks.
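
For anyone hitting the same KeyError: the traceback shows `determine_platform` indexing `image_info['Images'][0]['Description']` directly, which fails when the AMI has no description. Below is a minimal sketch of a more defensive lookup. It is not the project's code: the helper name `image_mentions` is hypothetical, and checking the `Name` field as a fallback is an assumption, since the original line is cut off in the log.

```python
def image_mentions(image_info, keyword):
    """Return True if the first image's text fields contain `keyword`.

    Hypothetical helper, not part of the project. `image_info` is assumed
    to have the shape of an EC2 describe_images response.
    """
    images = image_info.get("Images") or []
    if not images:
        # No image returned at all, so there is nothing to inspect.
        return False
    image = images[0]
    # Description may be missing or blank; Name is an assumed fallback field.
    fields = (image.get("Description") or "", image.get("Name") or "")
    return any(keyword.lower() in field.lower() for field in fields)


# An AMI without a Description key no longer raises KeyError.
no_description = {"Images": [{"ImageId": "ami-0ab0f3079b6bb9ec1"}]}
print(image_mentions(no_description, "ubuntu"))
```

Using `.get()` with an empty-string default keeps the keyword check working whether the field is absent, `None`, or blank.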

traveltek-tmalek commented 2 years ago

Interesting. Before, the AMI had no description. It was blank. So I added one. Now the following shows in the logs:

[INFO] 2022-07-17T16:02:54.072Z e50090d3-7aa9-4794-b61c-3d3caea5acd2 Found credentials in environment variables.

[INFO] 2022-07-17T16:02:56.170Z e50090d3-7aa9-4794-b61c-3d3caea5acd2 ImageId is: ami-0ab0f3079b6bb9ec1

[ERROR] 2022-07-17T16:02:56.529Z e50090d3-7aa9-4794-b61c-3d3caea5acd2 Failure creating alarm: list index out of range

[ERROR] IndexError: list index out of range
Traceback (most recent call last):
  File "/var/task/cw_auto_alarms.py", line 145, in lambda_handler
    process_alarm_tags(instance_id, instance_info, default_alarms, metric_dimensions_map, sns_topic_arn,
  File "/var/task/actions.py", line 198, in process_alarm_tags
    create_alarm_from_tag(instance_id, instance_tag, instance_info, metric_dimensions_map, sns_topic_arn, alarm_separator)
  File "/var/task/actions.py", line 118, in create_alarm_from_tag
    namespace = alarm_properties[1]

traveltek-tmalek commented 2 years ago

This also shows in the logs just before the "list index out of range" error:

[INFO] 2022-07-17T16:02:56.510Z e50090d3-7aa9-4794-b61c-3d3caea5acd2 Platform is: Amazon Linux

This didn't appear before. It seems like it's getting a little further. But there's still something it can't find from the AMI?

traveltek-tmalek commented 2 years ago

After the last errors I pasted, I fixed it by removing the tag:

AutoAlarm: AWS/EC2-StatusCheckFailed-GreaterThanThreshold-5m-Average

That was part of my ASG. Now it all appears to be working. Just need to tweak things a bit and get more familiar with the tool but I'll mark this as closed.
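
My guess at why that tag broke things, as a rough sketch: the traceback fails at `namespace = alarm_properties[1]`, which suggests the tag key is split on `alarm_separator` and the namespace is expected as the second piece. If the tag was entered with the key `AutoAlarm` and the rest as the value (an assumption about how it was set on the ASG), the split yields a single element and index 1 does not exist. The function name `split_alarm_tag` and the default separator `-` are hypothetical, not taken from the project.

```python
def split_alarm_tag(tag_key, alarm_separator="-"):
    """Split a tag key into alarm properties, or return None if malformed.

    Hypothetical helper. Only the requirement visible in the traceback is
    checked: the list must have an element at index 1 (the namespace).
    """
    alarm_properties = tag_key.split(alarm_separator)
    if len(alarm_properties) < 2:
        # Too few pieces: indexing [1] would raise IndexError.
        return None
    return alarm_properties


# Key with the properties embedded in it: the namespace is at index 1.
ok = split_alarm_tag("AutoAlarm-AWS/EC2-StatusCheckFailed-GreaterThanThreshold-5m-Average")
print(ok[1])

# Bare key, with the properties placed in the tag value instead.
print(split_alarm_tag("AutoAlarm"))
```

Returning `None` for a malformed key lets the caller log and skip the tag instead of failing the whole invocation.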