aws-solutions / aws-crr-monitor

A solution for near real-time monitoring of replication of objects in Amazon S3 between a source bucket and a destination bucket across multiple regions.
https://aws.amazon.com/solutions/cross-region-replication-monitor
Apache License 2.0

CRRMonitorHousekeeping Lambda error when calling PutMetricData #9

Closed p-doyle closed 3 years ago

p-doyle commented 3 years ago

The errors seem to have started on 2020-02-18. This is the error message from the CloudWatch logs:

[Screenshot: Screenshot_340]

I updated the stack to the latest template from S3 and am still getting the error. Any idea what the issue might be or how to resolve it?

Thanks!

ericquinones commented 3 years ago

Hi there, apologies for the delay. I traced the error you shared through the source code and saw that after a successful CloudWatch put_metric_data operation, the CRRMonitorHousekeeping function deletes the item it just reported on from the Statistics table (link).

Since CloudWatch throws that exception and it is re-raised here, the function never reaches the code where the item is deleted from the Statistics table.

If this issue is still occurring, can you try adding some logging to the Lambda function to output the key of the item from the Statistics table? If it is the same item each time the exception is thrown, then the data in that table is not being cleaned out as it normally would be.
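The kind of logging I have in mind could look roughly like the sketch below. This is not the solution's actual code: `report_item`, the logger name, the `CRRMonitor` namespace default, and the shape of `item_key` are all hypothetical, and `cloudwatch` is assumed to be a boto3 CloudWatch client (or anything with a compatible `put_metric_data` method).

```python
import logging

# Hypothetical logger name; in Lambda, log records end up in CloudWatch Logs.
logger = logging.getLogger("crr_housekeeping_debug")


def report_item(cloudwatch, item_key, metric_data, namespace="CRRMonitor"):
    """Log the Statistics item key, publish its metrics, re-raise on failure.

    `namespace` is an assumed placeholder, not necessarily the value the
    solution uses.
    """
    logger.info("Reporting Statistics item with key: %s", item_key)
    try:
        cloudwatch.put_metric_data(Namespace=namespace, MetricData=metric_data)
    except Exception:
        # Record which item failed before the exception propagates, so
        # repeated failures on the same key show up in the logs.
        logger.exception("put_metric_data failed for item key: %s", item_key)
        raise
```

If the same key appears in the failure message on every invocation, that points to one stuck item rather than a general problem.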

There may have been a separate error back when this originally started that prevented the CloudWatch metric from being published or the item from being deleted. The CloudWatch logs from around the timestamp of the oldest item in that table may have more information. At this point, CloudWatch appears to be rejecting the put_metric_data call because that timestamp is too stale.
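To check whether a given item is already too old to publish, a small helper like the one below might be useful. It is only a sketch: the function name is hypothetical, and the limits are based on the PutMetricData documentation, which says timestamps can be up to two weeks in the past and up to two hours in the future. The exact boundary behavior is an assumption here.

```python
from datetime import datetime, timedelta, timezone

# Limits as described in the PutMetricData documentation (treated as approximate).
MAX_PAST = timedelta(days=14)
MAX_FUTURE = timedelta(hours=2)


def is_timestamp_accepted(ts, now=None):
    """Return True if `ts` falls inside the window CloudWatch should accept.

    Both `ts` and `now` are expected to be timezone-aware datetimes.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - MAX_PAST <= ts <= now + MAX_FUTURE
```

An item whose timestamp fails this check will keep failing on every run, so it would have to be removed from the table some other way.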

p-doyle commented 3 years ago

Ahh okay... I re-launched it and haven't got the error again so hopefully that fixed it. Thanks!