aws / amazon-ssm-agent

An agent to enable remote management of your EC2 instances, on-premises servers, or virtual machines (VMs).
https://aws.amazon.com/systems-manager/
Apache License 2.0

SSM Agent Stuck trying to PutLogEvents that are too Large #452

Open gdd1984 opened 2 years ago

gdd1984 commented 2 years ago

I have an EC2 instance that has been showing about 1000x its normal outbound network traffic for the last 2 weeks. I narrowed it down to SSM Agent repeatedly trying to make a PutLogEvents call with an event that is too large (log shown below).

Is there some way to determine what is generating the large log event? Alternatively, can I restart SSM Agent so that log events are only tracked from the restart point onwards? I have tried restarting it, but that had no effect: it keeps retrying these PutLogEvents calls.

2022-06-27 00:36:30 INFO [ssm-session-worker] [1655684264997640000-038f0efd24fc649f0] [DataBackend] [pluginName=Standard_Stream] Calling Get Sequence token
2022-06-27 00:36:30 INFO [ssm-session-worker] [1655684264997640000-038f0efd24fc649f0] [DataBackend] [pluginName=Standard_Stream] Received Sequence token
2022-06-27 00:36:31 ERROR [ssm-session-worker] [1655684264997640000-038f0efd24fc649f0] [DataBackend] [pluginName=Standard_Stream] error when calling AWS APIs. error details - InvalidParameterException: Log event too large: 382021 bytes exceeds limit of 262144
2022-06-27 00:36:31 INFO [ssm-session-worker] [1655684264997640000-038f0efd24fc649f0] [DataBackend] [pluginName=Standard_Stream] increasing error count by 1
2022-06-27 00:36:31 ERROR [ssm-session-worker] [1655684264997640000-038f0efd24fc649f0] [DataBackend] [pluginName=Standard_Stream] Error in PutLogEvents:InvalidParameterException: Log event too large: 382021 bytes exceeds limit of 262144
2022-06-27 00:36:31 WARN [ssm-session-worker] [1655684264997640000-038f0efd24fc649f0] [DataBackend] [pluginName=Standard_Stream] Failed to upload message to CloudWatch, err: InvalidParameterException: Log event too large: 382021 bytes exceeds limit of 262144

EcmaXp commented 1 year ago

I also had this issue. It seems clear that the problem lies between SSM Agent and CloudWatch Logs.

In the case I checked, the ssm-session-worker process for a session that had already terminated had not exited and was still running. Perhaps this is because its PutLogEvents call keeps failing.

For anyone who lands on this issue: cleaning up the leftover ssm-session-worker process resolves the problem temporarily. Ultimately, though, AWS needs to fix this.

theycallmemac commented 1 year ago

This is now resolved: update to the latest available agent version, which includes the fix in https://github.com/aws/amazon-ssm-agent/commit/1b26ecfe47f2b53aeacb12c0f3a3c509fe0cca8d

The event message length threshold was lowered because events were being inflated by json.Marshal, and this inflation could make an event up to 6 times the size of the original message. Reducing the threshold to 1/6 of its original size appears to resolve the issue, since this function was the leading cause of the "Log event too large" errors.