gdd1984 opened 2 years ago
I also had the same issue. It seems clear that there is a problem between the SSM Agent and CloudWatch.
In the case I checked, I confirmed that the ssm-session-worker process does not exit after its session is terminated and keeps running. Maybe this is because the PutLogEvents call keeps failing.
For those of you who have come across this issue: cleaning up the old ssm-session-worker processes will temporarily resolve it. Ultimately, AWS needs to fix this problem.
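A minimal sketch of that cleanup, assuming a Linux host with procps `ps` (for the `etimes` elapsed-seconds column) and root privileges. The helper names `stale_workers` and `clean_up` and the age threshold are hypothetical. The sketch cannot tell a worker whose session has ended from one serving a long-running active session, so it only reports PIDs unless `dry_run=False` is passed.

```python
import os
import signal
import subprocess

WORKER_NAME = "ssm-session-worker"


def stale_workers(ps_output: str, max_age_seconds: int) -> list[int]:
    """Return PIDs of ssm-session-worker processes older than max_age_seconds.

    Expects lines of the form "<pid> <elapsed-seconds> <command line>",
    as printed by: ps -eo pid=,etimes=,args=
    """
    pids = []
    for line in ps_output.splitlines():
        # split(None, 2) skips leading padding and keeps the full command line intact
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, elapsed, args = parts
        if WORKER_NAME in args and int(elapsed) > max_age_seconds:
            pids.append(int(pid))
    return pids


def clean_up(max_age_seconds: int, dry_run: bool = True) -> list[int]:
    """List stale workers; send SIGTERM to them only when dry_run is False."""
    out = subprocess.run(
        ["ps", "-eo", "pid=,etimes=,args="],
        capture_output=True, text=True, check=True,
    ).stdout
    pids = stale_workers(out, max_age_seconds)
    for pid in pids:
        if dry_run:
            print(f"would terminate {pid}")
        else:
            os.kill(pid, signal.SIGTERM)
    return pids
```

For example, `clean_up(max_age_seconds=24 * 3600)` prints the workers that have been running for more than a day (an arbitrary cutoff); check that list against your active sessions before rerunning with `dry_run=False`.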
This is now resolved by updating to the latest available agent version, as per https://github.com/aws/amazon-ssm-agent/commit/1b26ecfe47f2b53aeacb12c0f3a3c509fe0cca8d
The event message length threshold was lowered because events were being inflated by json.Marshal, and this inflation could reach up to 6 times the size of the original event message. Reducing the threshold to 1/6 of its original size appears to resolve the issue, since this inflation was the leading cause of the "Log event too large" errors.
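To show where a factor of 6 can come from, here is a sketch that uses Python's json module as a stand-in for Go's json.Marshal. This is an assumption: the two encoders are not identical. Both, however, escape a control character such as \x01 as the six-byte sequence \u0001, and Go's json.Marshal also escapes <, > and & that way by default. With `ensure_ascii=False`, Python leaves other characters as raw UTF-8, so no character encodes to more than 6 bytes. Capping each raw chunk at `limit // 6` characters therefore keeps its encoded size within `limit`. The `limit` values and the `encoded_size` / `split_for_limit` helpers are hypothetical and do not reproduce the agent's actual code.

```python
import json

# Worst-case growth per character: "\x01" becomes the 6-byte escape "\u0001".
WORST_CASE_INFLATION = 6


def encoded_size(message: str) -> int:
    """UTF-8 byte length of the JSON string encoding, excluding the quotes."""
    return len(json.dumps(message, ensure_ascii=False).encode("utf-8")) - 2


def split_for_limit(message: str, limit: int) -> list[str]:
    """Split message so each chunk stays within limit bytes after JSON escaping."""
    chunk = limit // WORST_CASE_INFLATION
    if chunk < 1:
        raise ValueError("limit is smaller than one worst-case escaped character")
    return [message[i:i + chunk] for i in range(0, len(message), chunk)]
```

For example, `encoded_size("\x01" * 200)` is 1200, twice a 600-byte limit. `split_for_limit("\x01" * 200, 600)` returns two 100-character chunks, each encoding to exactly 600 bytes.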
I have an EC2 instance that has been showing about 1000x its normal outbound network traffic for the last 2 weeks. I narrowed it down to the SSM agent repeatedly trying to make a PutLogEvents call with an event that is too large (log shown below).
Is there some way to determine what is generating the large log event, or to restart the SSM agent so that log events are only tracked from the restart point onwards? I have tried restarting, but it has no effect: the agent keeps trying to send these PutLogEvents calls.