aws / amazon-ssm-agent

An agent to enable remote management of your EC2 instances, on-premises servers, or virtual machines (VMs).
https://aws.amazon.com/systems-manager/
Apache License 2.0
1.05k stars 323 forks source link

Feature: OOM killer protection similar to sshd #580

Open erpel opened 1 month ago

erpel commented 1 month ago

I initially opened this as part of amaonlinux, but it makes more sense in this project:

When the system is experiencing memory pressure, I've seen many times that ssm-agent gets killed by the OOM killer. This makes it hard to debug the situation if ssm-agent being killed results in being unable to log in and observe the situation.

I'd like ssm-agent to be run with the same OOM killer protections that sshd applies to it's own process (oom score adjustment -1000).

Alternatives would be to stop using SSM for login and switch to SSH, but this puts additional overhead on us, administering user accounts and ssh keys. SSM session manager is a useful feature that would really benefit from added efforts to increase stability.

This old bug https://bugzilla.redhat.com/show_bug.cgi?id=1010429#c0 contains some details about how it used to work with sshd - especially making sure that user processes spawned by the "protected" server don't inherit the strict protection of oom_score_adj -1000.

CuriousDolphin commented 2 days ago

I have the same problem, when the machine is saturated with ram, the ssm agent is killed and the only way is to restart it, is there any news?