buildkite / elastic-ci-stack-for-aws

An auto-scaling cluster of build agents running in your own AWS VPC
https://buildkite.com/docs/quickstart/elastic-ci-stack-aws
MIT License

Configure OOM killer in systemd to preserve the agent process #882

Open keithduncan opened 3 years ago

keithduncan commented 3 years ago

We want the agent process to remain alive, but the bootstrap and any processes under it in the systemd service should be candidates for being killed under memory pressure 🤔
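To make the idea concrete, here is a minimal sketch of one possible approach, not something the stack currently does. On Linux, each process has an `/proc/<pid>/oom_score_adj` value between -1000 and 1000, children inherit it from their parent, and a process may raise its own value without extra privileges. systemd's `OOMScoreAdjust=` directive sets that value for a service, so the agent could be started with a low score while each job process raises its own score before it execs. The helper names and the value 1000 below are assumptions for illustration only.

```python
import subprocess

# Hypothetical value: 1000 is the maximum, so the kernel prefers this
# process over anything with a lower adjustment.
CHILD_OOM_SCORE_ADJ = 1000


def _raise_oom_score_adj():
    # Runs in the child between fork and exec, so only the child (and
    # whatever it spawns later) gets the higher score. The parent keeps
    # whatever value it was started with.
    with open("/proc/self/oom_score_adj", "w") as f:
        f.write(str(CHILD_OOM_SCORE_ADJ))


def run_as_oom_candidate(argv):
    # Linux only: launch argv as a preferred OOM-killer victim and
    # capture its output.
    return subprocess.run(
        argv,
        preexec_fn=_raise_oom_score_adj,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # The child reports its own adjustment.
    result = run_as_oom_candidate(["cat", "/proc/self/oom_score_adj"])
    print(result.stdout.strip())
```

The same split could in principle be done inside the agent itself when it spawns the bootstrap; the point of the sketch is only that the parent's score stays untouched while everything below it becomes a better candidate for the OOM killer.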

n-tucker commented 2 years ago

We're also experiencing this when instances running in our elastic stack run low on memory. The instances are left in a zombie state: they seem to occupy capacity in our auto-scaling group without processing any jobs. Sometimes the instances are cleaned up automatically, but we've seen instances stay alive for close to a week with no agents.

While we can reduce the number of agents per instance and/or increase the instance class, we'd like the agents to stay alive in some form, so bumping this issue! 😄