aws / amazon-ssm-agent

An agent to enable remote management of your EC2 instances, on-premises servers, or virtual machines (VMs).
https://aws.amazon.com/systems-manager/
Apache License 2.0
1.04k stars, 322 forks

ssm-session-worker connection failed with "too many open files" #398

Open joosangkim opened 2 years ago

joosangkim commented 2 years ago

Hi, I'm trying to create a connection between CircleCI and EC2 via the AWS SSM session worker. The application on the CircleCI machine opens 50 gRPC connections in its connection pool. The EC2 instance is a c5.xlarge.

I started an SSM session with the command below.

aws ssm start-session --target $DEV_EC2 --document-name AWS-StartPortForwardingSession \
--parameters "portNumber"=["3001"],"localPortNumber"=["4001"] \
--region $AWS_REGION 

Session Manager log from CircleCI:

Starting session with SessionId: xxx-0fd33531787bc8df4
Port 4001 opened for sessionId xxx-0fd33531787bc8df4.
Waiting for connections...

Connection accepted for session [xxx-0fd33531787bc8df4]

Connection to destination port failed, check SSM Agent logs.

Connection to destination port failed, check SSM Agent logs.

Connection to destination port failed, check SSM Agent logs.

Connection to destination port failed, check SSM Agent logs.

Connection to destination port failed, check SSM Agent logs.
Cannot perform start session: EOF

The ssm-session-worker log on the EC2 instance repeated the following error:

2021-08-06 09:21:33 ERROR [ssm-session-worker] [xxx-0fd33531787bc8df4] [DataBackend] [pluginName=Port] Unable to dial connection to server: dial tcp :3001: socket: too many open files

I have already increased the file descriptor limit for ssm-session-worker on the EC2 instance to its maximum.

cat /etc/security/limits.conf

* hard nofile 500000
* soft nofile 500000
* hard nproc 999999
* soft nproc 999999

yuting-fan commented 2 years ago

Hi joosangkim@,

How many open connections did you have on your EC2 instance when you experienced this error? We can start from there to see whether any of the open connections were not in use and should be recycled.
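One way to answer that question is to read `/proc` for the worker processes. The sketch below is only an illustration: it assumes a Linux instance, assumes the worker can be found with `pgrep -f ssm-session-worker`, and uses hypothetical helper names (`count_fds`, `show_nofile_limit`). Reading another user's `/proc/<pid>/fd` usually requires root.

```shell
# Count open file descriptors for a given PID (reads /proc, Linux only).
count_fds() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Print the effective "Max open files" limit line for a given PID.
show_nofile_limit() {
  grep 'Max open files' "/proc/$1/limits"
}

# Report on every running ssm-session-worker process, if any are found.
for pid in $(pgrep -f ssm-session-worker); do
  echo "pid=$pid open_fds=$(count_fds "$pid")"
  show_nofile_limit "$pid"
done
```

Comparing `open_fds` against the soft limit in the "Max open files" line shows whether the process is actually hitting its limit, and whether the values from /etc/security/limits.conf reached it at all.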

Setting the open file limits in /etc/security/limits.conf won't work in your case. /etc/security/limits.conf is a configuration file for Linux PAM. It sets limits for logged-in users, not for system processes. The ssm-user is a system-generated user that lets you use Session Manager as an administrative tool to manage your instance. It is not used as a login tool to establish an SSH connection, as mentioned in this documentation: https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager.html

That being said, if all of the open connections are needed and you do want to raise the open file limits, you would need to apply them to the amazon-ssm-agent service directly, reload the daemon configuration, and then restart amazon-ssm-agent. Only then will the limits you intended to set in /etc/security/limits.conf take effect for the agent.
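On a systemd-based instance, those steps might look like the sketch below. It rests on several assumptions that are not confirmed in this thread: the unit is named `amazon-ssm-agent` (snap-based installs use a different unit name), worker processes inherit the limit from the agent service, and the drop-in file name `nofile.conf` and the value 65536 are purely illustrative. The helper function names are hypothetical.

```shell
# Render a systemd drop-in that raises the open-file limit for a service.
# The limit value is passed as the first argument.
render_nofile_override() {
  printf '[Service]\nLimitNOFILE=%s\n' "$1"
}

# Install the drop-in, reload systemd unit files, and restart the agent.
# Assumes the unit is named amazon-ssm-agent; must be run as root.
apply_nofile_override() {
  unit="amazon-ssm-agent"
  dir="/etc/systemd/system/${unit}.service.d"
  mkdir -p "$dir"
  render_nofile_override "$1" > "$dir/nofile.conf"
  systemctl daemon-reload
  systemctl restart "$unit"
}

# Preview the drop-in contents without changing anything on the system.
render_nofile_override 65536
```

Calling `apply_nofile_override 65536` as root would write the drop-in and restart the agent. Afterwards, `systemctl show amazon-ssm-agent -p LimitNOFILE` should report the new value if the unit name assumption holds.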

Please let us know if this answers your question.

Thanks, Yuting