aws-samples / aws-kms-xks-proxy

AWS KMS External Keystore (XKS) Proxy reference implementation
Apache License 2.0

Support configurable interval to send TCP keepalive probes + Change to use port 80 as sample instead of port 8000 + Upgrade to v2.0.1 #4

Closed hansonchar closed 2 years ago

hansonchar commented 2 years ago

Issue #, if available:

I recently observed the following error when running the xks-proxy against a CloudHSM:

ERROR                 main hyper::server::tcp: accept error: Too many open files (os error 24)

Using lsof, I saw

sudo lsof -p 106586 | grep TCP | wc -l
1006

sudo lsof -p 106586 | grep TCP | grep ESTABLISHED | wc -l
1005
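
The two counts differ by one, which fits a single listening socket plus 1005 established connections. A total this high would be consistent with a per-process limit of 1024 open files, though that is an assumption, since the limit on this host is not stated. As a rough sketch, the same tally can be computed from captured `lsof -p <pid>` output; the `TcpCounts` type and `count_tcp` function are hypothetical names used only for illustration:

```rust
/// Tally of TCP sockets found in captured `lsof -p <pid>` output.
/// (Hypothetical helper type, not part of xks-proxy.)
#[derive(Debug, PartialEq)]
pub struct TcpCounts {
    /// Lines matching `grep TCP`.
    pub total: usize,
    /// Lines matching `grep TCP | grep ESTABLISHED`.
    pub established: usize,
}

/// Mirrors the two shell pipelines above using plain substring matching,
/// which is what `grep` does with these patterns.
pub fn count_tcp(lsof_output: &str) -> TcpCounts {
    let tcp_lines: Vec<&str> = lsof_output
        .lines()
        .filter(|line| line.contains("TCP"))
        .collect();
    let established = tcp_lines
        .iter()
        .filter(|line| line.contains("ESTABLISHED"))
        .count();
    TcpCounts {
        total: tcp_lines.len(),
        established,
    }
}
```

A steadily growing `established` value while traffic is flat would suggest that connections are being left open rather than closed.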

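TCP keepalive appears to be disabled by default in hyper, so connections whose peers have gone away are never detected and their sockets stay in the ESTABLISHED state until the process runs out of file descriptors: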

https://github.com/hyperium/hyper/blob/0.14.x/src/server/tcp.rs#L66-L74

Description of changes:

  1. Support configurable interval to send TCP keepalive probes
  2. Change to use port 80 as sample instead of port 8000
  3. Upgrade to v2.0.1
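
For illustration only, the first change could be modeled as mapping an optional number of seconds from configuration to an `Option<Duration>`. This is a minimal sketch and not the code in this pull request: the function names and the rule that `0` or a missing value disables keepalive are assumptions. The resulting value has the shape accepted by hyper 0.14's `tcp_keepalive(Option<Duration>)` server builder method.

```rust
use std::time::Duration;

/// Convert a configured keepalive interval in seconds into a `Duration`.
/// Assumption for this sketch: a missing value or `0` means "disabled".
pub fn keepalive_interval(secs: Option<u64>) -> Option<Duration> {
    match secs {
        None | Some(0) => None,
        Some(s) => Some(Duration::from_secs(s)),
    }
}

/// Build a startup log message for the chosen interval. The enabled form
/// follows the log line shown under Testing; the disabled form is made up.
pub fn describe_keepalive(interval: Option<Duration>) -> String {
    match interval {
        Some(d) => format!("TCP keepalive interval is set to {} seconds", d.as_secs()),
        None => String::from("TCP keepalive is disabled"),
    }
}
```

Keeping the value as an `Option` leaves the library default untouched when nothing is configured, so existing deployments would not change behavior unless they opt in.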

Testing:

Testing with a 60-second TCP keepalive probe interval is in progress.

2022-09-12T00:14:47.563391Z  INFO main xks_proxy: v2.0.1 listening on 0.0.0.0:443
2022-09-12T00:14:47.564184Z  INFO main xks_proxy: TCP keepalive interval is set to 60 seconds

To check whether this change actually fixes the issue, I will run the patched server for a day or two on the host that previously exhibited the failure, monitoring the established connections with:

watch 'sudo lsof -p 129360 | grep TCP | grep ESTABLISHED | wc -l && sudo lsof -p 129360 | grep TCP | sort -k9 | grep ESTABLISHED'

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

hansonchar commented 2 years ago

This issue is related to the axum-server issue "Give more control on connection tasks" (#29).