cloudfoundry / app-autoscaler-release

Automated scaling for apps running on Cloud Foundry
Apache License 2.0

High file descriptor usage in Scalingengine #814

Closed: donacarr closed this issue 2 years ago

donacarr commented 2 years ago

Hi @KevinJCross @silvestre

After we updated autoscaler from 5.3.2 to 6.0.0, the scalingengine process started failing:

{"data":{"appId":"1ded89cf-d7a4-4ee5-9727-e4a69f01ebba","end":-1,"error":"dial tcp 168.1.44.23:5432: socket: too many open files","includeAll":false,"order":0,"session":"7.1.14138","start":0},"log_level":2,"log_time":"2022-08-09T14:06:39Z","message":"scalingengine.http-server.scaling-handler.get-scaling-histories.failed-to-retrieve-histories","source":"scalingengine","timestamp":"1660053999.863351345"}

It seems the process exceeded the limit of 1024 open files (currently the default limit on our BOSH VM). Judging by the output below, I guess this is due to orphaned sockets:

lsof -p 11094 | wc -l
1028

ls -ltr /proc/11094/fd
total 0
lrwx------ 1 root root 64 Aug  8 09:48 63 -> 'socket:[159840]'
lrwx------ 1 root root 64 Aug  8 09:49 9 -> 'socket:[58199]'
lrwx------ 1 root root 64 Aug  8 09:49 8 -> 'socket:[115102]'
lrwx------ 1 root root 64 Aug  8 09:49 7 -> 'socket:[123726]'
l-wx------ 1 root root 64 Aug  8 09:49 6 -> 'pipe:[62443]'
lr-x------ 1 root root 64 Aug  8 09:49 5 -> 'pipe:[62443]'
lrwx------ 1 root root 64 Aug  8 09:49 4 -> 'anon_inode:[eventpoll]'
lrwx------ 1 root root 64 Aug  8 09:49 3 -> 'socket:[141881]'
l-wx------ 1 root root 64 Aug  8 09:49 2 -> /var/vcap/data/sys/log/scalingengine/scalingengine.stderr.log
lrwx------ 1 root root 64 Aug  8 09:49 12 -> 'socket:[124697]'
lrwx------ 1 root root 64 Aug  8 09:49 11 -> 'socket:[874987]'
lrwx------ 1 root root 64 Aug  8 09:49 10 -> 'socket:[62465]'
l-wx------ 1 root root 64 Aug  8 09:49 1 -> /var/vcap/data/sys/log/scalingengine/scalingengine.stdout.log
lrwx------ 1 root root 64 Aug  8 09:49 0 -> /dev/null
lrwx------ 1 root root 64 Aug  8 09:50 17 -> 'socket:[118111]'
lrwx------ 1 root root 64 Aug  8 09:50 16 -> 'socket:[121501]'
lrwx------ 1 root root 64 Aug  8 09:51 15 -> 'socket:[118107]'
lrwx------ 1 root root 64 Aug  8 09:52 24 -> 'socket:[124579]'
lrwx------ 1 root root 64 Aug  8 09:52 23 -> 'socket:[133589]'
lrwx------ 1 root root 64 Aug  8 09:52 14 -> 'socket:[108201]'
lrwx------ 1 root root 64 Aug  8 09:52 13 -> 'socket:[130105]'
lrwx------ 1 root root 64 Aug  8 09:53 27 -> 'socket:[143588]'
lrwx------ 1 root root 64 Aug  8 09:53 19 -> 'socket:[127831]'
lrwx------ 1 root root 64 Aug  8 10:05 22 -> 'socket:[127343]'
lrwx------ 1 root root 64 Aug  8 10:05 21 -> 'socket:[147858]'
lrwx------ 1 root root 64 Aug  8 10:05 20 -> 'socket:[124597]'
lrwx------ 1 root root 64 Aug  8 10:05 18 -> 'socket:[118067]'
lrwx------ 1 root root 64 Aug  8 10:07 28 -> 'socket:[138550]'
lrwx------ 1 root root 64 Aug  8 10:08 35 -> 'socket:[143721]'
......
......
lrwx------ 1 root root 64 Aug  9 08:02 854 -> 'socket:[727400]'
lrwx------ 1 root root 64 Aug  9 08:07 856 -> 'socket:[717796]'
lrwx------ 1 root root 64 Aug  9 08:07 855 -> 'socket:[718033]'
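A minimal sketch for checking how many of a process's descriptors are sockets, as opposed to pipes or regular files. The function name `fd_kinds` is hypothetical; it only assumes `ls -l /proc/<pid>/fd` style input like the listing above.

```shell
# Summarize what a process's file descriptors point at.
# Reads `ls -l /proc/<pid>/fd` style output on stdin.
# fd_kinds is a hypothetical helper name, not part of any tool.
fd_kinds() {
  awk '
    / -> / {
      target = $NF                      # last field is the link target
      kind = "file"
      if (index(target, "socket:[")) kind = "socket"
      else if (index(target, "pipe:[")) kind = "pipe"
      else if (index(target, "anon_inode:")) kind = "anon_inode"
      count[kind]++
    }
    END { for (k in count) print k, count[k] }
  ' | LC_ALL=C sort
}
```

For example, `ls -l /proc/11094/fd | fd_kinds` prints one line per kind with its count. The limit that applies to the process can be read with `grep 'Max open files' /proc/11094/limits`.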

lsof -p 11094
COMMAND     PID USER   FD      TYPE DEVICE SIZE/OFF    NODE NAME
scalingen 11094 root  cwd       DIR  202,2     4096  393942 /etc/sv/monit
scalingen 11094 root  rtd       DIR  202,2     4096       2 /
scalingen 11094 root  txt       REG 202,34 14771789 3801110 /var/vcap/data/packages/scalingengine/4cdf2fb61038ae7571f9b1726c9c40bfe586d004/scalingengine
scalingen 11094 root    0u      CHR    1,3      0t0       6 /dev/null
scalingen 11094 root    1w      REG 202,34  2385427 1310768 /var/vcap/data/sys/log/scalingengine/scalingengine.stdout.log
scalingen 11094 root    2w      REG 202,34   867291 1310769 /var/vcap/data/sys/log/scalingengine/scalingengine.stderr.log
scalingen 11094 root    3u     sock    0,9      0t0  141881 protocol: TCP
scalingen 11094 root    4u  a_inode   0,14        0   10943 [eventpoll]
scalingen 11094 root    5r     FIFO   0,13      0t0   62443 pipe
scalingen 11094 root    6w     FIFO   0,13      0t0   62443 pipe
scalingen 11094 root    7u     sock    0,9      0t0  123726 protocol: TCP
scalingen 11094 root    8u     sock    0,9      0t0  115102 protocol: TCP
scalingen 11094 root    9u     IPv4  58199      0t0     TCP *:6104 (LISTEN)
scalingen 11094 root   10u     IPv4  62465      0t0     TCP *:6204 (LISTEN)
scalingen 11094 root   11u     IPv4 878997      0t0     TCP e523295d-2ded-4bf7-b645-d1c0d731a194.asactors.default.app-autoscaler.microbosh:6104->e523295d-2ded-4bf7-b645-d1c0d731a194.asactors.default.app-autoscaler.microbosh:57828 (ESTABLISHED)
scalingen 11094 root   12u     sock    0,9      0t0  124697 protocol: TCP
scalingen 11094 root   13u     sock    0,9      0t0  130105 protocol: TCP
scalingen 11094 root   14u     sock    0,9      0t0  108201 protocol: TCP
scalingen 11094 root   15u     sock    0,9      0t0  118107 protocol: TCP
scalingen 11094 root   16u     sock    0,9      0t0  121501 protocol: TCP
scalingen 11094 root   17u     sock    0,9      0t0  118111 protocol: TCP
scalingen 11094 root   18u     sock    0,9      0t0  118067 protocol: TCP
scalingen 11094 root   19u     sock    0,9      0t0  127831 protocol: TCP
scalingen 11094 root   20u     sock    0,9      0t0  124597 protocol: TCP
scalingen 11094 root   21u     sock    0,9      0t0  147858 protocol: TCP
scalingen 11094 root   22u     sock    0,9      0t0  127343 protocol: TCP
scalingen 11094 root   23u     sock    0,9      0t0  133589 protocol: TCP
scalingen 11094 root   24u     sock    0,9      0t0  124579 protocol: TCP
scalingen 11094 root   25u     sock    0,9      0t0  138460 protocol: TCP
scalingen 11094 root   26u     sock    0,9      0t0  135202 protocol: TCP
....
....
...

Note, however, that fewer than 30 file descriptors are open in environments that still run 5.3.2:

lsof -p 10632
COMMAND     PID USER   FD      TYPE   DEVICE SIZE/OFF    NODE NAME
scalingen 10632 root  cwd       DIR    202,2     4096 1311426 /etc/sv/monit
scalingen 10632 root  rtd       DIR    202,2     4096       2 /
scalingen 10632 root  txt       REG   202,34 14659271 3278262 /var/vcap/data/packages/scalingengine/b859be46c0704525ff8f01349252175032c8e573/scalingengine
scalingen 10632 root    0u      CHR      1,3      0t0       6 /dev/null
scalingen 10632 root    1w      REG   202,34 27119708 5767216 /var/vcap/data/sys/log/scalingengine/scalingengine.stdout.log
scalingen 10632 root    2w      REG   202,34   186322 5767217 /var/vcap/data/sys/log/scalingengine/scalingengine.stderr.log
scalingen 10632 root    3u     IPv4 43924670      0t0     TCP a48c4e69-876c-4e64-8281-b24586acbe68.asactors.default.app-autoscaler.microbosh:6104->a48c4e69-876c-4e64-8281-b24586acbe68.asactors.default.app-autoscaler.microbosh:56726 (ESTABLISHED)
scalingen 10632 root    4u  a_inode     0,14        0   10943 [eventpoll]
scalingen 10632 root    5r     FIFO     0,13      0t0   81135 pipe
scalingen 10632 root    6w     FIFO     0,13      0t0   81135 pipe
scalingen 10632 root    7u     IPv4 43923616      0t0     TCP a48c4e69-876c-4e64-8281-b24586acbe68.asactors.default.app-autoscaler.microbosh:6104->a48c4e69-876c-4e64-8281-b24586acbe68.asactors.default.app-autoscaler.microbosh:56710 (ESTABLISHED)
scalingen 10632 root    9u     IPv4    81149      0t0     TCP *:6104 (LISTEN)
scalingen 10632 root   10u     IPv4    81150      0t0     TCP *:6204 (LISTEN)
scalingen 10632 root   15u     IPv4 43933746      0t0     TCP a48c4e69-876c-4e64-8281-b24586acbe68.asactors.default.app-autoscaler.microbosh:6104->02dc7b1c-d654-4d0c-80b4-05bc37a9f731.asmetrics.default.app-autoscaler.microbosh:46118 (ESTABLISHED)
scalingen 10632 root   19u     IPv4 43933880      0t0     TCP a48c4e69-876c-4e64-8281-b24586acbe68.asactors.default.app-autoscaler.microbosh:57468->9b0cf85a-192b-4d25-b2b8-8c4575a4cede.autoscalerdb.default.app-autoscaler-db.microbosh:postgresql (ESTABLISHED)
scalingen 10632 root   20u     IPv4 43932221      0t0     TCP a48c4e69-876c-4e64-8281-b24586acbe68.asactors.default.app-autoscaler.microbosh:6104->4bfd42ac-a70d-4b6b-974a-5a56f90c2df9.asmetrics.default.app-autoscaler.microbosh:54954 (ESTABLISHED)
scalingen 10632 root   32u     IPv4 43931194      0t0     TCP a48c4e69-876c-4e64-8281-b24586acbe68.asactors.default.app-autoscaler.microbosh:57512->9b0cf85a-192b-4d25-b2b8-8c4575a4cede.autoscalerdb.default.app-autoscaler-db.microbosh:postgresql (ESTABLISHED)
scalingen 10632 root   35u     IPv4 43933326      0t0     TCP a48c4e69-876c-4e64-8281-b24586acbe68.asactors.default.app-autoscaler.microbosh:57514->9b0cf85a-192b-4d25-b2b8-8c4575a4cede.autoscalerdb.default.app-autoscaler-db.microbosh:postgresql (ESTABLISHED)
scalingen 10632 root   37u     IPv4 43932255      0t0     TCP a48c4e69-876c-4e64-8281-b24586acbe68.asactors.default.app-autoscaler.microbosh:57490->9b0cf85a-192b-4d25-b2b8-8c4575a4cede.autoscalerdb.default.app-autoscaler-db.microbosh:postgresql (ESTABLISHED)
scalingen 10632 root   39u     IPv4 43933878      0t0     TCP a48c4e69-876c-4e64-8281-b24586acbe68.asactors.default.app-autoscaler.microbosh:57464->9b0cf85a-192b-4d25-b2b8-8c4575a4cede.autoscalerdb.default.app-autoscaler-db.microbosh:postgresql (ESTABLISHED)
scalingen 10632 root   40u     IPv4 43933877      0t0     TCP a48c4e69-876c-4e64-8281-b24586acbe68.asactors.default.app-autoscaler.microbosh:57462->9b0cf85a-192b-4d25-b2b8-8c4575a4cede.autoscalerdb.default.app-autoscaler-db.microbosh:postgresql (ESTABLISHED)
scalingen 10632 root   41u     IPv4 43932247      0t0     TCP a48c4e69-876c-4e64-8281-b24586acbe68.asactors.default.app-autoscaler.microbosh:57474->9b0cf85a-192b-4d25-b2b8-8c4575a4cede.autoscalerdb.default.app-autoscaler-db.microbosh:postgresql (ESTABLISHED)
scalingen 10632 root   48u     IPv4 43933882      0t0     TCP a48c4e69-876c-4e64-8281-b24586acbe68.asactors.default.app-autoscaler.microbosh:57472->9b0cf85a-192b-4d25-b2b8-8c4575a4cede.autoscalerdb.default.app-autoscaler-db.microbosh:postgresql (ESTABLISHED)

Not sure if https://github.com/cloudfoundry/app-autoscaler-release/pull/791/files is the cause...

KevinJCross commented 2 years ago

Hmmm ... interesting.

PR #791 is unlikely to cause this. It makes two calls to the Cloud Controller v3 API instead of one to get the same information. It also adds retries against the CC when it returns a 500 status code.

#735 has quite a few changes around this.

I think we will need to investigate this a bit more. What state are the hanging TCP connections in? Could you please try release 5.4.0 to determine whether the changes in 5.4.0 caused the issue? In the meantime, we might try to replicate this in our acceptance environment.
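One way to answer the question about connection states is to tally them per process from `netstat -pano` output. This is only a sketch: the function name `tcp_states_for_pid` is hypothetical, and it assumes the column layout netstat prints on Linux (state in column 6, `PID/Program name` in column 7).

```shell
# Count TCP connection states owned by one process.
# Reads `netstat -pano` output on stdin; $1 is the PID to filter on.
# tcp_states_for_pid is a hypothetical helper name.
tcp_states_for_pid() {
  awk -v pid="$1" '
    $1 ~ /^tcp/ {
      split($7, owner, "/")             # column 7 is "PID/Program name"
      if (owner[1] == pid) count[$6]++  # column 6 is the TCP state
    }
    END { for (s in count) print s, count[s] }
  ' | LC_ALL=C sort
}
```

Usage: `netstat -pano | tcp_states_for_pid <pid>`. Connections that netstat lists with `-` instead of a PID are not attributed to any process and are therefore not counted.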

donacarr commented 2 years ago

netstat -pano
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name     Timer
tcp        0      0 169.254.0.2:53          0.0.0.0:*               LISTEN      4932/bosh-dns        off (0.00/0/0)
tcp        0      0 169.254.0.2:53          0.0.0.0:*               LISTEN      4932/bosh-dns        off (0.00/0/0)
tcp        0      0 169.254.0.2:53          0.0.0.0:*               LISTEN      4932/bosh-dns        off (0.00/0/0)
tcp        0      0 169.254.0.2:53          0.0.0.0:*               LISTEN      4932/bosh-dns        off (0.00/0/0)
tcp        0      0 0.0.0.0:8853            0.0.0.0:*               LISTEN      4869/bosh-dns-healt  off (0.00/0/0)
tcp        0      0 0.0.0.0:6102            0.0.0.0:*               LISTEN      11021/java           off (0.00/0/0)
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      11615/sshd           off (0.00/0/0)
tcp        3      0 0.0.0.0:6104            0.0.0.0:*               LISTEN      10973/scalingengine  off (0.00/0/0)
tcp        0      0 127.0.0.1:53080         0.0.0.0:*               LISTEN      4932/bosh-dns        off (0.00/0/0)
tcp        0      0 0.0.0.0:6202            0.0.0.0:*               LISTEN      11021/java           off (0.00/0/0)
tcp        0      0 0.0.0.0:24220           0.0.0.0:*               LISTEN      13398/ruby           off (0.00/0/0)
tcp        3      0 0.0.0.0:6204            0.0.0.0:*               LISTEN      10973/scalingengine  off (0.00/0/0)
tcp        0      0 0.0.0.0:6208            0.0.0.0:*               LISTEN      24281/operator       off (0.00/0/0)
tcp        0      0 127.0.0.1:8096          0.0.0.0:*               LISTEN      7761/mono            off (0.00/0/0)
tcp        0      0 127.0.0.1:3458          0.0.0.0:*               LISTEN      11306/forwarder-age  off (0.00/0/0)
tcp        0      0 127.0.0.1:3459          0.0.0.0:*               LISTEN      11240/loggregator-a  off (0.00/0/0)
tcp        0      0 127.0.0.1:2822          0.0.0.0:*               LISTEN      10959/monit-actual   off (0.00/0/0)
tcp        0      0 127.0.0.1:14823         0.0.0.0:*               LISTEN      11306/forwarder-age  off (0.00/0/0)
tcp        0      0 127.0.0.1:14824         0.0.0.0:*               LISTEN      11240/loggregator-a  off (0.00/0/0)
tcp        0      0 127.0.0.1:2825          0.0.0.0:*               LISTEN      1580/bosh-agent      off (0.00/0/0)
tcp        0      0 168.1.195.126:47946     168.1.44.28:8082        ESTABLISHED 11240/loggregator-a  keepalive (1.89/0/0)
tcp        0      0 169.254.0.2:39776       169.254.0.2:53          TIME_WAIT   -                    timewait (55.46/0/0)
tcp        0      0 169.254.0.2:39748       169.254.0.2:53          TIME_WAIT   -                    timewait (45.46/0/0)
tcp        0      0 168.1.195.126:6104      168.1.195.126:57144     ESTABLISHED 10973/scalingengine  keepalive (97.38/0/0)
tcp        0      0 168.1.195.126:22        168.1.40.110:32876      ESTABLISHED 49750/sshd: bosh_62  keepalive (87.45/0/0)
tcp      129      0 168.1.195.126:6204      168.1.31.101:36404      ESTABLISHED -                    off (0.00/0/0)
tcp        0      0 10.138.123.174:57796    10.138.245.229:5523     ESTABLISHED 13107/filebeat       off (0.00/0/0)
tcp        0      0 168.1.195.126:58666     168.1.44.103:8082       ESTABLISHED 11240/loggregator-a  keepalive (2.13/0/0)
tcp        0      0 168.1.195.126:6104      168.1.195.126:56258     ESTABLISHED 10973/scalingengine  keepalive (144.20/0/0)
tcp      253      0 168.1.195.126:6104      130.198.81.155:56778    ESTABLISHED -                    off (0.00/0/0)
tcp     1754      0 168.1.195.126:56412     168.1.195.126:6104      ESTABLISHED 11021/java           off (0.00/0/0)
tcp      254      0 168.1.195.126:6104      130.198.71.86:34150     CLOSE_WAIT  -                    off (0.00/0/0)
tcp        0      0 168.1.195.126:58969     52.61.29.159:443        ESTABLISHED 9801/falcon-sensor   off (0.00/0/0)
tcp        0      0 169.254.0.2:39730       169.254.0.2:53          TIME_WAIT   -                    timewait (20.45/0/0)
tcp        0      0 168.1.195.126:40054     168.1.44.23:5432        ESTABLISHED 11021/java           off (0.00/0/0)
tcp        0      0 169.254.0.2:39704       169.254.0.2:53          TIME_WAIT   -                    timewait (10.45/0/0)
tcp        0      0 169.254.0.2:39744       169.254.0.2:53          TIME_WAIT   -                    timewait (40.46/0/0)
tcp        0      0 168.1.195.126:6104      168.1.195.126:56590     ESTABLISHED 10973/scalingengine  keepalive (62.28/0/0)
tcp        0      0 168.1.195.126:40056     168.1.44.23:5432        ESTABLISHED 11021/java           off (0.00/0/0)
tcp        0      0 127.0.0.1:2822          127.0.0.1:39014         TIME_WAIT   -                    timewait (4.17/0/0)
tcp        0      0 168.1.195.126:47928     168.1.44.23:5432        ESTABLISHED 11021/java           off (0.00/0/0)
tcp     1754      0 168.1.195.126:56258     168.1.195.126:6104      ESTABLISHED 11021/java           off (0.00/0/0)
tcp        0      0 127.0.0.1:55542         127.0.0.1:3459          ESTABLISHED 11306/forwarder-age  keepalive (14.52/0/0)
tcp        1      0 10.138.123.174:39040    10.138.245.229:5522     CLOSE_WAIT  13091/filebeat       off (0.00/0/0)
tcp        0      0 10.138.123.174:34544    10.138.125.155:4222     ESTABLISHED 1580/bosh-agent      keepalive (13.47/0/0)
tcp      129      0 168.1.195.126:6204      130.198.81.155:60280    ESTABLISHED -                    off (0.00/0/0)
tcp        0      0 168.1.195.126:58672     168.1.44.103:8082       ESTABLISHED 11240/loggregator-a  keepalive (2.80/0/0)
tcp        0      0 127.0.0.1:3459          127.0.0.1:55542         ESTABLISHED 11240/loggregator-a  keepalive (14.52/0/0)
tcp        0      0 168.1.195.126:6104      168.1.195.126:56422     ESTABLISHED 10973/scalingengine  keepalive (144.20/0/0)
tcp     1754      0 168.1.195.126:57144     168.1.195.126:6104      ESTABLISHED 11021/java           off (0.00/0/0)
tcp     1754      0 168.1.195.126:56590     168.1.195.126:6104      ESTABLISHED 11021/java           off (0.00/0/0)
tcp        0      0 169.254.0.2:39702       169.254.0.2:53          TIME_WAIT   -                    timewait (5.45/0/0)
tcp        0      0 169.254.0.2:39736       169.254.0.2:53          TIME_WAIT   -                    timewait (30.46/0/0)
tcp        0      0 169.254.0.2:39742       169.254.0.2:53          TIME_WAIT   -                    timewait (35.46/0/0)
tcp        0      0 169.254.0.2:39728       169.254.0.2:53          TIME_WAIT   -                    timewait (15.45/0/0)
tcp        0      0 168.1.195.126:58670     168.1.44.103:8082       ESTABLISHED 11240/loggregator-a  keepalive (2.80/0/0)
tcp        0      0 168.1.195.126:54124     168.1.27.60:443         ESTABLISHED 13398/ruby           off (0.00/0/0)
tcp        0      0 168.1.195.126:54712     168.1.44.80:4224        ESTABLISHED 11181/route-registr  keepalive (0.32/0/0)
tcp        0      0 168.1.195.126:6102      168.1.31.101:37782      ESTABLISHED 11021/java           off (0.00/0/0)
tcp        0      0 168.1.195.126:54152     168.1.27.60:443         ESTABLISHED 13398/ruby           off (0.00/0/0)
tcp     1754      0 168.1.195.126:56422     168.1.195.126:6104      ESTABLISHED 11021/java           off (0.00/0/0)
tcp        0      0 169.254.0.2:39734       169.254.0.2:53          TIME_WAIT   -                    timewait (25.45/0/0)
tcp        0      0 168.1.195.126:47940     168.1.44.28:8082        ESTABLISHED 11240/loggregator-a  keepalive (1.88/0/0)
tcp        0      0 169.254.0.2:39750       169.254.0.2:53          TIME_WAIT   -                    timewait (50.46/0/0)
tcp        0      0 168.1.195.126:6102      130.198.81.155:48740    ESTABLISHED 11021/java           off (0.00/0/0)
tcp        0      0 169.254.0.2:39694       169.254.0.2:53          TIME_WAIT   -                    timewait (0.44/0/0)
tcp        0      0 168.1.195.126:8853      168.1.31.101:54980      ESTABLISHED 4869/bosh-dns-healt  keepalive (13.26/0/0)
tcp        0      0 10.138.123.174:33320    10.138.125.242:389      ESTABLISHED 12219/sssd_be        keepalive (8.42/0/0)
tcp      253      0 168.1.195.126:6104      130.198.81.155:56750    ESTABLISHED -                    off (0.00/0/0)
tcp        0      0 168.1.195.126:43804     168.1.44.23:5432        ESTABLISHED 11021/java           off (0.00/0/0)
tcp      129      0 168.1.195.126:6204      168.1.31.101:36282      ESTABLISHED -                    off (0.00/0/0)
tcp        0      0 168.1.195.126:48058     168.1.44.23:5432        ESTABLISHED 24281/operator       keepalive (10.90/0/0)
tcp        0      0 168.1.195.126:6202      168.1.31.101:39200      FIN_WAIT2   -                    timewait (27.24/0/0)
tcp        0      0 127.0.0.1:2822          127.0.0.1:39054         TIME_WAIT   -                    timewait (34.16/0/0)
tcp        0      0 168.1.195.126:6104      168.1.195.126:56412     ESTABLISHED 10973/scalingengine  keepalive (144.19/0/0)
tcp        0      0 168.1.195.126:54118     168.1.27.60:443         ESTABLISHED 13398/ruby           off (0.00/0/0)
udp        0      0 169.254.0.2:53          0.0.0.0:*                           4932/bosh-dns        off (0.00/0/0)
udp        0      0 169.254.0.2:53          0.0.0.0:*                           4932/bosh-dns        off (0.00/0/0)
udp        0      0 169.254.0.2:53          0.0.0.0:*                           4932/bosh-dns        off (0.00/0/0)
udp        0      0 169.254.0.2:53          0.0.0.0:*                           4932/bosh-dns        off (0.00/0/0)
udp        0      0 127.0.0.1:323           0.0.0.0:*                           50379/chronyd        off (0.00/0/0)
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
unix  2      [ ACC ]     SEQPACKET  LISTENING     21648    1/init               /run/udev/control
unix  2      [ ACC ]     STREAM     LISTENING     69597    13294/fluent.conf    /tmp/SERVERENGINE_SOCKETMANAGER_2022-08-08T09:41:51Z_13294
unix  2      [ ACC ]     STREAM     LISTENING     61106    12208/sssd           /var/lib/sss/pipes/private/sbus-monitor
unix  2      [ ACC ]     STREAM     LISTENING     24598    1/init               /run/uuidd/request
unix  2      [ ACC ]     STREAM     LISTENING     24600    1/init               /var/run/dbus/system_bus_socket
unix  2      [ ACC ]     STREAM     LISTENING     22783    425/multipathd       @/org/kernel/linux/storage/multipathd
unix  2      [ ]         DGRAM                    43499    1/init               /run/systemd/journal/syslog
unix  2      [ ACC ]     STREAM     LISTENING     65113    12219/sssd_be        /var/lib/sss/pipes/private/sbus-dp_sso-ldap-krb5.12219
unix  3      [ ]         DGRAM                    21640    1/init               /run/systemd/notify
unix  2      [ ACC ]     STREAM     LISTENING     24597    1/init               @ISCSIADM_ABSTRACT_NAMESPACE
unix  2      [ ACC ]     STREAM     LISTENING     42150    1/init               /var/run/secrets.socket
unix  2      [ ACC ]     STREAM     LISTENING     61119    12231/sssd_pam       /var/lib/sss/pipes/pam
unix  2      [ ACC ]     STREAM     LISTENING     61120    12231/sssd_pam       /var/lib/sss/pipes/private/pam
unix  2      [ ]         DGRAM                    939031   50379/chronyd        /run/chrony/chronyd.sock
unix  2      [ ACC ]     STREAM     LISTENING     63090    12230/sssd_nss       /var/lib/sss/pipes/nss
unix  2      [ ACC ]     STREAM     LISTENING     21643    1/init               /run/systemd/private
unix  2      [ ACC ]     STREAM     LISTENING     21657    1/init               /run/systemd/journal/stdout
unix  6      [ ]         DGRAM                    21659    1/init               /run/systemd/journal/socket
unix  16     [ ]         DGRAM                    21665    1/init               /run/systemd/journal/dev-log
unix  2      [ ACC ]     STREAM     LISTENING     21676    1/init               /run/systemd/fsck.progress
unix  3      [ ]         STREAM     CONNECTED     31778    1/init               /run/systemd/journal/stdout
unix  3      [ ]         STREAM     CONNECTED     24609    1/init
unix  3      [ ]         DGRAM                    29925    1628/systemd-networ
unix  2      [ ]         DGRAM                    30881    1580/bosh-agent
unix  2      [ ]         DGRAM                    44691    8101/audispd
unix  3      [ ]         STREAM     CONNECTED     26727    1/init               /run/systemd/journal/stdout
unix  2      [ ]         DGRAM                    29919    1628/systemd-networ
unix  3      [ ]         DGRAM                    29926    1628/systemd-networ
unix  3      [ ]         STREAM     CONNECTED     25648    538/dbus-daemon
unix  3      [ ]         STREAM     CONNECTED     61126    12208/sssd           /var/lib/sss/pipes/private/sbus-monitor
unix  2      [ ]         DGRAM                    20748    246/systemd-journal
unix  3      [ ]         STREAM     CONNECTED     929253   49750/sshd: bosh_62
unix  3      [ ]         STREAM     CONNECTED     61125    12231/sssd_pam
unix  3      [ ]         STREAM     CONNECTED     25649    538/dbus-daemon
unix  2      [ ]         STREAM     CONNECTED     58294    11021/java
unix  2      [ ]         DGRAM                    152208   9801/falcon-sensor
unix  3      [ ]         STREAM     CONNECTED     29929    1628/systemd-networ
unix  3      [ ]         STREAM     CONNECTED     63089    12230/sssd_nss
unix  3      [ ]         STREAM     CONNECTED     929252   49760/sshd: bosh_62
unix  3      [ ]         STREAM     CONNECTED     63088    12230/sssd_nss
unix  3      [ ]         STREAM     CONNECTED     65116    12219/sssd_be        /var/lib/sss/pipes/private/sbus-dp_sso-ldap-krb5.12219
unix  3      [ ]         STREAM     CONNECTED     23346    1/init               /run/systemd/journal/stdout
unix  3      [ ]         STREAM     CONNECTED     65121    12219/sssd_be        /var/lib/sss/pipes/private/sbus-dp_sso-ldap-krb5.12219
unix  3      [ ]         DGRAM                    29928    1628/systemd-networ
unix  2      [ ]         DGRAM                    44547    7761/mono
unix  2      [ ]         DGRAM                    30882    1580/bosh-agent
unix  3      [ ]         STREAM     CONNECTED     61123    12231/sssd_pam
unix  3      [ ]         DGRAM                    29927    1628/systemd-networ
unix  3      [ ]         STREAM     CONNECTED     25651    538/dbus-daemon      /var/run/dbus/system_bus_socket
unix  2      [ ]         STREAM     CONNECTED     930828   49760/sshd: bosh_62
unix  3      [ ]         STREAM     CONNECTED     24370    646/runsvdir
unix  2      [ ]         DGRAM                    25011    798/logger
unix  3      [ ]         STREAM     CONNECTED     28987    538/dbus-daemon      /var/run/dbus/system_bus_socket
unix  2      [ ]         DGRAM                    25647    538/dbus-daemon
unix  3      [ ]         STREAM     CONNECTED     29917    1628/systemd-networ
unix  3      [ ]         STREAM     CONNECTED     61124    12208/sssd           /var/lib/sss/pipes/private/sbus-monitor
unix  3      [ ]         STREAM     CONNECTED     25627    538/dbus-daemon
unix  2      [ ]         STREAM     CONNECTED     930618   49750/sshd: bosh_62
unix  3      [ ]         STREAM     CONNECTED     59646    10639/cron
unix  3      [ ]         DGRAM                    21642    1/init
unix  3      [ ]         STREAM     CONNECTED     26869    538/dbus-daemon      /var/run/dbus/system_bus_socket
unix  2      [ ]         DGRAM                    537953   1/init
unix  2      [ ]         DGRAM                    53221    9800/falcond
unix  3      [ ]         DGRAM                    19315    303/systemd-udevd
unix  2      [ ]         STREAM     CONNECTED     930628   49750/sshd: bosh_62
unix  2      [ ]         STREAM     CONNECTED     94900    14699/rsyslogd
unix  3      [ ]         STREAM     CONNECTED     65112    12219/sssd_be
unix  3      [ ]         STREAM     CONNECTED     62274    11615/sshd
unix  2      [ ]         DGRAM                    928491   49786/sudo
unix  2      [ ]         DGRAM                    94902    14699/rsyslogd
unix  3      [ ]         STREAM     CONNECTED     65106    12208/sssd
unix  3      [ ]         STREAM     CONNECTED     22493    425/multipathd
unix  3      [ ]         STREAM     CONNECTED     26868    536/python3
unix  2      [ ]         STREAM     CONNECTED     61058    11021/java
unix  3      [ ]         STREAM     CONNECTED     43808    8099/auditd
unix  3      [ ]         STREAM     CONNECTED     63728    1/init               /run/systemd/journal/stdout
unix  3      [ ]         DGRAM                    21641    1/init
unix  2      [ ]         DGRAM                    65268    12617/incrond
unix  3      [ ]         STREAM     CONNECTED     21464    1/init               /run/systemd/journal/stdout
unix  3      [ ]         STREAM     CONNECTED     63014    1/init               /run/systemd/journal/stdout
unix  3      [ ]         STREAM     CONNECTED     25652    538/dbus-daemon      /var/run/dbus/system_bus_socket
unix  3      [ ]         STREAM     CONNECTED     61114    12208/sssd           /var/lib/sss/pipes/private/sbus-monitor
unix  2      [ ]         DGRAM                    43809    8099/auditd
unix  2      [ ]         DGRAM                    23708    539/systemd-logind
unix  3      [ ]         STREAM     CONNECTED     43810    8099/auditd
unix  2      [ ]         DGRAM                    19303    303/systemd-udevd
unix  3      [ ]         STREAM     CONNECTED     26650    1/init               /run/systemd/journal/stdout
unix  3      [ ]         STREAM     CONNECTED     23714    539/systemd-logind
unix  3      [ ]         STREAM     CONNECTED     43807    8099/auditd
unix  3      [ ]         STREAM     CONNECTED     22494    1/init               /run/systemd/journal/stdout
unix  3      [ ]         STREAM     CONNECTED     26649    640/nessus-service
unix  3      [ ]         STREAM     CONNECTED     928488   49786/sudo
unix  3      [ ]         STREAM     CONNECTED     58634    1/init               /run/systemd/journal/stdout
unix  3      [ ]         STREAM     CONNECTED     23356    1/init               /run/systemd/journal/stdout
unix  3      [ ]         STREAM     CONNECTED     928489   49787/sudo
unix  3      [ ]         DGRAM                    19316    303/systemd-udevd
unix  3      [ ]         STREAM     CONNECTED     19300    303/systemd-udevd
unix  3      [ ]         STREAM     CONNECTED     23707    539/systemd-logind
unix  3      [ ]         STREAM     CONNECTED     43811    8099/auditd
unix  3      [ ]         STREAM     CONNECTED     20969    1/init               /run/systemd/journal/stdout
unix  2      [ ]         DGRAM                    939024   50379/chronyd
unix  3      [ ]         STREAM     CONNECTED     23638    536/python3
unix  2      [ ]         STREAM     CONNECTED     928483   49786/sudo
unix  2      [ ]         DGRAM                    930627   49760/sshd: bosh_62

KevinJCross commented 2 years ago

@donacarr Unfortunately, that does not help much, since there are only a few scalingengine connections in that snapshot. What we need is to see where the connections from the scalingengine are going. I have a suspicion it could be related to database connections. What are your connection settings for the databases and the CF client in the manifest? If you can capture a netstat snapshot while the problem is happening, we can determine whether it is the CF client or the DB connections.
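To see where a process's connections are going, the remote endpoints in a `netstat -pano` snapshot can be grouped and counted. The sketch below uses a hypothetical function name, `remote_endpoints_for_pid`, and assumes the Linux netstat column layout (foreign address in column 5). A pile-up on port 5432 would point at the database, one on port 443 at an HTTPS endpoint.

```shell
# Count connections per remote endpoint for one process, busiest first.
# Reads `netstat -pano` output on stdin; $1 is the PID to filter on.
# remote_endpoints_for_pid is a hypothetical helper name.
remote_endpoints_for_pid() {
  awk -v pid="$1" '
    $1 ~ /^tcp/ && $6 != "LISTEN" {
      split($7, owner, "/")             # column 7 is "PID/Program name"
      if (owner[1] == pid) count[$5]++  # column 5 is the foreign address
    }
    END { for (e in count) print e, count[e] }
  ' | LC_ALL=C sort -k2,2nr -k1,1
}
```

Usage: `netstat -pano | remote_endpoints_for_pid <pid>`.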

donacarr commented 2 years ago

@KevinJCross I investigated further and found that one of the sockets is connected to the API endpoint. Below is the live progress of file descriptor 1769 of process 18962, captured once per second:

while sleep 1; do lsof -p 18962 | tail -13;done

scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (ESTABLISHED)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (ESTABLISHED)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (ESTABLISHED)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (ESTABLISHED)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (ESTABLISHED)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (ESTABLISHED)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (ESTABLISHED)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (ESTABLISHED)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (ESTABLISHED)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (ESTABLISHED)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (ESTABLISHED)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (ESTABLISHED)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     IPv4 976736      0t0     TCP cbda4269-0aa5-4422-ae59-282e42945c2e.asactors.default.app-autoscaler.microbosh:54500->a23-67-156-218.deploy.static.akamaitechnologies.com:https (CLOSE_WAIT)
scalingen 18962 root 1769u     sock    0,9      0t0  976736 protocol: TCP
scalingen 18962 root 1769u     sock    0,9      0t0  976736 protocol: TCP
scalingen 18962 root 1769u     sock    0,9      0t0  976736 protocol: TCP
scalingen 18962 root 1769u     sock    0,9      0t0  976736 protocol: TCP
scalingen 18962 root 1769u     sock    0,9      0t0  976736 protocol: TCP
scalingen 18962 root 1769u     sock    0,9      0t0  976736 protocol: TCP

ping a23-67-156-218.deploy.static.akamaitechnologies.com
PING a23-67-156-218.deploy.static.akamaitechnologies.com (23.67.156.218) 56(84) bytes of data.
64 bytes from a23-67-156-218.deploy.static.akamaitechnologies.com (23.67.156.218): icmp_seq=1 ttl=55 time=2.09 ms

ping api.us-south.cf.xxx.yyy.com
PING e13712.a.akamaiedge.net (23.67.156.218) 56(84) bytes of data.
64 bytes from a23-67-156-218.deploy.static.akamaitechnologies.com (23.67.156.218): icmp_seq=1 ttl=55 time=1.91 ms

KevinJCross commented 2 years ago

Ah right ... these are all in the CLOSE_WAIT state (the connection has been closed by the server). This is a valid TCP state; it indicates that we are not reusing connections properly, rather than that too many connections are being used. I believe this could have been an existing issue that has been highlighted by the fact that we

@OliverMautschke and I were working on a patch yesterday that might alleviate this issue. I would also recommend that we improve the keepalive settings for the HTTP clients.

KevinJCross commented 2 years ago

Hi, after reviewing the changed code we have found a bug that is causing this issue. It only happens if the CF API returns a >=300 response. I assume this would be for apps that have been deleted in the built-in case.

https://github.com/cloudfoundry/app-autoscaler-release/pull/820 is the fix for this problem and we should be able to release this tomorrow.

KevinJCross commented 2 years ago

OK ... I did not know GitHub auto-closes the ticket when you merge the branch.

KevinJCross commented 2 years ago

@donacarr this has been fixed in https://github.com/cloudfoundry/app-autoscaler-release/releases/tag/6.1.0

silvestre commented 2 years ago

It seems like this issue is still present in 6.1.0 and probably 6.1.1 as well.

KevinJCross commented 2 years ago

@donacarr I've found the issue and it is fixed in PR 886.

KevinJCross commented 2 years ago

We believe this has now been fixed in release 7.0.0.

KevinJCross commented 2 years ago

@donacarr Thanks for your help and patience on this issue.