NREL / api-umbrella

Open source API management platform
http://apiumbrella.io
MIT License

Zombie processes in api-umbrella docker container #534

Open mbettex opened 3 years ago

mbettex commented 3 years ago

Hello,

My company is using api-umbrella on Docker (nrel/api-umbrella:0.15.1) on top of Ubuntu 16.04.5 in multiple environments.

I noticed strange behavior in those environments. Approximately every 2 minutes, something in the api-umbrella docker container creates a zombie process.

Below is the api-umbrella docker container:

docker ps | grep api-umbrella
70197abf50c4        nrel/api-umbrella:0.15.1   "api-umbrella run"       2 months ago        Up 25 hours         0.0.0.0:8082->80/tcp, 0.0.0.0:4432->443/tcp                           frontend-apiumbrella

Below is an extract of the processes currently running on one of these virtual machines:

ps -ajfx
PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
   0     2     0     0 ?           -1 S        0   0:00 [kthreadd]
# Many unrelated lines omitted for brevity
   1  1445  1445  1445 ?           -1 Ssl      0   2:47 /usr/bin/dockerd -H fd://
1445  1860  1860  1860 ?           -1 Ssl      0   0:53  \_ docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/ru
1860  2431  2431  1860 ?           -1 Sl       0   0:00  |   \_ docker-containerd-shim 70197abf50c4b5ef189bd91b55306dd2a19ead9e4f969b69c2969b3687abadb3 /var/run/docker/libcontainerd/70197abf50c4b5ef18
2431  2523  2523  2523 ?           -1 Ss       0   0:00  |   |   \_ perl /opt/api-umbrella/embedded/bin/resty /opt/api-umbrella/embedded/apps/core/current/bin/api-umbrella-cli run
2523  2752  2523  2523 ?           -1 S        0   0:00  |   |       \_ api-umbrella /opt/api-umbrella/etc/perp
2752  2803  2523  2523 ?           -1 S      999   0:03  |   |       |   \_ svlogd -ttt /opt/api-umbrella/var/log/perpd
2752  2804  2523  2523 ?           -1 S        0   0:37  |   |       |   \_ perpd /opt/api-umbrella/etc/perp
2804  2890  2890  2890 ?           -1 Ss     999   0:00  |   |       |       \_ svlogd -ttt /opt/api-umbrella/var/log/trafficserver
2804  2891  2891  2891 ?           -1 Ssl    999   0:36  |   |       |       \_ traffic_manager --nosyslog
2891  3172  2891  2891 ?           -1 Sl     999  39:21  |   |       |       |   \_ /opt/api-umbrella/embedded/bin/traffic_server -M --httpport 14009:fd=7
2804  2892  2892  2892 ?           -1 Ss     999   0:00  |   |       |       \_ svlogd -ttt /opt/api-umbrella/var/log/nginx
2804  2893  2893  2893 ?           -1 Ss       0   0:00  |   |       |       \_ nginx: master process nginx -p /opt/api-umbrella/embedded/apps/core/current/ -c /opt/api-umbrella/etc/nginx/router.conf
2893  3160  2893  2893 ?           -1 S      999   3:17  |   |       |       |   \_ nginx: worker process
2893  3161  2893  2893 ?           -1 S      999   2:52  |   |       |       |   \_ nginx: worker process
2804  2894  2894  2894 ?           -1 Ss     999   0:00  |   |       |       \_ svlogd -ttt /opt/api-umbrella/var/log/web-delayed-job
2804  2895  2895  2895 ?           -1 Ssl    999   0:56  |   |       |       \_ ./bin/delayed_job --pid-dir=/opt/api-umbrella/var/run run
2804  2896  2896  2896 ?           -1 Ss     999   0:00  |   |       |       \_ svlogd -ttt /opt/api-umbrella/var/log/web-puma
2804  2897  2897  2897 ?           -1 Ssl    999   0:03  |   |       |       \_ puma 3.12.1 (unix:///opt/api-umbrella/var/run/web-puma.sock) [web-app]
2897  3373  2897  2897 ?           -1 Sl     999   0:14  |   |       |       |   \_ puma: cluster worker 0: 89 [web-app]
2897  3375  2897  2897 ?           -1 Sl     999   0:14  |   |       |       |   \_ puma: cluster worker 1: 89 [web-app]
2804  2898  2898  2898 ?           -1 Ss     999   0:02  |   |       |       \_ svlogd -ttt /opt/api-umbrella/var/log/geoip-auto-updater
2804  2900  2900  2900 ?           -1 Ss     999   0:00  |   |       |       \_ svlogd -ttt /opt/api-umbrella/var/log/mongod
2804  2901  2901  2901 ?           -1 Ssl    999   8:41  |   |       |       \_ mongod --config /opt/api-umbrella/etc/mongod.conf
2804  2902  2902  2902 ?           -1 Ss     999   0:00  |   |       |       \_ svlogd -ttt /opt/api-umbrella/var/log/elasticsearch
2804  2903  2903  2903 ?           -1 Ssl    999   6:08  |   |       |       \_ /usr/bin/java -Xms512m -Xmx512m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccup
2804  2904  2904  2904 ?           -1 Ss     999   0:10  |   |       |       \_ svlogd -ttt /opt/api-umbrella/var/log/rsyslog
2804  2906  2906  2906 ?           -1 Ss     999   0:00  |   |       |       \_ svlogd -ttt /opt/api-umbrella/var/log/mora
2804  2907  2907  2907 ?           -1 Ssl    999   9:18  |   |       |       \_ mora -config /opt/api-umbrella/etc/mora.properties
2804 18307 18307 18307 ?           -1 Ss       0   0:00  |   |       |       \_ perpd /opt/api-umbrella/etc/perp
2804 18312 18312 18312 ?           -1 Ss       0   0:00  |   |       |       \_ bash /opt/api-umbrella/embedded/apps/core/current/bin/api-umbrella-geoip-auto-updater
18312 18315 18312 18312 ?           -1 S        0   0:00  |   |       |           \_ curl --silent --show-error --fail --location --retry 3 --output /tmp/api-umbrella-geoip-auto-updater.mDomZek8N3.gz h
2523  3523  2899  2899 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  3719  3594  3594 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  5258  5202  5202 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  6506  6460  6460 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  6836  6793  6793 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  7069  7026  7026 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  8998  8954  8954 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  9696  9653  9653 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  9939  9892  9892 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 10854 10811 10811 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 11042 10999 10999 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 11649 11606 11606 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 11838 11795 11795 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 13014 12970 12970 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 13291 13245 13245 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 13621 13578 13578 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 14561 14510 14510 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 14606 14563 14563 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 14886 14838 14838 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 15169 15123 15123 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 15259 15216 15216 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 15680 15634 15634 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 15964 15919 15919 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 16717 16672 16672 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 17002 16955 16955 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 17048 17005 17005 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 17650 17607 17607 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 18263 18219 18219 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 18452 18409 18409 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 19018 18975 18975 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 19486 19443 19443 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 19914 19861 19861 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 20663 20617 20617 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 20993 20950 20950 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 21645 21602 21602 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 22442 22399 22399 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 22676 22633 22633 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 22865 22817 22817 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 23521 23475 23475 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 24414 24370 24370 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 24791 24748 24748 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 24936 24885 24885 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 26575 26532 26532 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 27138 27095 27095 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 27463 27418 27418 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 27556 27513 27513 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 27648 27605 27605 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 27981 27937 27937 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 28026 27983 27983 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 28638 28595 28595 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 29063 29018 29018 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 29110 29065 29065 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 29996 29953 29953 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 31687 31636 31636 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523   884   838   838 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  1764  1718  1718 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  2857  2814  2814 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  2919  2859  2859 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  4127  4082  4082 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  4269  4226  4226 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  5065  5021  5021 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  5303  5257  5257 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  5865  5822  5822 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  5919  5867  5867 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  6055  6012  6012 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  7830  7784  7784 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  7976  7925  7925 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  8672  8629  8629 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  8717  8674  8674 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  8771  8719  8719 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  9285  9242  9242 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  9515  9472  9472 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  9571  9517  9517 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523  9997  9954  9954 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 10134 10089 10089 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 10886 10842 10842 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 13053 13009 13009 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 14045 13999 13999 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 14608 14564 14564 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
2523 15078 15035 15035 ?           -1 Z        0   0:00  |   |       \_ [bash] <defunct>
# Many more lines with defunct processes as above
# Many unrelated lines omitted for brevity

I checked the 3 environments where I have this setup: the perl process in the api-umbrella container has 987, 1328 and 15465 child processes respectively, and counting.

Eventually the Linux VM can no longer create new processes and has to be rebooted.
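
For anyone who wants to check whether they are affected, here is a minimal sketch that counts zombie processes per parent PID. It assumes the procps version of `ps` on the docker host; the function name `zombies_by_parent` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: count zombie (state Z) processes per parent PID.
# Reads "PPID STAT" pairs on stdin, one process per line,
# in the form printed by `ps -eo ppid=,stat=`.
zombies_by_parent() {
  # Keep only rows whose state starts with Z, tally them per PPID,
  # then list the parents with the most zombies first.
  awk '$2 ~ /^Z/ { n[$1]++ } END { for (p in n) print p, n[p] }' | sort -k2,2nr
}
```

On the host it would be fed like this: `ps -eo ppid=,stat= | zombies_by_parent`. Each output line is a parent PID followed by its zombie count, so in the listing above the perl process (PID 2523) would be expected at the top.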

Any help or clue about this issue would be much appreciated.

mbettex commented 3 years ago

I investigated further and found that the issue does not happen in the few environments where we don't use docker and api-umbrella is installed directly on the VM.

In case somebody else has the issue and is interested in a solution: for now, I apply a workaround, which is to monitor the number of defunct child processes and restart api-umbrella.

I am using the following commands to monitor the number of defunct child processes:

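# PID (on the docker host) of the perl process that runs "api-umbrella-cli run"
apiUmbrellaPid=$(pgrep -f 'perl .*api-umbrella-cli run' | head -n 1)
# Count its direct children that are in the zombie (Z) state
defunctChildCount=$(ps -o stat= --ppid "$apiUmbrellaPid" | grep -c '^Z')
echo "$defunctChildCount"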

I am using the following command to restart api-umbrella:

docker exec my-apiumbrella-docker-container api-umbrella restart
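
The two steps above could be combined into a small watchdog. This is only a sketch, not something shipped with api-umbrella: the function names, the `CONTAINER` and `MAX_ZOMBIES` variables and the threshold of 500 are assumptions, and it relies on procps `ps` plus `docker inspect` to find the host PID of the container's main process.

```shell
#!/usr/bin/env bash
# Sketch of the workaround: restart api-umbrella once too many zombies pile up.
# CONTAINER and MAX_ZOMBIES are placeholders; adjust them for your setup.
CONTAINER="${CONTAINER:-my-apiumbrella-docker-container}"
MAX_ZOMBIES="${MAX_ZOMBIES:-500}"

# Count lines on stdin whose process state starts with Z (zombie).
# grep -c still prints 0 when nothing matches; "|| true" hides its exit code.
count_zombies() {
  grep -c '^Z' || true
}

# Succeed (exit 0) only when the count is above the threshold.
needs_restart() {
  [ "$1" -gt "$2" ]
}

# Look up the container's main process on the host, count its zombie
# children, and restart api-umbrella inside the container if needed.
check_and_restart() {
  local pid count
  pid=$(docker inspect --format '{{.State.Pid}}' "$CONTAINER") || return 1
  count=$(ps -o stat= --ppid "$pid" | count_zombies)
  if needs_restart "$count" "$MAX_ZOMBIES"; then
    docker exec "$CONTAINER" api-umbrella restart
  fi
}
```

Calling `check_and_restart` from a cron job on the docker host would automate the restart. The counting and threshold logic is kept in separate functions so it can be tried out without docker, for example `printf 'Z\nSs\n' | count_zombies`.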