Closed vvb closed 8 years ago
[admin@ucsb-blade2 compose]$ sudo systemctl status etcd
● etcd.service - Etcd
Loaded: loaded (/etc/systemd/system/etcd.service; static; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2016-04-18 09:56:41 EDT; 24min ago
Process: 688257 ExecStopPost=/usr/bin/etcd.sh post-stop (code=exited, status=0/SUCCESS)
Process: 688238 ExecStop=/usr/bin/etcd.sh stop (code=exited, status=0/SUCCESS)
Main PID: 135370 (code=exited, status=1/FAILURE)
Apr 18 09:56:13 ucsb-blade2 etcd.sh[135370]: 2016/04/18 09:56:13 etcdserver: 80% of the file descriptor limit is used [used = 825, limit = 1024]
Apr 18 09:56:18 ucsb-blade2 etcd.sh[135370]: 2016/04/18 09:56:18 etcdserver: 80% of the file descriptor limit is used [used = 883, limit = 1024]
Apr 18 09:56:23 ucsb-blade2 etcd.sh[135370]: 2016/04/18 09:56:23 etcdserver: 80% of the file descriptor limit is used [used = 928, limit = 1024]
Apr 18 09:56:28 ucsb-blade2 etcd.sh[135370]: 2016/04/18 09:56:28 etcdserver: 80% of the file descriptor limit is used [used = 982, limit = 1024]
Apr 18 09:56:33 ucsb-blade2 etcd.sh[135370]: 2016/04/18 09:56:33 etcdserver: cannot monitor file descriptor usage (open /proc/self/fd: too many open files)
Apr 18 09:56:40 ucsb-blade2 etcd.sh[135370]: 2016/04/18 09:56:40 etcdserver: failed to purge wal file open /var/lib/etcd/member/wal: too many open files
@vvb
Interesting issue. Just curious: how does container scale tie into the FD limit in etcd? I hope we are not leaking FDs.
We can perhaps do something as suggested here to set up ulimit for the systemd units.
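To make the numbers in the log above concrete, here is a minimal sketch (an illustration, not etcd's own check) that reads the file descriptor limit a process actually inherited and tests a usage figure against the 80% mark quoted in the log. The function names and the `>=` comparison are assumptions made for this example. For a systemd service, the limit itself is set with the `LimitNOFILE=` directive in the unit file.

```python
import resource

# Warning threshold taken from the etcd log lines above ("80% of the file
# descriptor limit is used"). The exact comparison etcd uses is an assumption.
WARN_FRACTION = 0.8


def fd_limits():
    """Return the (soft, hard) RLIMIT_NOFILE values for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def over_warn_threshold(used, limit, fraction=WARN_FRACTION):
    """Return True when `used` descriptors is at or above `fraction` of `limit`."""
    return used >= fraction * limit


if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft, "hard limit:", hard)
    # First figure from the log: used = 825, limit = 1024.
    print("825 of 1024 over threshold:", over_warn_threshold(825, 1024))
```

Raising the limit only delays the failure if descriptors are being leaked, so it is worth checking the growth rate as well as the ceiling.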
@mapuri I am trying to figure that out myself, but there is definitely something going on. Look at the results below: 690752 is the pid of etcd, and compose.sh launches 20 new containers. The fd counts for netplugin and etcd each jump by about 40.
[admin@ucsb-blade2 compose]$ sudo lsof -p `pidof netplugin` | wc -l
32
[admin@ucsb-blade2 compose]$ sudo lsof -p 690752 | wc -l
28
[admin@ucsb-blade2 compose]$ ./compose.sh 10
Creating and starting 1 ... done
Creating and starting 2 ... done
Creating and starting 3 ... done
Creating and starting 4 ... done
Creating and starting 5 ... done
Creating and starting 6 ... done
Creating and starting 7 ... done
Creating and starting 8 ... done
Creating and starting 9 ... done
Creating and starting 10 ... done
Creating and starting 1 ... done
Creating and starting 2 ... done
Creating and starting 3 ... done
Creating and starting 4 ... done
Creating and starting 5 ... done
Creating and starting 6 ... done
Creating and starting 7 ... done
Creating and starting 8 ... done
Creating and starting 9 ... done
Creating and starting 10 ... done
[admin@ucsb-blade2 compose]$ sudo lsof -p 690752 | wc -l
70
[admin@ucsb-blade2 compose]$ sudo lsof -p `pidof netplugin` | wc -l
74
[admin@ucsb-blade2 compose]$
All of these are new:
etcd 690752 root 19u IPv6 16197181 0t0 TCP localhost:newoak->localhost:53102 (ESTABLISHED)
etcd 690752 root 20u IPv6 16707539 0t0 TCP localhost:newoak->localhost:53065 (ESTABLISHED)
etcd 690752 root 21u IPv6 16204524 0t0 TCP localhost:newoak->localhost:53068 (ESTABLISHED)
etcd 690752 root 22u IPv6 16303736 0t0 TCP localhost:newoak->localhost:53108 (ESTABLISHED)
etcd 690752 root 23u IPv6 16487921 0t0 TCP localhost:newoak->localhost:53091 (ESTABLISHED)
etcd 690752 root 24u IPv6 16717349 0t0 TCP localhost:newoak->localhost:53095 (ESTABLISHED)
etcd 690752 root 25u IPv6 16487934 0t0 TCP localhost:newoak->localhost:53096 (ESTABLISHED)
etcd 690752 root 26u IPv6 16487936 0t0 TCP localhost:newoak->localhost:53097 (ESTABLISHED)
etcd 690752 root 27u IPv6 16724060 0t0 TCP localhost:newoak->localhost:53118 (ESTABLISHED)
etcd 690752 root 28u IPv6 16717353 0t0 TCP localhost:newoak->localhost:53100 (ESTABLISHED)
etcd 690752 root 29u IPv6 16717355 0t0 TCP localhost:newoak->localhost:53101 (ESTABLISHED)
etcd 690752 root 30u IPv6 16197183 0t0 TCP localhost:newoak->localhost:53103 (ESTABLISHED)
etcd 690752 root 31u IPv6 16200260 0t0 TCP localhost:newoak->localhost:53111 (ESTABLISHED)
etcd 690752 root 32u IPv6 16717359 0t0 TCP localhost:newoak->localhost:53106 (ESTABLISHED)
etcd 690752 root 33u IPv6 16717361 0t0 TCP localhost:newoak->localhost:53107 (ESTABLISHED)
etcd 690752 root 34u IPv6 16551575 0t0 TCP ucs-blade2.cisco.com:2379->ucs-blade2.cisco.com:33866 (ESTABLISHED)
etcd 690752 root 35u IPv6 16681257 0t0 TCP localhost:newoak->localhost:53110 (ESTABLISHED)
etcd 690752 root 36u IPv6 16200262 0t0 TCP localhost:newoak->localhost:53112 (ESTABLISHED)
etcd 690752 root 37u IPv6 16204665 0t0 TCP localhost:newoak->localhost:53113 (ESTABLISHED)
etcd 690752 root 38u IPv6 16720687 0t0 TCP localhost:newoak->localhost:53114 (ESTABLISHED)
etcd 690752 root 39u IPv6 16720689 0t0 TCP localhost:newoak->localhost:53115 (ESTABLISHED)
etcd 690752 root 40u IPv6 16738318 0t0 TCP localhost:newoak->localhost:53130 (ESTABLISHED)
etcd 690752 root 41u IPv6 16720693 0t0 TCP localhost:newoak->localhost:53117 (ESTABLISHED)
etcd 690752 root 42u IPv6 16724062 0t0 TCP localhost:newoak->localhost:53119 (ESTABLISHED)
etcd 690752 root 43u IPv6 16738320 0t0 TCP localhost:newoak->localhost:53131 (ESTABLISHED)
etcd 690752 root 44u IPv6 16199305 0t0 TCP localhost:newoak->localhost:53122 (ESTABLISHED)
etcd 690752 root 45u IPv6 16197262 0t0 TCP localhost:newoak->localhost:53136 (ESTABLISHED)
etcd 690752 root 46u IPv6 16738322 0t0 TCP localhost:newoak->localhost:53132 (ESTABLISHED)
etcd 690752 root 47u IPv6 16740486 0t0 TCP localhost:newoak->localhost:53158 (ESTABLISHED)
etcd 690752 root 48u IPv6 16726381 0t0 TCP localhost:newoak->localhost:53148 (ESTABLISHED)
etcd 690752 root 49u IPv6 16738326 0t0 TCP localhost:newoak->localhost:53134 (ESTABLISHED)
etcd 690752 root 50u IPv6 16725053 0t0 TCP localhost:newoak->localhost:53135 (ESTABLISHED)
etcd 690752 root 51u IPv6 16197264 0t0 TCP localhost:newoak->localhost:53137 (ESTABLISHED)
etcd 690752 root 52u IPv6 16736533 0t0 TCP localhost:newoak->localhost:53143 (ESTABLISHED)
etcd 690752 root 53u IPv6 16197268 0t0 TCP localhost:newoak->localhost:53140 (ESTABLISHED)
etcd 690752 root 54u IPv6 16303802 0t0 TCP localhost:newoak->localhost:53142 (ESTABLISHED)
etcd 690752 root 55u IPv6 16735486 0t0 TCP localhost:newoak->localhost:53144 (ESTABLISHED)
etcd 690752 root 56u IPv6 16638713 0t0 TCP localhost:newoak->localhost:53154 (ESTABLISHED)
etcd 690752 root 57u IPv6 16736535 0t0 TCP localhost:newoak->localhost:53146 (ESTABLISHED)
etcd 690752 root 58u IPv6 16199350 0t0 TCP localhost:newoak->localhost:53147 (ESTABLISHED)
etcd 690752 root 59u IPv6 16740480 0t0 TCP localhost:newoak->localhost:53149 (ESTABLISHED)
etcd 690752 root 61u IPv6 16740484 0t0 TCP localhost:newoak->localhost:53151 (ESTABLISHED)
etcd 690752 root 62u IPv6 16638711 0t0 TCP localhost:newoak->localhost:53153 (ESTABLISHED)
etcd 690752 root 63u IPv6 16638715 0t0 TCP localhost:newoak->localhost:53155 (ESTABLISHED)
etcd 690752 root 64u IPv6 16638717 0t0 TCP localhost:newoak->localhost:53156 (ESTABLISHED)
etcd 690752 root 65u IPv6 16638719 0t0 TCP localhost:newoak->localhost:53157 (ESTABLISHED)
etcd 690752 root 67u IPv6 16740490 0t0 TCP localhost:newoak->localhost:53160 (ESTABLISHED)
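The before/after comparison above can also be done without lsof, whose line count includes a header row and non-descriptor entries such as the working directory and mapped files. The following is a rough sketch, assuming a Linux /proc filesystem; the function names are made up for this example. It counts the entries in /proc/<pid>/fd and picks out those whose symlink target has the form socket:[inode].

```python
import os
import socket


def count_open_fds(pid="self"):
    """Count entries in /proc/<pid>/fd (Linux only), one per open descriptor.

    For pid "self" the count includes the descriptor that is briefly opened
    to read the directory, so compare deltas rather than absolute values.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def socket_fds(pid="self"):
    """Return the sorted fd numbers whose link target looks like 'socket:[inode]'."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listing and readlink
        if target.startswith("socket:["):
            found.append(int(name))
    return sorted(found)


if __name__ == "__main__":
    before = count_open_fds()
    a, b = socket.socketpair()  # opens two socket descriptors
    print("delta after socketpair:", count_open_fds() - before)
    print("socket fds:", socket_fds())
    a.close()
    b.close()
```

Reading another process's /proc/<pid>/fd generally requires the same user or root, which matches the use of sudo in the transcript.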
I see, thanks for checking, @vvb.
What are these new fds? New TCP sockets? Maybe we are leaking etcd client connections.
The leaks seem to be between netplugin and etcd:
[admin@ucsb-blade2 compose]$ sudo ls -l /proc/48483/fd/28
lrwx------ 1 root root 64 Apr 18 17:09 /proc/48483/fd/28 -> socket:[213809]
[admin@ucsb-blade2 compose]$
[admin@ucsb-blade2 compose]$ sudo lsof | grep 213809
etcd 48483 root 28u IPv6 213809 0t0 TCP localhost:newoak->localhost:36302 (ESTABLISHED)
etcd 48483 48485 root 28u IPv6 213809 0t0 TCP localhost:newoak->localhost:36302 (ESTABLISHED)
etcd 48483 48486 root 28u IPv6 213809 0t0 TCP localhost:newoak->localhost:36302 (ESTABLISHED)
etcd 48483 48487 root 28u IPv6 213809 0t0 TCP localhost:newoak->localhost:36302 (ESTABLISHED)
etcd 48483 48488 root 28u IPv6 213809 0t0 TCP localhost:newoak->localhost:36302 (ESTABLISHED)
etcd 48483 48566 root 28u IPv6 213809 0t0 TCP localhost:newoak->localhost:36302 (ESTABLISHED)
etcd 48483 48567 root 28u IPv6 213809 0t0 TCP localhost:newoak->localhost:36302 (ESTABLISHED)
[admin@ucsb-blade2 compose]$
[admin@ucsb-blade2 compose]$ sudo netstat -tnp | grep 36302
tcp 0 0 127.0.0.1:36302 127.0.0.1:4001 ESTABLISHED 51710/netplugin
tcp6 0 0 127.0.0.1:4001 127.0.0.1:36302 ESTABLISHED 48483/etcd
[admin@ucsb-blade2 compose]$
Potential fix in https://github.com/contiv/netplugin/pull/325; it still needs to be verified. MasterPostReq creates a new client every time and doesn't seem to close it.
Awesome, thanks for digging in, @vvb.
The same happens in my test environment. The number of open sockets between netplugin and etcd keeps increasing as containers are created and deleted: the count goes up by 2 after creating a container and by 1 more after deleting it. It stays the same even after a day.
[root@baymax-2 ~]# lsof -p 27857 |wc -l
148
[root@baymax-2 ~]# docker run -d --name centos-c8 --net vlan3096net centos /bin/bash -c 'while true;do echo test;sleep 5;done'
069bb202deb2de6397cc585ec6398a0bf9f6ea764e97ab0cdc0c03086b33ab23
[root@baymax-2 ~]# lsof -p 27857 |wc -l
150
[root@baymax-2 ~]# docker rm -f centos-c8
centos-c8
[root@baymax-2 ~]# lsof -p 27857 |wc -l
151
[root@baymax-2 ~]# ps 27857
PID TTY STAT TIME COMMAND
27857 ? Sl 10:39 /opt/netplugin-v0.1-04-09-2016.02-54-52.UTC/netplugin -plugin-mode docker -vlan-if bond1
Closing this on the assumption that https://github.com/contiv/netplugin/pull/325 has addressed it.
Please reopen if that is not the case.
The current limits are too low: netplugin and volplugin run out of fds before 2k containers can be launched successfully.