Netflix / dynomite

A generic dynamo implementation for different k-v storage engines
Apache License 2.0
4.2k stars, 532 forks

long running dynomite results in hundreds of inter dynomite connections? #672

Closed daflip closed 5 years ago

daflip commented 5 years ago

Hi

We are observing that long-running dynomite instances build up hundreds of established inter-dynomite connections. We run 2 DCs with 3 racks in each, and every instance that has been up for a month or so shows the same buildup. An excerpt of netstat output from one such instance is below; the total number of established connections on port 8101 was 1200:

tcp        0      0 192.168.100.225:8101    146.xx.xx.117:60672    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.116:53144    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:55042    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.116:48459    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:36905    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.116:60360    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:32827    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.118:58964    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:52758    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.116:33576    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:48215    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:57935    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:59839    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.118:43122    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.118:51028    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:41832    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.118:34217    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:54867    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:36641    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:36067    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.116:36347    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.116:49623    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:54929    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:47687    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.116:42029    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:45137    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:47821    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:57060    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:59305    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.116:40501    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:44874    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:36207    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:39724    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.116:36950    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.118:56308    ESTABLISHED

The same issue persists across all long-running dynomite instances. Restarting a dynomite instance brings the established connection count down to single figures. Is this a known bug? It seems to me that dynomite is not closing old connections when it opens new ones.
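To see which peers account for the buildup, the netstat output can be tallied per remote address. The following is a minimal sketch in Python, assuming the six-column `netstat -tn` layout shown in the excerpt above; the function name `count_established` and the three-line sample are illustrative only and are not part of dynomite.

```python
from collections import Counter


def count_established(netstat_text, local_port):
    """Count ESTABLISHED TCP connections on local_port, grouped by remote IP."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local addr, remote addr, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        # Keep only connections whose local side is the port of interest
        if local.rsplit(":", 1)[-1] != str(local_port):
            continue
        # Strip the ephemeral remote port, keep the remote IP
        counts[remote.rsplit(":", 1)[0]] += 1
    return counts


# Three lines copied from the excerpt above (illustrative sample only)
sample = """\
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:60672    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.116:53144    ESTABLISHED
tcp        0      0 192.168.100.225:8101    146.xx.xx.117:55042    ESTABLISHED
"""

for peer, n in count_established(sample, 8101).most_common():
    print(peer, n)
```

Running the same tally at intervals would show whether the count for a given peer only ever grows, which is what a connection leak or uncollected dead connections would look like.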

Here's an example config from one of the hosts:

dyn_o_mite:
  auto_eject_hosts: 'true'
  datacenter: us_dc
  dyn_listen: 0.0.0.0:8101
  dyn_port: 8101
  dyn_seed_provider: simple_provider
  dyn_seeds:
  - 192.168.100.226:8101:us3:us_dc:0
  - 192.168.100.224:8101:us1:us_dc:0
  - 146.xx.xx.117:8101:uk2:uk_dc:0
  - 146.xx.xx.118:8101:uk3:uk_dc:0
  - 146.xx.xx.116:8101:uk1:uk_dc:0
  listen: 0.0.0.0:8102
  pem_key_file: /etc/dynomite/key.pem
  rack: us2
  secure_server_option: datacenter
  server_failure_limit: 10
  server_retry_timeout: 10000
  servers:
  - 127.0.0.1:6379:1
  timeout: 150000
  tokens: 0

daflip commented 5 years ago

Closing issue - it appears to be a problem with the host machines not killing off uncleanly closed remote connections. Sorry for the noise.
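
For readers hitting the same symptom: one common way for a host to notice peers that vanished without a clean close is TCP keepalive. The sketch below is an assumption about a possible mitigation, not the fix applied in this issue and not dynomite code. The helper name `enable_keepalive` and the timing values are hypothetical, and the `TCP_KEEP*` options are platform-dependent, so they are set only where Python's `socket` module exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive so a dead peer is eventually detected and the socket closed."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Fine-grained timers are not available on every platform; set them when present.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # seconds of idle time before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # failed probes before the connection is dropped
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
s.close()
```

Keepalive only helps on sockets where the application has enabled it, so whether it applies here depends on how the peer connections are created.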