Netflix / dynomite

A generic dynamo implementation for different k-v storage engines
Apache License 2.0
4.2k stars · 534 forks

memory usage higher than expected #651

Open ivancoppa opened 5 years ago

ivancoppa commented 5 years ago

Hi, I'm testing dynomite (v0.7.0) with the goal of having all the data written across 3 datacenters without any sharding.

What I'm experiencing is a massive amount of memory usage and I'm trying to understand what I'm missing.

I did not change the default values for mbuf_size and max_msgs, so I was expecting something like 3.05 GB of memory usage at most, considering the default values:

mbuf_size: size of mbuf chunk in bytes (default: 16384 bytes).
max_msgs: max number of messages to allocate (default: 200000).
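As a back-of-envelope check (my arithmetic, not dynomite's exact internal accounting), the ceiling implied by those two defaults works out like this:

```python
# Rough ceiling implied by the defaults: every message slot holding one
# mbuf chunk. An estimate only, not dynomite's actual allocator behavior.
mbuf_size = 16384   # bytes per mbuf chunk (default)
max_msgs = 200000   # max number of messages to allocate (default)

ceiling_bytes = mbuf_size * max_msgs
print(f"{ceiling_bytes} bytes ~= {ceiling_bytes / 2**30:.2f} GiB")
```

which is where the 3.05 GB expectation comes from.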

What I'm currently seeing is:

"alloc_msgs":8542,
"free_msgs":8539,
"alloc_mbufs":11450,
"free_mbufs":11447,
"dyn_memory":7880712

dyn_memory: 7.5G

top:

20416 dynomite  20   0 8043040 7.517g   3004 S  25.0 32.7   3536:02 dynomite
20954 redis     20   0 8466984 6.116g   2872 S  15.0 26.6   2266:40 redis-server
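For what it's worth, the stats above say almost nothing is actually in flight. Assuming the dyn_memory gauge is reported in KB (7880712 KB ≈ 7.5 GiB, which matches both the "7.5G" reading and top's RES column), the gap looks like this:

```python
# Stats from the /info output above. dyn_memory is assumed to be in KB,
# since 7880712 KB ~= 7.5 GiB matches the "7.5G" reading and top's RES.
stats = {
    "alloc_msgs": 8542, "free_msgs": 8539,
    "alloc_mbufs": 11450, "free_mbufs": 11447,
    "dyn_memory": 7880712,
}

in_use_msgs = stats["alloc_msgs"] - stats["free_msgs"]      # only 3 live msgs
in_use_mbufs = stats["alloc_mbufs"] - stats["free_mbufs"]   # only 3 live mbufs
reported_gib = stats["dyn_memory"] * 1024 / 2**30
print(in_use_msgs, in_use_mbufs, f"{reported_gib:.2f} GiB")
```

Three live messages and three live mbufs should account for well under a megabyte, yet the process holds ~7.5 GiB, so the growth is not explained by the mbuf/msg pools alone.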

I'm currently testing this configuration of dynomite.

  datacenter: dc1
  dyn_listen: 0.0.0.0:8101
  dyn_port: 8101
  dyn_seed_provider: simple_provider
  preconnect: true
  dyn_seeds:
  - 10.xxx.1.1:8101:rack1:dc1:4294967294
  - 10.xxx.1.3:8101:rack3:dc1:4294967294
  - 10.xxx.2.1:8101:rack1:dc2:4294967294
  - 10.xxx.2.2:8101:rack2:dc2:4294967294
  - 10.xxx.2.3:8101:rack3:dc2:4294967294
  - 10.xxx.3.1:8101:rack1:dc3:4294967294
  listen: 0.0.0.0:8102
  rack: rack2
  servers:
    - 127.0.0.1:6379:1
  read_consistency: DC_SAFE_QUORUM
  write_consistency: DC_SAFE_QUORUM
  timeout: 300000
  tokens: 4294967294
  server_failure_limit: 3
  server_retry_timeout: 30000
  auto_eject_hosts: true

[Image: dynomite memory usage over time (dynomite_mem_usage)]
lhucinequr commented 5 years ago

[Image: dynomite_mem]

We are experiencing the same issue. We have configured dynomite with the following settings:

MAX_MSGS: 100000
MBUF_SIZE: 8192

According to our calculations, dynomite should not consume more than ~800 MB of RAM, but in reality it consumes a lot more than that (1.4 GB) and counting, up to the point where it is killed by the OOM killer. Any help on this issue would be appreciated.

smukil commented 5 years ago

@ivancoppa @lhucinequr I will investigate this and get back.

lhucinequr commented 5 years ago

A quick update on this issue: we have done a few more tests. We are running dynomite (v0.7.0) in a simple 3-node cluster with the following configuration on each node:

dyn_o_mite:
  datacenter: bgl
  rack: rack1
  dyn_listen: 0.0.0.0:8101

  dyn_seed_provider: simple_provider
  dyn_seeds:
    - dynomite-bench-node-2:8101:rack2:bgl:bglr2n1
    - dynomite-bench-node-3:8101:rack3:bgl:bglr3n1

  listen: 0.0.0.0:8102
  servers:
    - 127.0.0.1:6379:1
  tokens: 'bglr1n1'

  pem_key_file: /usr/local/etc/dynomite/dynomite.pem
  data_store: 0

  stats_listen: 0.0.0.0:22222
  mbuf_size: 4096
  max_msgs: 100000
  read_consistency : DC_ONE
  write_consistency : DC_SAFE_QUORUM

According to the documentation, dynomite should not use more than 4096 * 100000 bytes = 390.625 MB of memory, but after running the cluster for 2 days each node now uses roughly 655300 KB = 639.94 MB (according to http://localhost:22222/info).
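Spelling that calculation out (assuming the dyn_memory gauge is in KB, as the KB-to-MB conversion above implies):

```python
# Expected ceiling vs. the observed dyn_memory gauge (assumed to be in KB).
mbuf_size, max_msgs = 4096, 100000
expected_mib = mbuf_size * max_msgs / 2**20   # 390.625 MiB
observed_mib = 655300 / 1024                  # ~639.94 MiB
print(f"expected {expected_mib} MiB, observed {observed_mib:.2f} MiB, "
      f"ratio {observed_mib / expected_mib:.2f}x")
```

so after two days the node is already ~1.6x over the theoretical ceiling, and still climbing.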

curl -s http://localhost:22222/info | jq .dyn_memory
655300
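To watch the trend without eyeballing curl output, a minimal poller along these lines can log samples over time (a sketch only: it assumes the stats_listen endpoint from the config above and a plain-JSON /info payload):

```python
import json
import time
import urllib.request

STATS_URL = "http://localhost:22222/info"  # stats_listen port from the config above

def dyn_memory_kb(payload: bytes) -> int:
    """Extract the dyn_memory gauge (KB) from a raw /info JSON payload."""
    return json.loads(payload)["dyn_memory"]

def poll(url: str = STATS_URL, interval_s: int = 60) -> None:
    """Print a timestamped dyn_memory sample each interval; a series that
    keeps rising while the cluster is idle points at a leak."""
    while True:
        with urllib.request.urlopen(url, timeout=5) as resp:
            print(time.strftime("%H:%M:%S"), dyn_memory_kb(resp.read()))
        time.sleep(interval_s)

# poll()  # run on a node to sample dyn_memory once a minute
```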

I'm pretty sure this trend will continue until all memory is consumed and the dynomite process is killed by the OOM killer (we use Docker and set a hard memory limit on each node). Are we doing something wrong? Is this the expected behavior?

ivancoppa commented 5 years ago

@smukil Any updates? Did you have time to investigate the issue?

smukil commented 5 years ago

@ivancoppa I was unable to track down any leaks. I'll get back to it in a bit and try to have an update soon.

kjlaw89 commented 5 years ago

Hey all - we just started using dynomite and ran into this issue as well. I was able to track it down and put in a PR #710 to get it resolved.

ivancoppa commented 5 years ago

Thanks @kjlaw89, I will try your patch

vengomatto commented 5 years ago

We tested dynomite with version 0.6.15 and with branch rel_0.6_prod and experienced this issue as well. With mbuf_size: 16k and max_msgs: 100000, dynomite went over 2 GB of memory usage. Beyond that, memory consumption kept increasing slowly even at rest.

[Image: memory usage over time]

Now we are starting a new test with the patch suggested by @kjlaw89.