The Client::batch_get function seems to leak a lot of memory. After thousands of calls the process memory grows up to 10 GB. The same requests with a simple Client::get look fine. The target is x86_64-unknown-linux-gnu.
I've been trying to replicate this, but have been unsuccessful so far. I've tried running the client's batch_get test in a loop and observing memory allocations after every X iterations:
$ cargo test --release --test lib batch -- --nocapture -Zunstable-options --report-time
Compiling aerospike v0.5.0 (/Users/jhecking/aerospike/aerospike-client-rust)
Finished release [optimized] target(s) in 4.72s
Running target/release/deps/lib-234f38fdc32b270d
running 1 test
0 iterations: 8215696 bytes allocated / 13152256 bytes resident
10000 iterations: 9496168 bytes allocated / 19107840 bytes resident
20000 iterations: 9601664 bytes allocated / 18993152 bytes resident
30000 iterations: 11312288 bytes allocated / 19124224 bytes resident
40000 iterations: 11322856 bytes allocated / 18956288 bytes resident
50000 iterations: 9536992 bytes allocated / 19046400 bytes resident
60000 iterations: 9628136 bytes allocated / 19099648 bytes resident
test src::batch::batch_get ... test src::batch::batch_get has been running for over 60 seconds
70000 iterations: 11179760 bytes allocated / 19070976 bytes resident
80000 iterations: 11339816 bytes allocated / 19103744 bytes resident
90000 iterations: 11374760 bytes allocated / 19075072 bytes resident
100000 iterations: 9530040 bytes allocated / 18935808 bytes resident
110000 iterations: 9599800 bytes allocated / 19161088 bytes resident
120000 iterations: 10807808 bytes allocated / 19156992 bytes resident
130000 iterations: 10472288 bytes allocated / 18882560 bytes resident
140000 iterations: 8614328 bytes allocated / 18944000 bytes resident
150000 iterations: 8367832 bytes allocated / 18849792 bytes resident
160000 iterations: 9397392 bytes allocated / 18989056 bytes resident
170000 iterations: 10029000 bytes allocated / 18964480 bytes resident
180000 iterations: 10152560 bytes allocated / 18952192 bytes resident
190000 iterations: 8348952 bytes allocated / 19030016 bytes resident
200000 iterations: 8355600 bytes allocated / 19034112 bytes resident
test src::batch::batch_get ... ok <195.465s>
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 17 filtered out
Memory allocations go up and down cyclically but don't seem to grow out of bounds. I'm using the jemallocator crate to measure allocations. For now I've only tested this on macOS, not Linux.
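For reference, a minimal sketch of how per-iteration numbers like the ones above can be gathered with the jemallocator and jemalloc-ctl crates; the report helper and its call site are illustrative, not the actual test code:

```rust
use jemalloc_ctl::{epoch, stats};

// Route all allocations through jemalloc so its statistics cover the whole process.
#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

// Hypothetical helper: print allocation stats after a given number of iterations.
fn report(iterations: usize) {
    // Many statistics are cached and only updated when the epoch is advanced.
    epoch::advance().unwrap();
    let allocated = stats::allocated::read().unwrap();
    let resident = stats::resident::read().unwrap();
    println!(
        "{} iterations: {} bytes allocated / {} bytes resident",
        iterations, allocated, resident
    );
}

fn main() {
    // In the test loop this would be called every 10,000 iterations.
    report(0);
}
```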
Can you tell me a bit more about how you are using the client's batch_get
function? Maybe some sample code?
I see the batch size is 4 in your test: https://github.com/aerospike/aerospike-client-rust/blob/master/tests/src/batch.rs
The leaks happen when the batch size is big. For example, if the size is less than 100, the leaks aren't visible. But try batch_get with 1000 batch elements and memory will grow aggressively. The memory isn't freed even after client.close().
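For context, this is roughly the shape of code that description corresponds to, assuming the synchronous client API of that era; the host, namespace, set, and key values are placeholders, and details such as the BatchRead::new signature may differ between client versions:

```rust
use aerospike::{as_key, BatchPolicy, BatchRead, Bins, Client, ClientPolicy};

fn main() {
    let hosts = String::from("127.0.0.1:3000"); // placeholder cluster address
    let client = Client::new(&ClientPolicy::default(), &hosts)
        .expect("failed to connect to cluster");
    let policy = BatchPolicy::default();
    let bins = Bins::All;

    // Repeatedly issue large batch reads; with ~1000 keys per call the process
    // memory reportedly keeps growing instead of levelling off.
    for _ in 0..10_000 {
        let batch: Vec<_> = (0..1000)
            .map(|i| BatchRead::new(as_key!("test", "demo", i), &bins))
            .collect();
        let _records = client.batch_get(&policy, batch).expect("batch_get failed");
    }

    client.close().expect("failed to close client");
}
```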
Another minor issue about close: the client doesn't call close when it goes out of scope, so the thread and the connection pool stay open. Maybe it needs to implement the Drop trait.
Ok, I will try to reproduce the issue with larger batch sizes. How many nodes are in your cluster?
Another minor issue about close: the client doesn't call close when it goes out of scope, so the thread and the connection pool stay open. Maybe it needs to implement the Drop trait.
That's definitely a good suggestion. Feel free to file a separate issue for that.
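For illustration, a minimal sketch of that idea, using a stand-in Client type and assuming only that the real client exposes the close() method mentioned above; this is not the actual implementation, and real code would also need to guard against closing twice:

```rust
// Stand-in type to illustrate the idea; in the real crate this would be the
// aerospike Client itself, with close() stopping the cluster tend thread and
// dropping pooled connections.
struct Client;

impl Client {
    fn close(&self) -> Result<(), String> {
        println!("closing cluster connections and stopping tend thread");
        Ok(())
    }
}

impl Drop for Client {
    fn drop(&mut self) {
        // Errors are only logged, because panicking inside Drop is undesirable.
        if let Err(err) = self.close() {
            eprintln!("error while closing Aerospike client: {}", err);
        }
    }
}

fn main() {
    let _client = Client; // dropped at end of scope, which now calls close()
}
```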
Ok, I will try to reproduce the issue with larger batch sizes. How many nodes are in your cluster?
5 nodes with replication factor 2
So, my previous post was correct but misleading, and so I deleted it. The truncate method on Vec does not actually release memory, so while the code makes it look like the buffer should shrink, the underlying allocation does not. Since the buffer used to write the request (etc.) is attached to a connection, its lifetime is the same as the connection's, and because it currently only ever grows to fit the largest request seen, the buffer will eventually get huge if you push large requests through a connection. If this happens for every connection in a pool, the total allocation will balloon dramatically over time. Resizing the buffer actually needs to drop the memory, which means calling one of the shrink_to methods on the vector.
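A small, self-contained demonstration of that Vec behaviour (standard library only, nothing client-specific):

```rust
fn main() {
    // Simulate a connection buffer that grew to hold one very large request.
    let mut buf: Vec<u8> = vec![0u8; 10 * 1024 * 1024];
    println!("initial:        len = {}, capacity = {}", buf.len(), buf.capacity());

    // truncate() only shortens the logical length; the 10 MB allocation stays.
    buf.truncate(1024);
    println!("after truncate: len = {}, capacity = {}", buf.len(), buf.capacity());

    // shrink_to_fit() (or shrink_to(n) on Rust 1.56+) actually returns the memory.
    buf.shrink_to_fit();
    println!("after shrink:   len = {}, capacity = {}", buf.len(), buf.capacity());
}
```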
I submitted a pull request that does this. The threshold value is something you might want to make configurable or change.
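The general shape of that approach might look something like the sketch below; the function name, signature, and threshold value are purely illustrative, and the real change is in the linked pull request:

```rust
// Illustrative threshold; the actual value and naming in the PR may differ.
const SHRINK_THRESHOLD: usize = 64 * 1024;

// Grow the connection buffer when a request needs more room, but release the
// memory again once it has ballooned past the threshold and the extra space
// is no longer needed, so one huge request doesn't pin memory for the
// connection's whole lifetime.
fn resize_buffer(buffer: &mut Vec<u8>, required: usize) {
    if required > buffer.len() {
        buffer.resize(required, 0);
    } else if buffer.capacity() > SHRINK_THRESHOLD && required <= SHRINK_THRESHOLD {
        buffer.truncate(required);
        buffer.shrink_to_fit();
    }
}
```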
Thank you @soro! Will take a look at your PR and get back to you by the end of the week.
Resolved in #83.