erickt / rust-zmq

Rust zeromq bindings.
Apache License 2.0

Memory usage #304

Open fecetrulo opened 4 years ago

fecetrulo commented 4 years ago

I started to notice some weird behavior in a server/client application. The client side still has high memory usage even after processing the messages: the usage stays around 247MB~250MB after processing roughly 2GB of messages.

The server listens on a bound socket (ROUTER) and replies with 2GB of messages. The client connects to this endpoint using a REQ socket, sends a request and waits for the response; after receiving the response it "processes" it, counting the number of correct messages.

I'm not sure if I'm doing something wrong or if this is an issue in Rust linking with other libraries. Anyway, I hope you can give me some advice on how to solve this or on how to provide more useful information.

An MWE follows.

zmq version: 4.3.2
rust-zmq version: 0.9.2
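
For reference, here is a rough sketch of the kind of REQ client described above. This is not the linked MWE; the endpoint, the request body, and the use of recv_multipart are made up for illustration.

fn main() -> zmq::Result<()> {
    // Connect a REQ socket to the (made-up) endpoint the ROUTER is bound to.
    let ctx = zmq::Context::new();
    let socket = ctx.socket(zmq::REQ)?;
    socket.connect("tcp://localhost:5555")?;

    // Send one request and wait for the reply.
    socket.send("give me the data", 0)?;

    // "Process" the reply by counting the frames received.
    let frames = socket.recv_multipart(0)?;
    println!("received {} frames", frames.len());

    Ok(())
}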

Update: I used valgrind and massif to analyze the memory. reports

Update 2: I managed to reproduce this behaviour without using send/recv or a socket at all. The example and the reports follow.

psychon commented 4 years ago

I did a debug build of your program (after update2) and asked valgrind --tool=massif for a report. The last snapshot says:

100.00% (1,988,179,757B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->99.94% (1,987,007,944B) 0x1153EB: alloc::alloc::alloc (alloc.rs:84)
| ->99.94% (1,987,007,944B) 0x1151D1: <alloc::alloc::Global as core::alloc::Alloc>::alloc (alloc.rs:172)
|   ->99.94% (1,987,007,944B) 0x1130A6: alloc::raw_vec::RawVec<T,A>::allocate_in (raw_vec.rs:98)
|     ->99.94% (1,987,007,944B) 0x112DB5: alloc::raw_vec::RawVec<T>::with_capacity (raw_vec.rs:167)
|       ->99.94% (1,987,007,944B) 0x117963: alloc::vec::Vec<T>::with_capacity (vec.rs:358)
|         ->99.94% (1,987,007,944B) 0x116450: alloc::slice::hack::to_vec (slice.rs:158)
|           ->99.94% (1,987,007,944B) 0x1192B4: alloc::slice::<impl [T]>::to_vec (slice.rs:379)
|             ->99.89% (1,986,000,000B) 0x10EB2D: foo::main::{{closure}} (main.rs:15)
[...]

Line 15 is result.push((meta.to_vec(), info.to_vec())).

I am not actually sure what your example is supposed to do. You create qtd = 2000 copies of a message containing BYTES = 1_000_000 bytes. 2k * 1M = 2G, so that matches your memory usage of about 2 GiB.

Simple example of the same behaviour without zmq:

fn main() {
    static BYTES: usize = 1_000_000;
    let qtd = 2000;
    let info: Vec<u8> = vec![0; BYTES];
    let mut result = Vec::new();
    // Allocate qtd copies of the 1 MB buffer (~2 GB total).
    (0..qtd).for_each(|_| {
        result.push(info.clone());
    });
}

fecetrulo commented 4 years ago

@psychon thanks for the response.

Line 15 is result.push((meta.to_vec(), info.to_vec())).

I am not actually sure what your example is supposed to do. You create qtd = 2000 copies of a message containing BYTES = 1_000_000 bytes. 2k * 1M = 2G, so that matches your memory usage of about 2 GiB.

Sorry, I should have specified the behavior better. The problem is after the result is consumed/dropped, which should, in theory, free all the memory.

In your example, the memory usage after the drop (I had to drop the vec manually) is almost zero (3MB). In my example, the memory usage after the result is consumed is roughly 249MB.
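
A minimal sketch of the comparison being described, assuming a drop-then-observe setup (the explicit drops and the sleep are mine, added just so top/htop can be checked after the buffers are gone):

use std::{thread, time::Duration};

fn main() {
    static BYTES: usize = 1_000_000;

    // Build ~2 GB worth of 1 MB buffers, as in the examples above.
    let info: Vec<u8> = vec![0; BYTES];
    let result: Vec<Vec<u8>> = (0..2000).map(|_| info.clone()).collect();
    println!("built {} copies", result.len());

    // Explicitly drop everything that was allocated...
    drop(result);
    drop(info);

    // ...then keep the process alive so resident memory can be read from top/htop.
    thread::sleep(Duration::from_secs(60));
}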

psychon commented 4 years ago

Still, cannot reproduce. I took your code, removed the loop {}, renamed main to do_it, and added a new fn main() { do_it(); do_it(); do_it(); }. Massif now produces the following graph:

    GB
1.852^                        #                                               
     |                       @#                      :                       :
     |                      @@#                    :::                      ::
     |                     @@@#                    :::                     :::
     |                   @@@@@#                   ::::                    ::::
     |                  @@@@@@#                 ::::::                  @:::::
     |                 @@@@@@@#                @::::::                 :@:::::
     |               @@@@@@@@@#              ::@::::::                ::@:::::
     |              @@ @@@@@@@#              ::@::::::               :::@:::::
     |             @@@ @@@@@@@#            @:::@::::::             :::::@:::::
     |            @@@@ @@@@@@@#           :@:::@::::::             :::::@:::::
     |           @@@@@ @@@@@@@#          ::@:::@::::::           :::::::@:::::
     |         @@@@@@@ @@@@@@@#        ::::@:::@::::::          @:::::::@:::::
     |        @@@@@@@@ @@@@@@@#        ::::@:::@::::::         :@:::::::@:::::
     |       @@@@@@@@@ @@@@@@@#      ::::::@:::@::::::       :::@:::::::@:::::
     |      @@@@@@@@@@ @@@@@@@#     :::::::@:::@::::::      ::::@:::::::@:::::
     |     @@@@@@@@@@@ @@@@@@@#    @:::::::@:::@::::::     :::::@:::::::@:::::
     |   @@@@@@@@@@@@@ @@@@@@@#   :@:::::::@:::@::::::    ::::::@:::::::@:::::
     |  @@@@@@@@@@@@@@ @@@@@@@#  ::@:::::::@:::@::::::   @::::::@:::::::@:::::
     |  @@@@@@@@@@@@@@ @@@@@@@# @::@:::::::@:::@::::::  :@::::::@:::::::@:::::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   11.11

The memory usage definitely shows the three repetitions and the peak memory usage even occurs during the first "run".

Rust code

fn do_it() {
    use zmq;
    static BYTES: usize = 1_000_000;
    static META: &str = "meta";

    let qtd = 2000;

    let info: Vec<u8> = vec![0; BYTES];
    let meta = META;
    let mut result = Vec::new();

    (0..qtd).for_each(|_| {
        let meta = zmq::Message::from(meta);
        let info = zmq::Message::from(&info);
        result.push((meta.to_vec(), info.to_vec()))
    });

    let mut count = 0;
    result.into_iter().for_each(|(meta, info)| {
        if std::str::from_utf8(&meta).unwrap() == META && info.len() == BYTES {
            count += 1;
        }
    });

    println!("count {}", count);

    println!("Hello, world from zmw!");
}

fn main() {
    do_it();
    do_it();
    do_it();
}

Edit:

In your example, the memory usage after the drop (I had to drop the vec manually) is almost zero (3MB). In my example, the memory usage after the result is consumed is roughly 249MB.

But your example does not explicitly drop its Vecs either...?

fecetrulo commented 4 years ago

@psychon thanks again for your response.

Still, cannot reproduce. I took your code, removed the loop {}, renamed main to do_it, and added a new fn main() { do_it(); do_it(); do_it(); }. Massif now produces the following graph:

[...]

The memory usage definitely shows the three repetitions and the peak memory usage even occurs during the first "run".

And this is correct, I got the same result as you when I ran your example with valgrind's tool. The problem occurs after the do_it call. I added a loop only to show that there is still memory usage even after the Message vector is dropped, but it works with a sleep too. I couldn't detect this usage with valgrind or heaptrack, but it's shown in top or htop:

[top and htop screenshots]

Rust code

fn do_it() {
    use zmq;
    static BYTES: usize = 1_000_000;
    static META: &str = "meta";

    let qtd = 2000;

    let info: Vec<u8> = vec![0; BYTES];
    let meta = META;
    let mut result = Vec::new();

    (0..qtd).for_each(|_| {
        let meta = zmq::Message::from(meta);
        let info = zmq::Message::from(&info);
        result.push((meta.to_vec(), info.to_vec()))
    });

    let mut count = 0;
    result.into_iter().for_each(|(meta, info)| {
        if std::str::from_utf8(&meta).unwrap() == META
            && info.len() == BYTES
        {
            count += 1;
        }
    });

    println!("count {}", count);

    println!("Hello, world from zmw!");
}
fn main() {
    do_it();
    do_it();
    do_it();
}

Edit:

In your example, the memory usage after the drop (I had to drop the vec manually) is almost zero (3MB). In my example, the memory usage after the result is consumed is roughly 249MB.

But your example does not explicitly drop its Vecs either...?

In my example the vector is consumed by into_iter, which is in turn consumed by for_each, so an explicit drop call isn't needed (and isn't possible). On the other hand, in your first example the result was never consumed or dropped.
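
As a small illustration of that point (a sketch, not code from this thread): once a Vec goes through into_iter(), it is moved, and its buffers are freed by the time iteration ends, so there is nothing left to drop.

fn main() {
    let v: Vec<Vec<u8>> = (0..3).map(|_| vec![0u8; 10]).collect();

    // into_iter() takes `v` by value; each inner Vec is freed as it is consumed.
    let total: usize = v.into_iter().map(|inner| inner.len()).sum();
    println!("total bytes: {}", total);

    // `v` has been moved, so an explicit drop(v) here would not compile:
    // error[E0382]: use of moved value: `v`
}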

psychon commented 4 years ago

I couldn't detect this usage with valgrind or heaptrack, but it's shown in top or htop:

Same ballpark here: top reports 250 M of RES. According to pmap [pid here], basically all of that is an anonymous mapping:

[...]
000056318e682000 251236K rw---   [ anon ]
[...]
 total           262428K
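
For what it's worth, the same number can also be checked from inside the process (this was not done in the thread) by reading VmRSS from /proc/self/status, e.g. after the result vector has been consumed:

use std::fs;

// Linux-specific: print the resident set size the kernel reports for this process.
fn print_rss() {
    let status = fs::read_to_string("/proc/self/status")
        .expect("/proc/self/status should exist on Linux");
    for line in status.lines() {
        if line.starts_with("VmRSS:") {
            println!("{}", line.trim());
        }
    }
}

fn main() {
    print_rss();
}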

When I run the program (with the loop) under valgrind/massif and interrupt it with ctrl-c, the tail of the output is:

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 83  3,989,070,447    1,988,235,560    1,988,179,757        55,803            0
100.00% (1,988,179,757B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->99.94% (1,987,007,944B) 0x116E2B: alloc::alloc::alloc (alloc.rs:81)
| ->99.94% (1,987,007,944B) 0x116C01: <alloc::alloc::Global as core::alloc::Alloc>::alloc (alloc.rs:169)
|   ->99.94% (1,987,007,944B) 0x1179FD: alloc::raw_vec::RawVec<T,A>::allocate_in (raw_vec.rs:88)
|     ->99.94% (1,987,007,944B) 0x117715: alloc::raw_vec::RawVec<T>::with_capacity (raw_vec.rs:140)
|       ->99.94% (1,987,007,944B) 0x115573: alloc::vec::Vec<T>::with_capacity (vec.rs:355)
|         ->99.94% (1,987,007,944B) 0x118B10: alloc::slice::hack::to_vec (slice.rs:158)
|           ->99.94% (1,987,007,944B) 0x116494: alloc::slice::<impl [T]>::to_vec (slice.rs:394)
|             ->99.89% (1,986,000,000B) 0x1105FF: sdfsf::main::{{closure}} (main.rs:16)
|             | ->99.89% (1,986,000,000B) 0x111B13: core::iter::traits::iterator::Iterator::for_each::call::{{closure}} (iterator.rs:652)
|             |   ->99.89% (1,986,000,000B) 0x111A23: core::iter::traits::iterator::Iterator::fold::ok::{{closure}} (iterator.rs:1900)
|             |     ->99.89% (1,986,000,000B) 0x110408: core::iter::traits::iterator::Iterator::try_fold (iterator.rs:1776)
|             |       ->99.89% (1,986,000,000B) 0x1102A7: core::iter::traits::iterator::Iterator::fold (iterator.rs:1903)
|             |         ->99.89% (1,986,000,000B) 0x110354: core::iter::traits::iterator::Iterator::for_each (iterator.rs:655)
|             |           ->99.89% (1,986,000,000B) 0x111818: sdfsf::main (main.rs:13)
|             |             ->99.89% (1,986,000,000B) 0x10E7FF: std::rt::lang_start::{{closure}} (rt.rs:67)
|             |               ->99.89% (1,986,000,000B) 0x11A7F2: {{closure}} (rt.rs:52)
|             |                 ->99.89% (1,986,000,000B) 0x11A7F2: _ZN3std9panicking3try7do_call17h0a62c452e3ab254eE.llvm.7254327271285625394 (panicking.rs:305)
|             |                   ->99.89% (1,986,000,000B) 0x1226A6: __rust_maybe_catch_panic (lib.rs:86)
|             |                     ->99.89% (1,986,000,000B) 0x11BF86: try<i32,closure-0> (panicking.rs:281)
|             |                       ->99.89% (1,986,000,000B) 0x11BF86: catch_unwind<closure-0,i32> (panic.rs:394)
|             |                         ->99.89% (1,986,000,000B) 0x11BF86: std::rt::lang_start_internal (rt.rs:51)
|             |                           ->99.89% (1,986,000,000B) 0x10E7D8: std::rt::lang_start (rt.rs:67)
|             |                             ->99.89% (1,986,000,000B) 0x1119B9: main (in /tmp/sdfsf/target/debug/sdfsf)
|             |                               
|             ->00.05% (1,007,944B) in 1+ places, all below ms_print's threshold (01.00%)
|             
->00.06% (1,171,813B) in 1+ places, all below ms_print's threshold (01.00%)

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 84 206,392,909,409        1,002,216        1,001,997           219            0

So, according to this... most of the memory really was freed in the end, but at the same time wasn't.

Running the program under strace (I added a println!("counting") before let mut count = 0) ends with:

brk(0x55f030ea7000)                     = 0x55f030ea7000
write(1, "counting\n", 9counting
)               = 9
munmap(0x7fb54bd3c000, 1003520)         = 0
brk(0x55eff6ccf000)                     = 0x55eff6ccf000
brk(0x55efd8464000)                     = 0x55efd8464000
write(1, "count 2000\n", 11count 2000
)            = 11

The munmap only frees 980 KiB of memory, so the bulk must be something else. The brk calls free lots of memory:

brk(0x55f030ea7000)                     = 0x55f030ea7000
brk(0x55eff6ccf000)                     = 0x55eff6ccf000
[The difference between the previous two numbers is 929 MiB]
brk(0x55efd8464000)                     = 0x55efd8464000
[The difference between the previous two numbers is 488 MiB]

So apparently this frees about 1.5 GiB of memory, which is a large part of what is supposedly freed, but the resident size is still large. Perhaps the kernel does not actually free memory that is freed with brk? I don't know.
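
For completeness, one possible explanation (an assumption on my part, not something verified in this issue) is that glibc's allocator keeps freed heap pages around for reuse instead of returning them to the kernel right away, which would keep RES high even though the memory is free. On a Linux/glibc target with the libc crate, malloc_trim can be used to ask the allocator to release what it can:

// Sketch only, assuming the `libc` crate on a Linux/glibc target:
// ask the allocator to return as much free heap memory to the kernel as possible.
fn trim_heap() {
    unsafe {
        libc::malloc_trim(0);
    }
}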