ChenhanYu / hmlp

High-Performance Machine Learning Primitives

Memory Usage and Possible Leak #43

Open wlruys opened 5 years ago

wlruys commented 5 years ago

Hi,

We've noticed that memory usage continues to grow across repeated calls to Evaluate. This is limiting our ability to use GOFMM with an eigensolver and clustering: after ~50-100 iterations it uses 20+ GB (sometimes over 60 GB, depending on the dataset size). Not sure if this is related to issue #37.

In https://github.com/dialecticDolt/hmlp/tree/pythondevel we tried adding destructors to Data (in case they weren't inherited properly from vector and ReadWrite), but this didn't change the behavior we were seeing.
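For reference, the shape of that experiment is roughly the sketch below (a simplified stand-in, not the actual hmlp Data definition). A class that derives from std::vector and a ReadWrite-style base already frees its vector storage when it is destroyed, so an explicitly added destructor is essentially a no-op, which matches what we observed.

#include <cstddef>
#include <vector>

// Simplified stand-in for hmlp's ReadWrite base (hypothetical).
class ReadWrite {};

// Simplified stand-in for hmlp::Data<T>: because it derives from
// std::vector<T>, its storage is released automatically whenever a
// Data object is destroyed.
template<typename T>
class Data : public ReadWrite, public std::vector<T>
{
  public:
    Data( std::size_t m, std::size_t n ) : std::vector<T>( m * n ) {}
    // The explicitly added destructor is equivalent to the implicitly
    // generated one, which is why adding it did not change the behavior.
    ~Data() {}
};

int main()
{
  for ( int rep = 0; rep < 50; rep++ )
  {
    Data<double> u( 1024, 16 ); // storage freed at the end of each iteration
    u[ 0 ] = static_cast<double>( rep );
  }
  return 0;
}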

This can be seen with a simple modification of example/distributed_fast_matvec_solver.cpp:

// DistData on the stack: u1 goes out of scope at the end of each iteration.
for(int rep=0; rep<50; rep++){
    DistData<RIDS, STAR, T> u1 = mpigofmm::Evaluate( tree1, w1 );
}

// With explicit deallocation of heap-allocated potentials.
DistData<RIDS, STAR, T>* u2;
for(int rep=0; rep<50; rep++){
    u2 = mpigofmm::Evaluate_Pointer( tree1, w1 );
    delete u2;
}

Here mpigofmm::Evaluate_Pointer(tree, w1) is a version of Evaluate that allocates the potentials with new and returns a pointer to them.
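The idea is roughly the following (a sketch, not the exact code on our branch; the template parameters are abbreviated relative to the real mpigofmm::Evaluate signature):

// Sketch: same computation as mpigofmm::Evaluate, but the potentials are
// allocated with new so the caller can release them explicitly with delete.
template<typename TREE, typename T>
DistData<RIDS, STAR, T>* Evaluate_Pointer( TREE &tree, DistData<RIDS, STAR, T> &w )
{
  auto *u = new DistData<RIDS, STAR, T>( mpigofmm::Evaluate( tree, w ) );
  return u;
}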

Running valgrind over this kind of example shows lost memory, but not of this magnitude. The largest 'definitely lost' blocks are near the xgemm tasks and the S2S tasks.

Could you take a look at what the cause might be?

ChenhanYu commented 5 years ago

Indeed, this looks like a potential memory leak. I will try to reproduce the problem and find the source. Does this potential leak block your progress? In other words, how urgent is this issue?

ChenhanYu commented 5 years ago

I have attempted a fix for the memory leak. The issue was due to a potential race condition in the S2S and L2L tasks, which caused the memory not to be freed correctly. Could you give it a try and see whether the leak goes away or at least becomes less severe? Many thanks.

wlruys commented 5 years ago

Sorry for the two-month delay; I ended up working on something else over the summer. On my end, the leak on repeated calls to Evaluate looks about the same in the current develop branch as it did before.

ChenhanYu commented 5 years ago

Could you provide an example that reproduces this on the develop branch? It will be easier for me to look into the problem. Many thanks.

wlruys commented 5 years ago

I've pulled the current develop branch and added an example/memory_test script here: https://github.com/dialecticDolt/hmlp/tree/develop (it is not a good kernel setup for compression, but it shows the behavior well).

The memory profile this produces on my workstation is shown below. The blue line is RAM usage in MB. The plateau at 15 seconds is where it switches to the other test function (and spends some time running nearest neighbors). Memory is retained even after the results go out of scope.

[Figure memtest_out: RAM usage in MB over the run of memory_test]
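For anyone who wants to check the numbers without an external profiler, the trend should also be visible with the minimal, Linux-only sketch below, which prints the resident set size around each Evaluate call (the helper is hypothetical and not part of the memory_test script):

#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper: parse VmRSS (resident set size, in kB) from
// /proc/self/status. Linux-only; returns 0 if the field is not found.
long ResidentSetKB()
{
  std::ifstream status( "/proc/self/status" );
  std::string line;
  while ( std::getline( status, line ) )
  {
    if ( line.rfind( "VmRSS:", 0 ) == 0 ) return std::stol( line.substr( 6 ) );
  }
  return 0;
}

// Inside the evaluation loop:
//   auto u1 = mpigofmm::Evaluate( tree1, w1 );
//   std::cout << "rep " << rep << " RSS " << ResidentSetKB() << " kB\n";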

ChenhanYu commented 5 years ago

Thanks, I will take a look.

ChenhanYu commented 5 years ago

OK I am able to reproduce the problem. I will spend some time over the weekend to see if there is an easy fix. If not, I will provide an ETA on fundamentally improving memory management. Thank you for filing this bug.

ChenhanYu commented 5 years ago

I have fixed the problem in the develop branch, at least to the point that I can no longer reproduce it with memory_test. The problem resulted from creating nested parallel GEMM tasks; disabling this feature resolves the memory leak. I will restore support for nested parallel GEMM once it is entirely fixed. I will also add the memory_test you provided to the examples. Thank you for your contribution.