Macaulay2 / M2

The primary source code repository for Macaulay2, a system for computing in commutative algebra, algebraic geometry and related fields.
https://macaulay2.com

gmp memory leak #1938

Open jkyang92 opened 3 years ago

jkyang92 commented 3 years ago

The following code leaks gmp memory:

R = ZZ[x,y,z];
I = ideal(9*x^6*y^2*z^2-8*x^2*y^3*z^5,3*x^8*z^2-x^5*y^2*z^3,x^5*y^5-3*x^2*y^6*z^2);
mingens gb I;
I = null;
R = null;
collectGarbage();
collectGarbage();
collectGarbage();
collectGarbage();

valgrind gives the following stack trace:

==32340== 672 bytes in 84 blocks are definitely lost in loss record 5 of 15
==32340==    at 0x48397EF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==32340==    by 0x5768999: __gmp_default_allocate (in /usr/lib64/libgmp.so.10.4.1)
==32340==    by 0x577DEEB: __gmpz_realloc (in /usr/lib64/libgmp.so.10.4.1)
==32340==    by 0x5777231: __gmpz_gcdext (in /usr/lib64/libgmp.so.10.4.1)
==32340==    by 0x44454D: GBRing::gbvector_combine_lead_terms_ZZ(FreeModule const*, FreeModule const*, gbvector const*, gbvector const*, gbvector const*, gbvector const*, gbvector*&, gbvector*&) (gbring.cpp:1199)
==32340==    by 0x42E7E4: gbA::compute_s_pair(gbA::spair*) (gb-default.cpp:1303)
==32340==    by 0x431A8A: gbA::reduce_ZZ(gbA::spair*) (gb-default.cpp:1436)
==32340==    by 0x4321D8: reduceit (gb-default.cpp:1563)
==32340==    by 0x4321D8: gbA::process_spair(gbA::spair*) (gb-default.cpp:2281)
==32340==    by 0x43358F: gbA::do_computation() (gb-default.cpp:2500)
==32340==    by 0x433ADF: gbA::start_computation() (gb-default.cpp:2574)
==32340==    by 0x4016E4: GBProxy::start_computation() (comp-gb-proxy.hpp:55)
==32340==    by 0x50EF43: rawStartComputation (groebner.cpp:621)

I've attached the full valgrind output. You can also verify that it's not a false positive by running the code in a loop and seeing that it does consume an increasing amount of memory.

gmpleak.log

This is not the only gmp-related memory leak that I've seen. At least one of the tests from NormalToricVarieties also causes us to leak gmp memory. I suspect that what's happening is that while the struct underlying mpz_t is itself allocated on the GC heap, the limbs aren't always.
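
For illustration, here is a minimal standalone sketch of the suspected pattern (this is not M2 code; it only mimics an mpz header living on the GC heap while gmp's default malloc-based allocator supplies the limbs):

#include <gc/gc.h>
#include <gmp.h>

int main(void){
    GC_INIT();
    /* The mpz header is GC-allocated, so the collector can reclaim it... */
    __mpz_struct *x = GC_MALLOC(sizeof(__mpz_struct));
    mpz_init_set_ui(x, 1);
    /* ...but this shift grows the limb array through gmp's default
       allocator (plain malloc/realloc), outside the GC heap. */
    mpz_mul_2exp(x, x, 100000);
    /* If nothing calls mpz_clear(x) (or copies the limbs into GC memory)
       before the header becomes unreachable, the limbs are never freed, and
       valgrind reports them as "definitely lost". */
    x = NULL;
    GC_gcollect();
    return 0;
}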

DanGrayson commented 3 years ago

@mikestillman -- you may want to handle this

mikestillman commented 3 years ago

@DanGrayson yes, I will. @jkyang92 Jay, thanks!

Also, back when we were still using GC to allocate all memory, I probably had a few spots where I allocated something and then forgot or didn't bother to free it. That was harmless then, but now it is a leak. @jkyang92 if you find other cases, please let me know! (I'll also run valgrind myself, via your code/pull request.)

jkyang92 commented 3 years ago

Here's a valgrind log after running the tests from NormalToricVarieties (directly, not via check). Mostly the leaks seem to come from RingQQ::fraction in aring-glue.hpp.

ntvleak.log

mahrud commented 3 years ago

Is the leak in https://github.com/Macaulay2/M2/issues/1728 related to this as well?

jkyang92 commented 3 years ago

Using the code from pull request #1995, I got some better logs from ctest -T memcheck. Here are some more leaks and errors:

unit-tests:ARingGFGivaroGivaro.create: MemoryChecker.27.log
unit-tests:ARingGFGivaroGivaro.arithmetic: MemoryChecker.29.log
unit-tests:ARingQQGMP.display: MemoryChecker.34.log
unit-tests:FreeAlgebra.quotientArithmetic: MemoryChecker.72.log
unit-tests:RingZZmod101.negate: MemoryChecker.100.log

There's a bunch of other RingZZmodn tests that fail, but I think they have the same root cause.

Two other observations in my testing:

  1. Shouldn't mpz_reallocate_limbs use GC_MALLOC_ATOMIC?
  2. If you see a valgrind error about uninitialized values with a source of make_pair<bool,int>, it is spurious, and again related to the fact that libgc reads uninitialized values. Unfortunately, due to the nature of this error, I can't figure out a good way to suppress it.

DanGrayson commented 3 years ago

> 1. Shouldn't mpz_reallocate_limbs use GC_MALLOC_ATOMIC?

Yes, that would be an improvement.

d-torrance commented 3 years ago

> 1. Shouldn't `mpz_reallocate_limbs` use `GC_MALLOC_ATOMIC`?

I've noticed a few segfaults related to mpz_reallocate_limbs (#1429, #1564, #1577, #1578) -- perhaps this would help those?

jkyang92 commented 3 years ago

@d-torrance I'd be really surprised if it made a difference; GC_MALLOC_ATOMIC just tells the garbage collector that the newly allocated memory cannot contain pointers, so it doesn't bother scanning it for pointers.
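
As a minimal sketch of the distinction (hypothetical helper names, not code from M2):

#include <gc/gc.h>
#include <string.h>

/* Limb data contains no pointers, so the collector only needs to keep it
   alive, never scan it; GC_MALLOC_ATOMIC promises exactly that. */
void *copy_pointer_free_data(const void *src, size_t nbytes){
    void *dst = GC_MALLOC_ATOMIC(nbytes);
    memcpy(dst, src, nbytes);
    return dst;
}

/* A block that can hold pointers must come from GC_MALLOC so that those
   pointers are traced and keep their targets alive. */
struct node { struct node *next; };

struct node *new_node(struct node *next){
    struct node *n = GC_MALLOC(sizeof(*n));
    n->next = next;
    return n;
}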

On the other hand, those stack traces look suspiciously similar to weird issues I've been having with valgrind that I assumed were just spurious faults related to libgc. In particular, see point 2 from my previous comment. But that would require that GC_MALLOC was returning gibberish. I suppose this could occur if we are writing past the end of some array, but I have no idea how to figure that out.

jkyang92 commented 3 years ago

@d-torrance I did some testing; GC_MALLOC_ATOMIC actually does seem to help, at least for #1564. Incidentally, GC_MALLOC_ATOMIC_IGNORE_OFF_PAGE might be even better here, since I think we can guarantee that there is always a pointer to the beginning of the allocated memory.
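
A hypothetical sketch of what such a helper could look like (the name, signature, and details of M2's actual mpz_reallocate_limbs may differ; GC_MALLOC_ATOMIC_IGNORE_OFF_PAGE additionally requires that a pointer to the start of the block stays live, which the header's _mp_d field provides):

#include <gc/gc.h>
#include <gmp.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: move an mpz's malloc'd limbs into pointer-free GC memory so
   the collector can reclaim them together with the header. */
void copy_limbs_to_gc_heap(mpz_ptr z){
    if (z->_mp_alloc == 0) return;   /* no limbs allocated yet */
    size_t nbytes = (size_t)z->_mp_alloc * sizeof(mp_limb_t);
    mp_limb_t *gc_limbs = GC_MALLOC_ATOMIC_IGNORE_OFF_PAGE(nbytes);
    memcpy(gc_limbs, z->_mp_d, nbytes);
    free(z->_mp_d);                  /* assumes gmp's default malloc allocator */
    z->_mp_d = gc_limbs;
}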

jkyang92 commented 3 years ago

@mikestillman As far as I can tell, all of the examples in this bug report still leak. Also, I'm curious why you chose to copy the limbs into the gc heap instead of using finalizers. I can't imagine that a finalizer is more expensive than an allocation+copy.

mikestillman commented 3 years ago

@jkyang92 The number of these elements we can have active can be in the millions. I was afraid (but didn't measure it) that finalization for that many elements would be quite slow... But maybe that is incorrect. We could try it. There might still be some possibility of memory leaks: if the gmp struct is not in GC-allocated memory, so we don't finalize it, and we never clear that element, then the limbs will leak.
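
For reference, the finalizer-based alternative would look roughly like this (a sketch, assuming the header is GC-allocated and the limbs come from gmp's default allocator; the caveat above still applies, since no finalizer ever runs for a header that is not GC-allocated):

#include <gc/gc.h>
#include <gmp.h>

/* When the collector finds the mpz header unreachable, clear it so gmp
   frees the malloc'd limbs. */
static void mpz_finalizer(void *obj, void *client_data){
    (void)client_data;
    mpz_clear((mpz_ptr)obj);
}

/* Allocate a GC-managed mpz header and register the finalizer for it. */
mpz_ptr new_gc_mpz(void){
    mpz_ptr z = GC_MALLOC(sizeof(__mpz_struct));
    mpz_init(z);
    GC_REGISTER_FINALIZER(z, mpz_finalizer, NULL, NULL, NULL);
    return z;
}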

mikestillman commented 3 years ago

@jkyang92 OK, I'll take a look at these.

jkyang92 commented 3 years ago

@mikestillman I did some testing with the following (highly artificial) test, and the results are mixed. If the allocations are small, then using finalizers seems significantly slower; on the other hand, if the allocations are large, then using finalizers is actually faster (likely due to lower GC heap usage).

#define GC_THREADS 1
#define GC_PTHREADS 1

#include <gc/gc.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define USE_FINALIZER 1
#define ALLOC_SIZE 10
#define BLOCK_SIZE 1000
#define LOOP_COUNT 100000

struct test{
    void *memory;
};

void free_func(void * v, void *cd){
    struct test *t = v;
    free(t->memory);
    t->memory = NULL;
}

struct test * allocate(void){
#ifdef USE_FINALIZER
    /* struct on the GC heap, payload on the malloc heap, freed by a finalizer */
    struct test *t = GC_MALLOC_ATOMIC(sizeof(*t));
    t->memory = malloc(ALLOC_SIZE);
    GC_REGISTER_FINALIZER(t,free_func,NULL,NULL,NULL);
#else
    /* both the struct and the payload on the GC heap, no finalizer needed */
    struct test *t = GC_MALLOC(sizeof(*t));
    t->memory = GC_MALLOC_ATOMIC(ALLOC_SIZE);
#endif
    return t;
}

int main(){
    GC_INIT();
    for(int i=0;i<LOOP_COUNT;i++){
        struct test **block = GC_MALLOC(BLOCK_SIZE*sizeof(block[0]));
        for(int j=0;j<BLOCK_SIZE;j++){
            block[j] = allocate();
        }
        memset(block,0,BLOCK_SIZE*sizeof(block[0])); /* drop the references so the whole block becomes garbage */
    }
    GC_gcollect();
}

This doesn't perfectly replicate the GMP situation, since I don't do any work on the allocated memory, and I don't do the copying that using the GC heap requires in the gmp case.