gchilders / msieve_nfsathome


Memory leak in Square Root #3

Open NyanCatTW1 opened 2 years ago

NyanCatTW1 commented 2 years ago

There is a consistent memory leak during the Square Root process, making the memory usage considerably higher than in the old version (160 MB on master vs. >8 GB on try_openmp).

Reproduction can be as simple as:

msieve -nc3 -i N.txt -v

Files for reproduction (2.0 GB): https://mega.nz/file/TkhGRbbL#2sotY_6x93Owxwpckoihdy0XuaYmcomkmD5YispwVNc

I will run a git bisect on 2022-08-29 at 9 AM (GMT+8) if it isn't fixed by then.

gchilders commented 2 years ago

I just reverted the commit that I think is likely causing the issue. Did that fix it?

NyanCatTW1 commented 2 years ago

That failed. The memory usage started going up again after multiplying 2425308 relations

NyanCatTW1 commented 2 years ago
➜  /home/nyancat/Tools/msieve_nfsathome git:(75ff92a) ✗ git bisect bad                                           
75ff92a5c72f572c03ae3b7e201e2b92ec121837 is the first bad commit
commit 75ff92a5c72f572c03ae3b7e201e2b92ec121837
Author: Greg Childers <childers@dt-login02.delta.internal.ncsa.edu>
Date:   Sun Aug 14 12:52:27 2022 -0500

    Add OpenMP support for sqrt

 Makefile           |  4 ++++
 common/driver.c    | 10 ++++++++++
 gnfs/relation.c    |  4 +++-
 gnfs/sqrt/sqrt.c   |  1 +
 gnfs/sqrt/sqrt_a.c | 24 +++++++++++++++++++-----
 include/msieve.h   |  4 ++++
 6 files changed, 41 insertions(+), 6 deletions(-)

Seems like the leak goes all the way back to the day sqrt became multithreaded

I'll mess with the code a bit to see what might be causing the leak. (Everything below is tested on commit 75ff92a5c72f572c03ae3b7e201e2b92ec121837.)

OMP=0: Good
OMP=1 with every #pragma in sqrt_a.c removed: Good

    else {
        /* recursively compute the product of the first
           half and the last half of the relations */

        uint32 mid = (index1 + index2) / 2;
#pragma omp parallel
#pragma omp single
        {
#pragma omp task
        multiply_relations(prodinfo, index1, mid, &prod1);
#pragma omp task
        multiply_relations(prodinfo, mid + 1, index2, &prod2);
#pragma omp taskwait
        }
    }

Simply having these #pragmas causes the leak.
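For comparison, here's a self-contained toy (not msieve code; the array-sum payload, CUTOFF, and all the names are made up) showing the usual task-recursion idiom, where the parallel region is opened once in the caller and the recursive calls only spawn tasks, instead of entering #pragma omp parallel again on every call like the block above does:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Serial cutoff so we don't spawn a task per element. */
#define CUTOFF 1024

static uint64_t tree_sum(const uint64_t *a, size_t lo, size_t hi) {
    if (hi - lo < CUTOFF) {
        uint64_t s = 0;
        for (size_t i = lo; i <= hi; i++)
            s += a[i];
        return s;
    }

    size_t mid = (lo + hi) / 2;
    uint64_t left = 0, right = 0;

    /* The recursion only creates tasks; it never re-enters "#pragma omp parallel". */
#pragma omp task shared(left)
    left = tree_sum(a, lo, mid);
#pragma omp task shared(right)
    right = tree_sum(a, mid + 1, hi);
#pragma omp taskwait

    return left + right;
}

int main(void) {
    enum { N = 1 << 20 };
    static uint64_t a[N];
    for (size_t i = 0; i < N; i++)
        a[i] = i;

    uint64_t total = 0;

    /* One parallel region / one team for the whole divide-and-conquer tree. */
#pragma omp parallel
#pragma omp single
    total = tree_sum(a, 0, N - 1);

    printf("sum = %llu\n", (unsigned long long)total);
    return 0;
}

I haven't actually tried restructuring sqrt_a.c this way, so treat it as a guess at something worth testing rather than a fix.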

gchilders commented 2 years ago

GMP calls do reallocations as needed internally. Looks like there's still an issue with malloc and OpenMP, as described in this thread from over a decade ago: https://gmplib.org/list-archives/gmp-discuss/2009-July/003852.html

If you are so inclined, you could try using Hoard to see if it resolves the issue.
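If you want to check whether the growth comes from GMP's own allocations or from the OpenMP runtime's bookkeeping, GMP lets you swap in your own allocator through mp_set_memory_functions(). A rough diagnostic sketch (the counter and the two helper functions are made-up names, not anything in msieve):

#include <gmp.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Running total of bytes currently held by GMP objects. */
static atomic_size_t gmp_live_bytes;

static void *counting_alloc(size_t n) {
    atomic_fetch_add(&gmp_live_bytes, n);
    return malloc(n);
}

static void *counting_realloc(void *p, size_t old_n, size_t new_n) {
    atomic_fetch_add(&gmp_live_bytes, new_n);
    atomic_fetch_sub(&gmp_live_bytes, old_n);
    return realloc(p, new_n);
}

static void counting_free(void *p, size_t n) {
    atomic_fetch_sub(&gmp_live_bytes, n);
    free(p);
}

/* Call once, before any mpz use in the sqrt stage. */
void install_gmp_counters(void) {
    mp_set_memory_functions(counting_alloc, counting_realloc, counting_free);
}

/* Print wherever convenient, e.g. between dependencies. */
void report_gmp_live_bytes(void) {
    fprintf(stderr, "GMP live bytes: %zu\n",
            (size_t)atomic_load(&gmp_live_bytes));
}

If the counter stays roughly flat while the process RSS keeps climbing, the growth is in the runtime rather than in GMP.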

On the other hand, the real issue is that it took 11 dependencies to find the factors. That should (almost) never happen and is surely a bug. I'll look at this more tomorrow, but you can try redoing both LA and sqrt with the latest commit to see if it still requires so many dependencies.

NyanCatTW1 commented 2 years ago

Thanks for the info. I'll rerun everything starting from filtering on latest and see how that goes

NyanCatTW1 commented 2 years ago

Ran Valgrind on latest. Seems like there are two main kinds of leaks:

1.

==21437== 84,948,160 bytes in 482,660 blocks are possibly lost in loss record 264 of 264
==21437==    at 0x4838751: malloc (vg_replace_malloc.c:307)
==21437==    by 0x63557C7: ___kmp_allocate (in /usr/lib64/libomp.so)
==21437==    by 0x63DD825: __ompt_lw_taskteam_link(ompt_lw_taskteam_s*, kmp_info*, int, bool) (in /usr/lib64/libomp.so)
==21437==    by 0x636DF07: __kmp_fork_call (in /usr/lib64/libomp.so)
==21437==    by 0x63C8DD3: __kmp_GOMP_fork_call(ident*, int, unsigned int, unsigned int, void (*)(void*), void (*)(int*, int*, ...), int, ...) (in /usr/lib64/libomp.so)
==21437==    by 0x63CE089: GOMP_parallel@@VERSION (in /usr/lib64/libomp.so)
==21437==    by 0x430C84: mpz_poly_mul.constprop.0 (sqrt_a.c:170)
==21437==    by 0x4314C0: multiply_relations (sqrt_a.c:356)
==21437==    by 0x638F946: __kmp_invoke_task(int, kmp_task*, kmp_taskdata*) (in /usr/lib64/libomp.so)
==21437==    by 0x638FB95: __kmpc_omp_task (in /usr/lib64/libomp.so)
==21437==    by 0x63CDA94: GOMP_task@@VERSION (in /usr/lib64/libomp.so)
==21437==    by 0x4303C9: multiply_relations._omp_fn.0 (sqrt_a.c:347)

2.

==21437== 30,783,104 bytes in 174,904 blocks are possibly lost in loss record 257 of 264
==21437==    at 0x4838751: malloc (vg_replace_malloc.c:307)
==21437==    by 0x63557C7: ___kmp_allocate (in /usr/lib64/libomp.so)
==21437==    by 0x63DD825: __ompt_lw_taskteam_link(ompt_lw_taskteam_s*, kmp_info*, int, bool) (in /usr/lib64/libomp.so)
==21437==    by 0x636DF07: __kmp_fork_call (in /usr/lib64/libomp.so)
==21437==    by 0x63C8DD3: __kmp_GOMP_fork_call(ident*, int, unsigned int, unsigned int, void (*)(void*), void (*)(int*, int*, ...), int, ...) (in /usr/lib64/libomp.so)
==21437==    by 0x63CE089: GOMP_parallel@@VERSION (in /usr/lib64/libomp.so)
==21437==    by 0x4314A7: multiply_relations (sqrt_a.c:344)
==21437==    by 0x638F946: __kmp_invoke_task(int, kmp_task*, kmp_taskdata*) (in /usr/lib64/libomp.so)
==21437==    by 0x638FB95: __kmpc_omp_task (in /usr/lib64/libomp.so)
==21437==    by 0x63CDA94: GOMP_task@@VERSION (in /usr/lib64/libomp.so)
==21437==    by 0x4303C9: multiply_relations._omp_fn.0 (sqrt_a.c:347)
==21437==    by 0x63CE0AD: GOMP_parallel@@VERSION (in /usr/lib64/libomp.so)

I'll be honest, that did not provide a clue on where the leak came from...

NyanCatTW1 commented 2 years ago

LD_PRELOAD=/home/nyancat/bin/libhoard.so msieve -nc3 -i N.txt -v -t 6 didn't help. In fact, the memory usage became 1.5x as high, which is odd. I'll try linking it in during compilation after LA finishes.

That didn't work either. Looks like Hoard can't save me this time.

I've had way better luck on the new run and it finished on dependency 1. This led to an... interesting discovery: the square root took 65 seconds on try_openmp and 80 seconds on master. Multithreading only won by about 20%, which makes me wonder whether we should just remove OMP from Square Root.

Okay... what? Removing the #pragmas actually sped it up by another 10%? That's new.

gchilders commented 2 years ago

I reran your postprocessing twice and another larger number once and got the factors on the first or second dependency each time, so reverting that commit fixed the corruption that was causing the need for many dependencies. Now I just have to figure out what is wrong with that commit...

NyanCatTW1 commented 2 years ago

Even with the cycle issue fixed, and assuming that each dependency has a 2/3 chance of yielding a non-trivial factor, there is still a (1/3)^5 = 1/243 chance that it would exhaust 16 GB of memory. I guess a compromise could be to add a build/runtime option to enable/disable OpenMP specifically during Square Root (rough sketch below); otherwise I risk triggering an OOM every time I run square root.
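Here's roughly what I have in mind for the runtime side (purely hypothetical; neither MSIEVE_SQRT_OMP nor these names exist anywhere): read an environment variable once at startup and gate the existing parallel region with OpenMP's if() clause.

#include <stdlib.h>

/* Hypothetical global; 1 = use OpenMP in sqrt (current behavior). */
static int sqrt_omp_enabled = 1;

/* Call once early on, e.g. before the sqrt stage starts. */
void init_sqrt_omp_flag(void) {
    const char *s = getenv("MSIEVE_SQRT_OMP");   /* hypothetical variable */
    if (s != NULL)
        sqrt_omp_enabled = (atoi(s) != 0);
}

The region quoted earlier would then start with #pragma omp parallel if(sqrt_omp_enabled), which OpenMP runs with a team of one when the flag is 0, so the serial behavior comes back without a rebuild.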

gchilders commented 2 years ago

Another option would be to run msieve separately for each dependency until the number is fully factored.

./msieve -nc3 1,1 -t 4

If the number isn't fully factored,

./msieve -nc3 2,2 -t 4

and so on until the number is fully factored.

NyanCatTW1 commented 2 years ago
  1. The memory usage, just per dependency, is 16x that of master. It is possible that on larger numbers the leak would exhaust the RAM before even one dependency can finish.
  2. That'll be tough to integrate with YAFU, as it uses the msieve library (nfs_find_factors)