flatironinstitute / finufft

Non-uniform fast Fourier transform library of types 1,2,3 in dimensions 1,2,3
Other
294 stars 76 forks source link

Memory leak in finufft_setpts? #269

Closed aaron-shih closed 1 year ago

aaron-shih commented 1 year ago

Hi,

I appear to be encountering constant increasing memory usage when calling setpts repeatedly to change the nonuniform point coordinates for type 1 and type 2 plans. This occurs for me in both the C implementation and the Python wrapper.

Here's a trimmed down example from the documentation using the Python interface with a loop added. Viewing memory usage in something like htop I can see the memory usage rising.

    import numpy as np
    import finufft

    # set up parameters
    n_modes = (1000, 2000)
    n_pts = 100000

    # generate nonuniform points
    x = 2 * np.pi * np.random.uniform(size=n_pts)
    y = 2 * np.pi * np.random.uniform(size=n_pts)

    # initialize the plan
    plan = finufft.Plan(1, n_modes)

    for i in range(10000):
        # set the nonuniform points
        plan.setpts(x, y)
        if i%100==0:
            print(i)
lu1and10 commented 1 year ago

Hi,

I appear to be encountering constant increasing memory usage when calling setpts repeatedly to change the nonuniform point coordinates for type 1 and type 2 plans. This occurs for me in both the C implementation and the Python wrapper.

Here's a trimmed down example from the documentation using the Python interface with a loop added. Viewing memory usage in something like htop I can see the memory usage rising.

    import numpy as np
    import finufft

    # set up parameters
    n_modes = (1000, 2000)
    n_pts = 100000

    # generate nonuniform points
    x = 2 * np.pi * np.random.uniform(size=n_pts)
    y = 2 * np.pi * np.random.uniform(size=n_pts)

    # initialize the plan
    plan = finufft.Plan(1, n_modes)

    for i in range(10000):
        # set the nonuniform points
        plan.setpts(x, y)
        if i%100==0:
            print(i)

I guess it's because the pre-execute sort memory allocation, it's freed when calling destroy plan. @ahbarnett , should we add if(p->sortIndices) free(p->sortIndices); before line 921?

ahbarnett commented 1 year ago

Dear Aaron & Libin, Thank-you for catching this! I guess no-one has done enough repeated setpts to notice before! Yes, Libin, I think this is exactly the right fix (since we don't know it is the same size as before, so we have to malloc each time).... could you bring that into master? Thanks, Alex

On Fri, Jun 9, 2023 at 5:21 PM Libin Lu @.***> wrote:

Hi,

I appear to be encountering constant increasing memory usage when calling setpts repeatedly to change the nonuniform point coordinates for type 1 and type 2 plans. This occurs for me in both the C implementation and the Python wrapper.

Here's a trimmed down example from the documentation using the Python interface with a loop added. Viewing memory usage in something like htop I can see the memory usage rising.

import numpy as np
import finufft

# set up parameters
n_modes = (1000, 2000)
n_pts = 100000

# generate nonuniform points
x = 2 * np.pi * np.random.uniform(size=n_pts)
y = 2 * np.pi * np.random.uniform(size=n_pts)

# initialize the plan
plan = finufft.Plan(1, n_modes)

for i in range(10000):
    # set the nonuniform points
    plan.setpts(x, y)
    if i%100==0:
        print(i)

I guess it's because the pre-execute sort memory allocation https://github.com/flatironinstitute/finufft/blob/4d06460a6c20f708a5db0e94e213214651538bc5/src/finufft.cpp#L921, it's freed when calling destroy plan. @ahbarnett https://github.com/ahbarnett , should we add if(p->sortIndices) free(p->sortIndices); before line 921?

— Reply to this email directly, view it on GitHub https://github.com/flatironinstitute/finufft/issues/269#issuecomment-1585141078, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNZRSQKO37BJWWAQM4MIKLXKOHVRANCNFSM6AAAAAAZBC67PA . You are receiving this because you were mentioned.Message ID: @.***>

-- *-------------------------------------------------------------------~^`^~._.~' |\ Alex Barnett Center for Computational Mathematics, Flatiron Institute | \ http://users.flatironinstitute.org/~ahb 646-876-5942

lu1and10 commented 1 year ago

Dear Aaron & Libin, Thank-you for catching this! I guess no-one has done enough repeated setpts to notice before! Yes, Libin, I think this is exactly the right fix (since we don't know it is the same size as before, so we have to malloc each time).... could you bring that into master? Thanks, Alex On Fri, Jun 9, 2023 at 5:21 PM Libin Lu @.> wrote: Hi, I appear to be encountering constant increasing memory usage when calling setpts repeatedly to change the nonuniform point coordinates for type 1 and type 2 plans. This occurs for me in both the C implementation and the Python wrapper. Here's a trimmed down example from the documentation using the Python interface with a loop added. Viewing memory usage in something like htop I can see the memory usage rising. import numpy as np import finufft # set up parameters n_modes = (1000, 2000) n_pts = 100000 # generate nonuniform points x = 2 np.pi np.random.uniform(size=n_pts) y = 2 np.pi np.random.uniform(size=n_pts) # initialize the plan plan = finufft.Plan(1, n_modes) for i in range(10000): # set the nonuniform points plan.setpts(x, y) if i%100==0: print(i) I guess it's because the pre-execute sort memory allocation https://github.com/flatironinstitute/finufft/blob/4d06460a6c20f708a5db0e94e213214651538bc5/src/finufft.cpp#L921, it's freed when calling destroy plan. @ahbarnett https://github.com/ahbarnett , should we add if(p->sortIndices) free(p->sortIndices); before line 921? — Reply to this email directly, view it on GitHub <#269 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNZRSQKO37BJWWAQM4MIKLXKOHVRANCNFSM6AAAAAAZBC67PA . You are receiving this because you were mentioned.Message ID: @.> -- *-------------------------------------------------------------------~^`^~._.~' |\ Alex Barnett Center for Computational Mathematics, Flatiron Institute | \ http://users.flatironinstitute.org/~ahb 646-876-5942

Thanks, Alex. Fixed in commit f3a2af9a1ba28538fc1f03448b9e79b91891da39. @aaron-shih could you try and confirm that it works on you side without memory leak? Thanks!

ahbarnett commented 1 year ago

Thanks Libin. Since I don't use python much, I am unable to get my local python installation to use my locally-compiled .so, so I can't test the fix yet :) Tried the obvious things, and the weird thing is when I delete my local libfinufft.so, python import finufft fails, but apparently that's not the library being used after all... :) Cheers, Alex

On Fri, Jun 9, 2023 at 9:59 PM Libin Lu @.***> wrote:

Dear Aaron & Libin, Thank-you for catching this! I guess no-one has done enough repeated setpts to notice before! Yes, Libin, I think this is exactly the right fix (since we don't know it is the same size as before, so we have to malloc each time).... could you bring that into master? Thanks, Alex … <#m7952292279623320040> On Fri, Jun 9, 2023 at 5:21 PM Libin Lu @.> wrote: Hi, I appear to be encountering constant increasing memory usage when calling setpts repeatedly to change the nonuniform point coordinates for type 1 and type 2 plans. This occurs for me in both the C implementation and the Python wrapper. Here's a trimmed down example from the documentation using the Python interface with a loop added. Viewing memory usage in something like htop I can see the memory usage rising. import numpy as np import finufft # set up parameters n_modes = (1000, 2000) n_pts = 100000 # generate nonuniform points x = 2 np.pi np.random.uniform(size=n_pts) y = 2 np.pi np.random.uniform(size=n_pts) # initialize the plan plan = finufft.Plan(1, n_modes) for i in range(10000): # set the nonuniform points plan.setpts(x, y) if i%100==0: print(i) I guess it's because the pre-execute sort memory allocation https://github.com/flatironinstitute/finufft/blob/4d06460a6c20f708a5db0e94e213214651538bc5/src/finufft.cpp#L921 https://github.com/flatironinstitute/finufft/blob/4d06460a6c20f708a5db0e94e213214651538bc5/src/finufft.cpp#L921, it's freed when calling destroy plan. @ahbarnett https://github.com/ahbarnett https://github.com/ahbarnett https://github.com/ahbarnett , should we add if(p->sortIndices) free(p->sortIndices); before line 921? — Reply to this email directly, view it on GitHub <#269 (comment) https://github.com/flatironinstitute/finufft/issues/269#issuecomment-1585141078>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNZRSQKO37BJWWAQM4MIKLXKOHVRANCNFSM6AAAAAAZBC67PA https://github.com/notifications/unsubscribe-auth/ACNZRSQKO37BJWWAQM4MIKLXKOHVRANCNFSM6AAAAAAZBC67PA . You are receiving this because you were mentioned.Message ID: @.> -- *-------------------------------------------------------------------^`^._.~' |\ Alex Barnett Center for Computational Mathematics, Flatiron Institute | \ http://users.flatironinstitute.org/~ahb 646-876-5942

Thanks, Alex. Fixed in commit f3a2af9 https://github.com/flatironinstitute/finufft/commit/f3a2af9a1ba28538fc1f03448b9e79b91891da39 . @aaron-shih https://github.com/aaron-shih could you try and confirm that it works on you side without memory leak? Thanks!

— Reply to this email directly, view it on GitHub https://github.com/flatironinstitute/finufft/issues/269#issuecomment-1585422537, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNZRSR7LQLD23IXYSLT23LXKPIHTANCNFSM6AAAAAAZBC67PA . You are receiving this because you were mentioned.Message ID: @.***>

-- *-------------------------------------------------------------------~^`^~._.~' |\ Alex Barnett Center for Computational Mathematics, Flatiron Institute | \ http://users.flatironinstitute.org/~ahb 646-876-5942

lu1and10 commented 1 year ago

Thanks Libin. Since I don't use python much, I am unable to get my local python installation to use my locally-compiled .so, so I can't test the fix yet :) Tried the obvious things, and the weird thing is when I delete my local libfinufft.so, python import finufft fails, but apparently that's not the library being used after all... :) Cheers, Alex On Fri, Jun 9, 2023 at 9:59 PM Libin Lu @.> wrote: Dear Aaron & Libin, Thank-you for catching this! I guess no-one has done enough repeated setpts to notice before! Yes, Libin, I think this is exactly the right fix (since we don't know it is the same size as before, so we have to malloc each time).... could you bring that into master? Thanks, Alex … <#m7952292279623320040> On Fri, Jun 9, 2023 at 5:21 PM Libin Lu @.> wrote: Hi, I appear to be encountering constant increasing memory usage when calling setpts repeatedly to change the nonuniform point coordinates for type 1 and type 2 plans. This occurs for me in both the C implementation and the Python wrapper. Here's a trimmed down example from the documentation using the Python interface with a loop added. Viewing memory usage in something like htop I can see the memory usage rising. import numpy as np import finufft # set up parameters n_modes = (1000, 2000) n_pts = 100000 # generate nonuniform points x = 2 np.pi np.random.uniform(size=n_pts) y = 2 np.pi np.random.uniform(size=n_pts) # initialize the plan plan = finufft.Plan(1, nmodes) for i in range(10000): # set the nonuniform points plan.setpts(x, y) if i%100==0: print(i) I guess it's because the pre-execute sort memory allocation https://github.com/flatironinstitute/finufft/blob/4d06460a6c20f708a5db0e94e213214651538bc5/src/finufft.cpp#L921 https://github.com/flatironinstitute/finufft/blob/4d06460a6c20f708a5db0e94e213214651538bc5/src/finufft.cpp#L921, it's freed when calling destroy plan. @ahbarnett https://github.com/ahbarnett https://github.com/ahbarnett https://github.com/ahbarnett , should we add if(p->sortIndices) free(p->sortIndices); before line 921? — Reply to this email directly, view it on GitHub <#269 (comment) <#269 (comment)>>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNZRSQKO37BJWWAQM4MIKLXKOHVRANCNFSM6AAAAAAZBC67PA https://github.com/notifications/unsubscribe-auth/ACNZRSQKO37BJWWAQM4MIKLXKOHVRANCNFSM6AAAAAAZBC67PA . You are receiving this because you were mentioned.Message ID: @.> -- -------------------------------------------------------------------^`^..~' |\ Alex Barnett Center for Computational Mathematics, Flatiron Institute | \ http://users.flatironinstitute.org/~ahb 646-876-5942 Thanks, Alex. Fixed in commit f3a2af9 <f3a2af9> . @aaron-shih https://github.com/aaron-shih could you try and confirm that it works on you side without memory leak? Thanks! — Reply to this email directly, view it on GitHub <#269 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNZRSR7LQLD23IXYSLT23LXKPIHTANCNFSM6AAAAAAZBC67PA . You are receiving this because you were mentioned.Message ID: **@.**> -- -------------------------------------------------------------------~^`^~._.~' |\ Alex Barnett Center for Computational Mathematics, Flatiron Institute | \ http://users.flatironinstitute.org/~ahb 646-876-5942

I just made another commit, I only freed for type3(missed the type1,2 branch), now it frees sort indices for type 1,2 and 3.

I think using the gnu makefile system make python, the installed python interface will still use the local lib, i.e. finufft/lib/libfinufft.so(it's the same one generated by make lib). Deleting the local libfinufft.so will make python import finufft fails. If you pull the latest commit, and make the local lib again, the memory should not grow. I tested C++ and python, now the memory doesn't grow.

ahbarnett commented 1 year ago

Ok, that explains it - Aaron's python code doesn't test type 3 :) Works great now. I will add to CHANGELOG. Closed by fix in 8792a74

aaron-shih commented 1 year ago

Thanks so much for addressing the memory leak so promptly! I see the fix was to free the sorted indices after each reallocation. If I know that I have the same number of nonuniform points each time, would it be possible to save time by changing the sorted indices in place instead of reallocating?

ahbarnett commented 1 year ago

malloc will be much faster than sort, so I wouldn't worry... But you know how to test it so if you find a significant speed hit, let us know. Best, Alex

On Thu, Jun 15, 2023 at 7:00 PM Aaron Shih @.***> wrote:

Thanks so much for addressing the memory leak so promptly! I see the fix was to free the sorted indices after each reallocation. If I know that I have the same number of nonuniform points each time, would it be possible to save time by changing the sorted indices in place instead of reallocating?

— Reply to this email directly, view it on GitHub https://github.com/flatironinstitute/finufft/issues/269#issuecomment-1593824112, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNZRSQFNG42PNOFTJCOLKDXLOHZ7ANCNFSM6AAAAAAZBC67PA . You are receiving this because you modified the open/close state.Message ID: @.***>

-- *-------------------------------------------------------------------~^`^~._.~' |\ Alex Barnett Center for Computational Mathematics, Flatiron Institute | \ http://users.flatironinstitute.org/~ahb 646-876-5942

aaron-shih commented 1 year ago

Hi,

Apologies for bringing this issue up again, but it seems like Type 3 still has the memory leak in setpts. I have tried going into finufft.cpp and adding if (p->*) free(p->*); before every malloc I can find but it doesn't seem to decrease the runaway memory usage at all.

import numpy as np
import finufft

# set up parameters
n_modes = 10000
n_pts = 100000
ndim = 3

# generate nonuniform points
x = 2 * np.pi * np.random.uniform(size=n_pts)
y = 2 * np.pi * np.random.uniform(size=n_pts)
z = 2 * np.pi * np.random.uniform(size=n_pts)

kx = 100 * np.random.uniform(size=n_modes)
ky = 100 * np.random.uniform(size=n_modes)
kz = 100 * np.random.uniform(size=n_modes)

# initialize the plan
plan = finufft.Plan(3, ndim)

for i in range(10000):
    # set the nonuniform points
    plan.setpts(x, y,z,kx,ky,kz)
    if i%100==0:
        print(i)
lu1and10 commented 1 year ago

Hi,

Apologies for bringing this issue up again, but it seems like Type 3 still has the memory leak in setpts. I have tried going into finufft.cpp and adding if (p->*) free(p->*); before every malloc I can find but it doesn't seem to decrease the runaway memory usage at all.

import numpy as np
import finufft

# set up parameters
n_modes = 10000
n_pts = 100000
ndim = 3

# generate nonuniform points
x = 2 * np.pi * np.random.uniform(size=n_pts)
y = 2 * np.pi * np.random.uniform(size=n_pts)
z = 2 * np.pi * np.random.uniform(size=n_pts)

kx = 100 * np.random.uniform(size=n_modes)
ky = 100 * np.random.uniform(size=n_modes)
kz = 100 * np.random.uniform(size=n_modes)

# initialize the plan
plan = finufft.Plan(3, ndim)

for i in range(10000):
    # set the nonuniform points
    plan.setpts(x, y,z,kx,ky,kz)
    if i%100==0:
        print(i)

Hi Aaron, There seems more memory to clean if repeatedly calling setpts, let me know if the following patch works for you.

diff --git a/src/finufft.cpp b/src/finufft.cpp
index 32923eb7..489573bb 100644
--- a/src/finufft.cpp
+++ b/src/finufft.cpp
@@ -821,6 +821,7 @@ int FINUFFT_SETPTS(FINUFFT_PLAN p, BIGINT nj, FLT* xj, FLT* yj, FLT* zj,
     }
     p->fwBatch = FFTW_ALLOC_CPX(p->nf * p->batchSize);    // maybe big workspace
     // (note FFTW_ALLOC is not needed over malloc, but matches its type)
+    if(p->CpBatch) free(p->CpBatch);
     p->CpBatch = (CPX*)malloc(sizeof(CPX) * nj*p->batchSize);  // batch c' work
     if (p->opts.debug) printf("[%s t3] widcen, batch %.2fGB alloc:\t%.3g s\n", __func__, (double)1E-09*sizeof(CPX)*(p->nf+nj)*p->batchSize, timer.elapsedsec());
     if(!p->fwBatch || !p->CpBatch) {
@@ -830,13 +831,19 @@ int FINUFFT_SETPTS(FINUFFT_PLAN p, BIGINT nj, FLT* xj, FLT* yj, FLT* zj,
     //printf("fwbatch, cpbatch ptrs: %llx %llx\n",p->fwBatch,p->CpBatch);

     // alloc rescaled NU src pts x'_j (in X etc), rescaled NU targ pts s'_k ...
+    if(p->X) free(p->X);
+    if(p->Sp) free(p->Sp);
     p->X = (FLT*)malloc(sizeof(FLT)*nj);
     p->Sp = (FLT*)malloc(sizeof(FLT)*nk);
     if (d>1) {
+      if(p->Y) free(p->Y);
+      if(p->Tp) free(p->Tp);
       p->Y = (FLT*)malloc(sizeof(FLT)*nj);
       p->Tp = (FLT*)malloc(sizeof(FLT)*nk);
     }
     if (d>2) {
+      if(p->Z) free(p->Z);
+      if(p->Up) free(p->Up);
       p->Z = (FLT*)malloc(sizeof(FLT)*nj);
       p->Up = (FLT*)malloc(sizeof(FLT)*nk);
     }
@@ -858,6 +865,7 @@ int FINUFFT_SETPTS(FINUFFT_PLAN p, BIGINT nj, FLT* xj, FLT* yj, FLT* zj,

     // set up prephase array...
     CPX imasign = (p->fftSign>=0) ? IMA : -IMA;             // +-i
+    if(p->prephase) free(p->prephase);
     p->prephase = (CPX*)malloc(sizeof(CPX)*nj);
     if (p->t3P.D1!=0.0 || p->t3P.D2!=0.0 || p->t3P.D3!=0.0) {
 #pragma omp parallel for num_threads(p->opts.nthreads) schedule(static)
@@ -885,6 +893,7 @@ int FINUFFT_SETPTS(FINUFFT_PLAN p, BIGINT nj, FLT* xj, FLT* yj, FLT* zj,

     // (old STEP 3a) Compute deconvolution post-factors array (per targ pt)...
     // (exploits that FT separates because kernel is prod of 1D funcs)
+    if(p->deconv) free(p->deconv);
     p->deconv = (CPX*)malloc(sizeof(CPX)*nk);
     FLT *phiHatk1 = (FLT*)malloc(sizeof(FLT)*nk);  // don't confuse w/ p->phiHat
     onedim_nuft_kernel(nk, p->Sp, phiHatk1, p->spopts);         // fill phiHat1
@@ -941,6 +950,7 @@ int FINUFFT_SETPTS(FINUFFT_PLAN p, BIGINT nj, FLT* xj, FLT* yj, FLT* zj,
     t2opts.spread_debug = max(0,p->opts.spread_debug-1);
     t2opts.showwarn = 0;                          // so don't see warnings 2x
     // (...could vary other t2opts here?)
+    if(p->innerT2plan) FINUFFT_DESTROY(p->innerT2plan);
     int ier = FINUFFT_MAKEPLAN(2, d, t2nmodes, p->fftSign, p->batchSize, p->tol,
                                &p->innerT2plan, &t2opts);
     if (ier>1) {     // if merely warning, still proceed
aaron-shih commented 1 year ago

Those changes seem to help a little. I think the memory usage is growing slower than before but is still growing.

lu1and10 commented 1 year ago

Those changes seem to help a little. I think the memory usage is growing slower than before but is still growing.

The following memory using FFTW_ALLOC_CPX should also be cleaned, could you check if adding the following is better?

@@ -819,8 +819,11 @@ int FINUFFT_SETPTS(FINUFFT_PLAN p, BIGINT nj, FLT* xj, FLT* yj, FLT* zj,
       fprintf(stderr, "[%s t3] fwBatch would be bigger than MAX_NF, not attempting malloc!\n",__func__);
       return ERR_MAXNALLOC;
     }
+    FFTW_FR(p->fwBatch);   // free the big FFTW (or t3 spread) working array
     p->fwBatch = FFTW_ALLOC_CPX(p->nf * p->batchSize);    // maybe big workspace
aaron-shih commented 1 year ago

Apologies for the late reply. That seems to have worked! Thank you so much!

ahbarnett commented 1 year ago

fixed by 72a8b9f