FEniCS / dolfinx

Next generation FEniCS problem solving environment
https://fenicsproject.org
GNU Lesser General Public License v3.0

Possible memory leak in `dolfinx.fem.Function.vector`? #2559

Closed · francesco-ballarin closed this issue 1 year ago

francesco-ballarin commented 1 year ago

I am fighting with https://github.com/FEniCS/dolfinx/pull/2552 to get my heavily parametrized multiphenicsx tests running without exceeding the maximum number of communicators allowed by MPICH.

The current workaround I have (https://github.com/multiphenics/multiphenicsx/commit/b5b972e295b3b38e88562e4139a69a52e21e0cb1) is to split the execution into multiple batches.

Still, while debugging this I noticed that if you run the following

import dolfinx.fem
import dolfinx.mesh
import mpi4py.MPI
import ufl

mesh = dolfinx.mesh.create_unit_square(mpi4py.MPI.COMM_WORLD, 4, 4)
scalar_element = ufl.FiniteElement("Lagrange", mesh.ufl_cell(), 1)
vector_element = ufl.VectorElement("Lagrange", mesh.ufl_cell(), 1)
mixed_element = ufl.MixedElement(scalar_element, scalar_element)
V = dolfinx.fem.FunctionSpace(mesh, vector_element)
f = dolfinx.fem.Function(V)
with f.vector.localForm() as local_form:
    pass

in the dolfinx/dolfinx:nightly Docker image, you get the following warning:

[WARNING] yaksa: 1 leaked handle pool objects

which seems to suggest there is a memory leak.

Notice that:

- This may well be a false positive, and it might not even be the only culprit in my case; still, I thought it was worthwhile to report it upstream.
- I tried to debug this a bit further, but I could not find any information on how to get more verbose output from yaksa.
- I tried running valgrind to compare the scalar_element and vector_element runs, but I cannot spot any relevant indication of where the issue may be among the tons of output it produces.

garth-wells commented 1 year ago

I think this is a petsc4py garbage collection issue. Changing the code slightly and adding a destroy() call at the end eliminates the warning.

import dolfinx.fem
import dolfinx.mesh
import mpi4py.MPI
import ufl

mesh = dolfinx.mesh.create_unit_square(mpi4py.MPI.COMM_WORLD, 4, 4)
scalar_element = ufl.FiniteElement("Lagrange", mesh.ufl_cell(), 1)
vector_element = ufl.VectorElement("Lagrange", mesh.ufl_cell(), 1)
mixed_element = ufl.MixedElement(scalar_element, scalar_element)
V = dolfinx.fem.FunctionSpace(mesh, vector_element)
f = dolfinx.fem.Function(V)
foo = f.vector  # keep a reference to the petsc4py Vec so it can be destroyed later
with foo.localForm() as local_form:
    pass
foo.destroy()  # explicitly destroying the Vec eliminates the yaksa warning

I've added some explicit clean up in https://github.com/FEniCS/dolfinx/tree/garth/petsc-gc. @francesco-ballarin could you test this branch?
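As a generic illustration only (this is not the contents of that branch, and the mesh size and element are placeholders), the kind of explicit cleanup in question amounts to destroying petsc4py handles before the interpreter and MPI shut down, even on error paths, rather than relying on garbage collection:

import dolfinx.fem
import dolfinx.mesh
import mpi4py.MPI
import ufl

mesh = dolfinx.mesh.create_unit_square(mpi4py.MPI.COMM_WORLD, 4, 4)
V = dolfinx.fem.FunctionSpace(mesh, ufl.VectorElement("Lagrange", mesh.ufl_cell(), 1))
f = dolfinx.fem.Function(V)

x = f.vector  # petsc4py Vec wrapping the Function's degrees of freedom
try:
    with x.localForm() as local_form:
        local_form.set(0.0)
finally:
    # Release the PETSc handle explicitly instead of leaving it to the GC.
    x.destroy()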

francesco-ballarin commented 1 year ago

Thanks @garth-wells, I confirm that I have adopted a strategy similar to #2560 in multiphenicsx, and I no longer have any [WARNING] yaksa: _ leaked handle pool objects warnings left.

For future reference, just in case: I had to be extra careful whenever there were calls to getNestSubVecs on nest Vecs or getNestSubMatrix on nest Mats, and make sure the resulting sub-vectors/sub-matrices were destroyed after use (a sketch of that pattern is below). This does not seem to be needed for the dolfinx CI, at least for now.
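For illustration only (this is not code from multiphenicsx; the vector sizes and values are placeholder assumptions), a minimal petsc4py sketch of that pattern might look like this: the wrappers returned by getNestSubVecs are destroyed explicitly once they are no longer needed, so no PETSc handles are left for the interpreter to collect at shutdown.

from petsc4py import PETSc

# Build a small nest vector from two sequential sub-vectors (placeholder sizes).
a = PETSc.Vec().createSeq(3)
b = PETSc.Vec().createSeq(5)
nest = PETSc.Vec().createNest([a, b], comm=PETSc.COMM_SELF)

# getNestSubVecs returns petsc4py wrappers around the sub-vectors; destroy
# them explicitly after use so the handles are released before MPI finalizes.
subvecs = nest.getNestSubVecs()
try:
    for sub in subvecs:
        sub.set(1.0)
finally:
    for sub in subvecs:
        sub.destroy()

nest.destroy()
b.destroy()
a.destroy()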

lindsayad commented 5 months ago

This was a top hit when I searched for "yaksa: 7 leaked handle pool objects". Do you guys have a methodical way for tracking down the origins of these leaks? I am used to doing everything from C and C++ in which case I can rely on valgrind, but I'm not as familiar with tracking potential leaks from python