uvilla opened 8 months ago
I tried to write a simple code to reproduce the issue. Unfortunately, I did not observe the MPI Comm duplication issue, but I can see some memory leakage.
from mpi4py import MPI
import dolfinx as dlx
import dolfinx.fem.petsc
import ufl

def run(Vh):
    u = dlx.fem.Function(Vh)
    v = ufl.TestFunction(Vh)
    form = dlx.fem.form(ufl.inner(u, v) * ufl.dx)
    vec = dlx.fem.petsc.assemble_vector(form)

if __name__ == '__main__':
    comm = MPI.COMM_WORLD
    rank = comm.rank
    mesh = dlx.mesh.create_unit_square(MPI.COMM_WORLD, 8, 8, dlx.mesh.CellType.quadrilateral)
    V = dlx.fem.FunctionSpace(mesh, ("Lagrange", 1))
    for i in range(2**32 - 1):
        if (i % 10 == 0) and (rank == 0): print(i)
I ran this snippet using 8 processes. At the beginning the memory allocation was about 300 MB per process, and after about 1 hour of running it was about 3.5 GB per process.
I'm running FEniCS through conda on an Intel macOS laptop (python-3.10, fenics-libdolfinx 0.7.0 h2cc1089_100).
I've been running your script through Docker (v0.7.3) for 100 % of the runtime (using 3 processes), and see no increase in memory usage :/
Could you try adding explicit usage of the garbage collector, e.g.

import gc
# Add code here
for i in range(N):
    # Define u, mesh etc.
    del u, mesh
    gc.collect()
@jorgensd : Thank you for your prompt reply.
I tried using the Docker container as you recommended. The increase in memory is much less significant than with conda, but still present.
For example (again with 8 processes), I observed an increase by a factor of 8 in memory usage (monitored using the top command) between iteration 100 and iteration 60,000 (the example above has been running for about 4 hours). Memory usage at iteration 100 was about 82 MB per process, but it climbed to 665 MB at iteration 60,000.
@uvilla I guess you forgot to add run(V) to your loop?
With the following code, I do see a steady increase in memory usage:
from mpi4py import MPI
import dolfinx as dlx
import dolfinx.fem.petsc
import ufl
import gc

def run(Vh):
    u = dlx.fem.Function(Vh)
    v = ufl.TestFunction(Vh)
    form = dlx.fem.form(ufl.inner(u, v) * ufl.dx)
    vec = dlx.fem.petsc.assemble_vector(form)
    del vec
    del u

if __name__ == '__main__':
    comm = MPI.COMM_WORLD
    rank = comm.rank
    N = 2**32 - 1
    mesh = dlx.mesh.create_unit_square(MPI.COMM_WORLD, 8, 8, dlx.mesh.CellType.quadrilateral)
    V = dlx.fem.FunctionSpace(mesh, ("Lagrange", 1))
    for i in range(N):
        run(V)
        if (i % 1e2 == 0) and (rank == 0): print(f"{i}/{N}, {i/N*100:.2e}%", end="\r", flush=True)
        gc.collect()
Memory starts at around 90 MB; after 10000 iterations it is at 155 MB and still steadily increasing.
@uvilla Changing your code to:

def run(Vh):
    u = dlx.fem.Function(Vh)
    v = ufl.TestFunction(Vh)
    form = dlx.fem.form(ufl.inner(u, v) * ufl.dx)
    vec = dlx.fem.petsc.assemble_vector(form)
    vec.destroy()
or
from mpi4py import MPI
import dolfinx as dlx
import dolfinx.fem.petsc
import ufl

def run(Vh):
    u = dlx.fem.Function(Vh)
    v = ufl.TestFunction(Vh)
    form = dlx.fem.form(ufl.inner(u, v) * ufl.dx)
    vec = dlx.fem.Function(Vh)
    dlx.fem.petsc.assemble_vector(vec.vector, form)

if __name__ == '__main__':
    comm = MPI.COMM_WORLD
    rank = comm.rank
    N = 100000
    mesh = dlx.mesh.create_unit_square(MPI.COMM_WORLD, 8, 8, dlx.mesh.CellType.quadrilateral)
    V = dlx.fem.FunctionSpace(mesh, ("Lagrange", 1))
    for i in range(N):
        run(V)
        if (i % 1e2 == 0) and (rank == 0): print(f"{i}/{N}, {i/N*100:.2e}%", end="\r", flush=True)
(as dolfinx.fem.Function destroys any associated PETSc vector at destruction).
The issue is that PETSc doesn't do garbage collection in Python automatically any longer: https://gitlab.com/petsc/petsc/-/issues/1309
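As an aside, newer petsc4py versions also provide PETSc.garbage_cleanup, which flushes PETSc objects that Python has already collected but that PETSc still tracks on a communicator. A minimal sketch (assuming a petsc4py recent enough to ship this function, roughly 3.18 and later):

from mpi4py import MPI
from petsc4py import PETSc

# Explicit destruction remains the robust option
vec = PETSc.Vec().createMPI(10, comm=MPI.COMM_WORLD)
vec.destroy()

# Alternatively, periodically flush objects that Python has already
# garbage-collected but that PETSc still tracks on this communicator
PETSc.garbage_cleanup(MPI.COMM_WORLD)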
@jorgensd Thank you very much for your help with this.
I was not aware of that issue with PETSc Vec. It explains a lot of strange things happening in my code.
In hIPPYlib, I often deal with vectors that are not associated with a function space (think e.g. pointwise observations). I am wondering if it would be useful to implement a property dolfinx.la.Vector.vector that creates the petsc4py wrapper and then destroys it when the dolfinx.la.Vector goes out of scope.
Thank you :)
dolfinx.la.Vector does not have a .vector property that interfaces with PETSc. It is only dolfinx.fem.Function that has a .vector property, which creates a wrapper around its data compatible with petsc4py:
https://github.com/FEniCS/dolfinx/blob/main/python/dolfinx/fem/function.py#L475-L491
This is in fact what I am proposing: dolfinx.la.Vector should expose a property .vector, implemented as the current implementation in dolfinx.fem.Function.
Then, to preserve backward compatibility, implement dolfinx.fem.Function.vector as

@property
def vector(self):
    return self.x.vector

I can create a simple PR for this, if you like the proposal.
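To make the idea concrete, here is a rough sketch (the class name is illustrative, not an actual dolfinx API; it assumes dolfinx.la.create_petsc_vector_wrap, which is what dolfinx.fem.Function.vector uses in 0.7.x):

import dolfinx.la

class VectorWithPETScView(dolfinx.la.Vector):
    """dolfinx.la.Vector plus a cached petsc4py view of its data (sketch)."""

    _petsc_x = None  # lazily created PETSc wrapper

    @property
    def vector(self):
        if self._petsc_x is None:
            # Wrap the existing array; no data is copied
            self._petsc_x = dolfinx.la.create_petsc_vector_wrap(self)
        return self._petsc_x

    def __del__(self):
        # Destroy the PETSc wrapper together with the Python object
        if self._petsc_x is not None:
            self._petsc_x.destroy()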
The current design is consistent with the behaviour of petsc4py; you need to manually manage memory if a DOLFINx routine returns a petsc4py object. This also applies in the C++ interface (for PETSc objects) see e.g. https://github.com/FEniCS/dolfinx/blob/main/cpp/dolfinx/la/petsc.h#L49.
We should improve our documentation to make it clear that this is the caller's responsibility.
Is there a specific motivation for using the PETSc vector over the DOLFINx vector when working with DOLFINx? The DOLFINx one has clear memory management and easy access to ghost data.
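For illustration, the reproducer above can stay entirely on the DOLFINx vector path, so no petsc4py object is ever created and nothing needs an explicit destroy. A sketch against the 0.7.x Python API:

from mpi4py import MPI
import dolfinx as dlx
import dolfinx.fem
import dolfinx.la
import ufl

def run(Vh):
    u = dlx.fem.Function(Vh)
    v = ufl.TestFunction(Vh)
    form = dlx.fem.form(ufl.inner(u, v) * ufl.dx)
    # Assemble into a dolfinx.la.Vector instead of a PETSc Vec
    b = dlx.fem.assemble_vector(form)
    # Accumulate ghost contributions into the owned entries
    b.scatter_reverse(dlx.la.InsertMode.add)
    return b.array

if __name__ == '__main__':
    mesh = dlx.mesh.create_unit_square(MPI.COMM_WORLD, 8, 8,
                                       dlx.mesh.CellType.quadrilateral)
    V = dlx.fem.FunctionSpace(mesh, ("Lagrange", 1))
    for i in range(1000):
        run(V)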
@uvilla We could add an option to not duplicate the communicator, or perhaps better we could attach a new 'library' communicator.
It's a poor design to use a user communicator in a library because the user or the library could change the communicator or do something on the communicator that affects the other (see Eijkhout, p. 204). The obvious options to deal with this are:
On petsc4py, the only robust solution is calling destroy explicitly.
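One small pattern that helps in practice (just a sketch of the run() above): tie the explicit destroy to a try/finally, so the PETSc object is released even if the code using it raises.

import dolfinx as dlx
import dolfinx.fem.petsc
import ufl

def run(Vh):
    u = dlx.fem.Function(Vh)
    v = ufl.TestFunction(Vh)
    form = dlx.fem.form(ufl.inner(u, v) * ufl.dx)
    vec = dlx.fem.petsc.assemble_vector(form)
    try:
        pass  # ... use vec here ...
    finally:
        vec.destroy()  # released even if the code above raises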
@garth-wells, @jhale and @jorgensd:
Thank you very much for the clear explanation and discussion, and for kindly explaining memory management in petsc4py to me.
I do agree with the concept of a library communicator to isolate communications in dolfinx from third-party libraries. However, I also see how the one-communicator-per-object approach may get in the way of many uncertainty-quantification workflows.
A happy compromise would probably be to duplicate the communicator at the level of the mesh object (where the mesh partitioning defines the topology of the neighbours) or at the level of the FunctionSpace (which defines the communication pattern for the FE assembly).
I wonder if having the FunctionSpace, rather than the Function object, build the scatterer object (which holds the MPI_Comm) may be a good compromise between robustness and avoiding too many duplications of the communicator. Since fenicsx uses smart pointers, there should not be any issues with inadvertently destroying the shared communicator/scatterer.
I'll be happy to contribute if anything I am proposing is compatible with your vision of how dolfinx should behave.
See #3065.
I made a small change in #3085 that avoids communicator duplication for Vectors that are on a one-rank communicator. If the size of the communicator is one, the Scatterer doesn't create any communicators.
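To illustrate the effect (a sketch; whether this fits a given workflow depends on the application), objects built on MPI.COMM_SELF put every Vector on a one-rank communicator, so with that change the Scatterer duplicates nothing:

from mpi4py import MPI
import dolfinx as dlx
import dolfinx.fem
import dolfinx.mesh

# Each rank builds an independent serial mesh; the resulting Vectors live
# on a one-rank communicator, so the Scatterer creates no communicators.
mesh = dlx.mesh.create_unit_square(MPI.COMM_SELF, 8, 8,
                                   dlx.mesh.CellType.quadrilateral)
V = dlx.fem.FunctionSpace(mesh, ("Lagrange", 1))
u = dlx.fem.Function(V)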
import os
os.environ['OMP_NUM_THREADS'] = '1'
import dolfinx.mesh
import dolfinx.fem
import dolfinx.io
from mpi4py import MPI
import gc

comm = MPI.COMM_WORLD
for i in range(3000):
    print(f"{comm.rank}, {i}")
    comm_cp = MPI.COMM_WORLD.Dup()
    msh = dolfinx.mesh.create_rectangle(comm_cp,
                                        points=((0.0, 0.0), (1, 1)),
                                        n=(int(4), int(4)),
                                        cell_type=dolfinx.mesh.CellType.triangle,  # quadrilateral or triangle
                                        ghost_mode=dolfinx.mesh.GhostMode.shared_facet
                                        )
    msh.comm.Free()
    del msh
    gc.collect()
I am doing a simulation which needs remeshing every few time steps, but the code above fails after 2035 steps on 2 processes because of too many communicators. Is there any way to free the communicators?
Hi, I am facing exactly the same problem. I need to create a series of meshes in my loop, and then this error appears. I think something goes wrong with the creation of the communicator when the mesh is built. Is there any way to release the old communicators, since I won't use the data of the old meshes in later time steps?
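This does not address the communicators DOLFINx allocates internally per mesh, but the explicit Dup in the loop above can at least be hoisted out and reused, freeing it once at the end; a sketch:

import gc
import dolfinx.mesh
from mpi4py import MPI

comm_cp = MPI.COMM_WORLD.Dup()  # duplicate once, reuse for every remesh
for i in range(3000):
    msh = dolfinx.mesh.create_rectangle(
        comm_cp,
        points=((0.0, 0.0), (1.0, 1.0)),
        n=(4, 4),
        cell_type=dolfinx.mesh.CellType.triangle,
        ghost_mode=dolfinx.mesh.GhostMode.shared_facet)
    # ... use msh for this remeshing step ...
    del msh
    gc.collect()  # let Python release the mesh and its internal resources
comm_cp.Free()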
Hi there,
I know this issue has already been discussed in several threads, but I would like to better understand the rationale behind using MPI_Comm_dup when wrapping the MPI comm in dolfinx.
With a student of mine, we are finally (it was overdue) porting hIPPYlib (https://hippylib.github.io/) to dolfinx, and we are quickly running into the "Too many communicators" issue.
Our code does not store any long list of dlx.fem.Function, but we do create quite a lot of temporary dlx.fem.Function and dlx.fem.form instances. All of these should be destroyed when the function returns, but it seems that is not the case for us. While this may be a problem of garbage collection in Python, it is still an issue that affects the usability of the library.
So my questions are:
Originally posted by @uvilla in https://github.com/FEniCS/dolfinx/issues/2308#issuecomment-1954583103