FEniCS / dolfinx

Next generation FEniCS problem solving environment
https://fenicsproject.org
GNU Lesser General Public License v3.0

too many MPI_COMM duplications while creating a Function? #2308

Closed: francesco-ballarin closed this issue 2 years ago

francesco-ballarin commented 2 years ago

Hi, while working on a time-dependent problem with @AnnaSanfilippo we encountered some segmentation faults that might be related to duplication of MPI communicators.

I have slightly changed the poisson demo to reproduce this.

import petsc4py.PETSc
# ask PETSc to attach a debugger when an error occurs (useful for inspecting the crash)
opt = petsc4py.PETSc.Options()
opt["on_error_attach_debugger"] = None

import numpy as np

import ufl
from dolfinx import fem, io, mesh, plot
from ufl import ds, dx, grad, inner

import dolfinx
from mpi4py import MPI
from petsc4py.PETSc import ScalarType

msh = mesh.create_rectangle(comm=MPI.COMM_WORLD,
                            points=((0.0, 0.0), (2.0, 1.0)), n=(32, 16),
                            cell_type=mesh.CellType.triangle,)
V = fem.FunctionSpace(msh, ("Lagrange", 1))

facets = mesh.locate_entities_boundary(msh, dim=1,
                                       marker=lambda x: np.logical_or(np.isclose(x[0], 0.0),
                                                                      np.isclose(x[0], 2.0)))

dofs = fem.locate_dofs_topological(V=V, entity_dim=1, entities=facets)

bc = fem.dirichletbc(value=ScalarType(0), dofs=dofs, V=V)

u = ufl.TrialFunction(V)
v = ufl.TestFunction(V)
x = ufl.SpatialCoordinate(msh)
f = 10 * ufl.exp(-((x[0] - 0.5) ** 2 + (x[1] - 0.5) ** 2) / 0.02)
g = ufl.sin(5 * x[0])
a = inner(grad(u), grad(v)) * dx
L = inner(f, v) * dx + inner(g, v) * ds

sols = list()
for i in range(17500*4):
    print(i)
    # reassemble and solve at every iteration; each iteration creates
    # a new dolfinx.fem.Function (and new PETSc objects)
    a_cpp = dolfinx.fem.form(a)
    f_cpp = dolfinx.fem.form(L)
    A = dolfinx.fem.petsc.assemble_matrix(a_cpp, bcs=[bc])
    A.assemble()
    F = dolfinx.fem.petsc.assemble_vector(f_cpp)
    F.ghostUpdate(addv=petsc4py.PETSc.InsertMode.ADD, mode=petsc4py.PETSc.ScatterMode.REVERSE)
    dolfinx.fem.petsc.set_bc(F, [bc])
    ksp = petsc4py.PETSc.KSP()
    ksp.create(msh.comm)
    ksp.setOperators(A)
    ksp.setType("preonly")
    ksp.getPC().setType("lu")
    ksp.getPC().setFactorSolverType("mumps")
    ksp.setFromOptions()
    solution = dolfinx.fem.Function(V)
    ksp.solve(F, solution.vector)
    #sols.append(solution)

Notice that the last line, sols.append(solution), is commented out: in this form the loop runs without problems.

When the last line is uncommented, the run eventually fails with MPI errors about too many communicators (and segmentation faults).

Why would storing many dolfinx.fem.Function objects cause MPI-related segfaults?

jorgensd commented 2 years ago

A DOLFINx Function contains a la::Vector that has its own scatterer (to manage MPI communication). This scatterer has a neighbourhood communicator.

@IgorBaratta, do you have any idea of how we could rewrite this?

In general, it is not advisable to create dolfinx.fem.Function objects or PETSc matrices/vectors in loops and store all of them, as this increases the memory usage of the problem.

Is there a specific reason for wanting to store the function at every time step in your more complex application? What about outputting them using VTXWriter instead?
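
For the time-stepping case, a minimal sketch of what the VTXWriter route could look like (assuming DOLFINx was built with ADIOS2 support; the mesh, file name and time values are placeholders):

from mpi4py import MPI
import dolfinx
from dolfinx.io import VTXWriter

msh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, 8, 8)
V = dolfinx.fem.FunctionSpace(msh, ("Lagrange", 1))
uh = dolfinx.fem.Function(V)

# one writer and one Function (hence one set of communicators) reused for the whole time loop
vtx = VTXWriter(msh.comm, "solution.bp", [uh])
for n in range(100):
    # ... update uh in place at time t = n * dt ...
    vtx.write(n * 0.01)  # write the current state instead of storing uh
vtx.close()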

garth-wells commented 2 years ago

Some background: it is not good practice to 'share' MPI communicators since (1) there is no guarantee that someone holding the communicator (which could be an external library) won't modify the communicator, and (2) things can go wrong if two parts of a code attempt non-blocking communication on the same communicator. This is why we duplicate communicators. A different approach, which I believe PETSc uses, is to attach new communicators to the communicator that is passed to a constructor.
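
As a rough mpi4py illustration of the two strategies (a sketch only: the keyval caching below stands in for the attach-to-communicator idea, it is not actual PETSc or DOLFINx code):

from mpi4py import MPI

comm = MPI.COMM_WORLD

# Strategy 1: each object duplicates the communicator for exclusive use.
# Safe, but every duplicate consumes one entry from MPI's finite pool.
dup = comm.Dup()
# ... this object communicates only on dup ...
dup.Free()

# Strategy 2: duplicate once and cache the duplicate as an attribute on the
# user's communicator, so later objects reuse it instead of duplicating again.
keyval = MPI.Comm.Create_keyval()
inner = comm.Get_attr(keyval)
if inner is None:
    inner = comm.Dup()
    comm.Set_attr(keyval, inner)
# later: comm.Get_attr(keyval) returns the same cached duplicate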

A 'solution' to the reported issue would be to support 'multi-vectors', and by extension 'multi-functions'. We have discussed supporting 'multi-vectors', see https://github.com/FEniCS/dolfinx/issues/1857.

francesco-ballarin commented 2 years ago

In general, it is not advisable to create dolfinx.fem.Function objects or PETSc matrices/vectors in loops and store all of them, as this increases the memory usage of the problem.

Is there a specific reason for wanting to store the function at every time step in your more complex application? What about outputting them using VTXWriter instead?

In the context of reduced order modelling and/or machine learning, there is often no sensible way to work around this, at least in the most basic approaches. You need to build a dataset of the solution field over time (and/or over the parameters), and thus you need all of the solutions loaded in memory at the same time.

jorgensd commented 2 years ago

In general, it is not advisable to create dolfinx.fem.Function objects or PETSc matrices/vectors in loops and store all of them, as this increases the memory usage of the problem. Is there a specific reason for wanting to store the function at every time step in your more complex application? What about outputting them using VTXWriter instead?

In the context of reduced order modelling and/or machine learning, there is often no sensible way to work around this, at least in the most basic approaches. You need to build a dataset of the solution field over time (and/or over the parameters), and thus you need all of the solutions loaded in memory at the same time.

You can store the pure numpy arrays local to each process instead, and simply use a single function with a single communicator.
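
A minimal sketch of that pattern, with a placeholder mesh and space (one Function and one set of communicators, many process-local numpy snapshots):

import numpy as np
import dolfinx
from mpi4py import MPI

msh = dolfinx.mesh.create_interval(MPI.COMM_WORLD, 10, points=(0.0, 1.0))
V = dolfinx.fem.FunctionSpace(msh, ("Lagrange", 1))

u = dolfinx.fem.Function(V)  # a single Function, hence a single set of communicators
snapshots = []               # process-local copies of the degrees of freedom

for i in range(2048):
    # ... compute the new solution into u (e.g. ksp.solve(F, u.vector)) ...
    snapshots.append(u.x.array.copy())  # plain numpy array, no MPI resources

# to reuse a snapshot later, copy it back into the shared Function
u.x.array[:] = snapshots[0]
u.x.scatter_forward()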

francesco-ballarin commented 2 years ago

You can store the pure numpy arrays local to each process instead, and simply use a single function with a single communicator.

Sure, we are already doing that in the meantime, but IMHO that is a workaround, not a solution.

IgorBaratta commented 2 years ago

Minimal reproducible example:

import dolfinx
from mpi4py import MPI

comm=MPI.COMM_WORLD
msh = dolfinx.mesh.create_interval(comm, 10, points=(0.0, 1.0))
V = dolfinx.fem.FunctionSpace(msh, ("Lagrange", 1))

sols = list()
for i in range(2048):
    print(i)
    u = dolfinx.fem.Function(V)
    sols.append(u)

Output:

Other MPI error, error stack:
internal_Dist_graph_create_adjacent(125).: MPI_Dist_graph_create_adjacent(comm=0x84000006, indegree=0, sources=(nil), sourceweights=0x7f3411b9ea74, outdegree=0, destinations=(nil), destweights=0x7f3411b9ea74, MPI_INFO_NULL, reorder=0, comm_dist_graph=0x7fffd29c2d18) failed
MPIR_Dist_graph_create_adjacent_impl(325): 
MPII_Comm_copy(830)......................: 
MPIR_Get_contextid_sparse_group(591).....: Too many communicators (0/2048 free on this process; ignore_id=0)
Abort(403268367) on node 0 (rank 0 in comm 416): application called MPI_Abort(comm=0x84000006, 403268367) - process 0

As @jorgensd mentioned, the issue comes from the fact that when we call dolfinx.fem.Function(V), the default constructor of the la::Vector is called, creating a new scatterer, which in turn creates its own neighbor communicator.

However, instead of creating a new function with dolfinx.fem.Function(V), one can call function.copy(). In this case the scatterer is shared between the functions (and their respective vectors).

import dolfinx
from mpi4py import MPI

comm=MPI.COMM_WORLD
msh = dolfinx.mesh.create_interval(comm, 10, points=(0.0, 1.0))
V = dolfinx.fem.FunctionSpace(msh, ("Lagrange", 1))

sols = list()
u = dolfinx.fem.Function(V)
for i in range(2048):
    print(i)
    un = u.copy()
    sols.append(un)

@francesco-ballarin, do you think this is an acceptable solution? Alternatively, we can expose a more verbose way of creating functions in Python.

francesco-ballarin commented 2 years ago

Thanks @IgorBaratta. It would be great if there were a way (even a more verbose one) to create functions that accounts for this case. I can use .copy() as a workaround in the meantime, but a long-term fix would be great!

IgorBaratta commented 2 years ago

I think a more verbose alternative could be:

import numpy as np

V = dolfinx.fem.FunctionSpace(msh, ("Lagrange", 1))

map = V.dofmap.index_map
bs = V.dofmap.index_map_bs                      # block size of the dofmap
scatterer = dolfinx.common.Scatterer(map, bs)   # add this wrapper

for i in range(2048):
    x = dolfinx.la.vector(scatterer)            # create a new constructor
    solution = dolfinx.fem.Function(V, x=x, dtype=np.float64)

The scatterer is explicitly created and then passed to the vector ...

But this is basically what the .copy() function is doing under the hood.

francesco-ballarin commented 2 years ago

I am closing this since we are happy with using dolfinx.fem.Function.copy(), and thus we will not need the more verbose alternative after all. Thanks @IgorBaratta for the suggestion!

francesco-ballarin commented 10 months ago

Just to report that a user encountered this in https://fenicsproject.discourse.group/t/error-with-mpi-in-transient-simulation/13182.

In their case, they cannot really use the workaround with dolfinx.fem.Function.copy() because the mesh changes at every iteration. The minimal example corresponding to their case would be along the lines of

import mpi4py.MPI
import dolfinx

sols = list()
for i in range(2**17):
    print(i)
    msh = dolfinx.mesh.create_unit_interval(mpi4py.MPI.COMM_WORLD, 10)
    V = dolfinx.fem.functionspace(msh, ("Lagrange", 1))
    u = dolfinx.fem.Function(V)
    sols.append(u)

where defining msh inside the for loop mimics the fact that the mesh changes at every iteration.

I am not reopening this because I think this use case is very rare, and not worth complicating the interface for. Still, I am reporting it here for the reference of future users/developers, and in case any of the developers has a differing opinion on whether this issue should be reopened.

mikesha2 commented 9 months ago

I had a similar problem, where I was loading ~60k saved meshes, interpolating a Function on each mesh, and saving the values. At first, I tried manually finding all the relevant MPI.Comm objects and calling .Free() on each of them. That definitely made progress (my script was able to process maybe 4 times as many elements before crashing), but then I stumbled onto roughly the following approach, which lets me load all 60k meshes and function data:

...
V = dolfinx.fem.functionspace(...)
u = Function(V)
temp = np.load(fn)
u.x.array[:] = temp
...

If instead I perform the load on a single line:

u.x.array[:] = np.load(fn)

I get the "too many communicators" error. Hope this is helpful to anyone facing this issue.

uvilla commented 8 months ago

Hi there,

I know this issue has already been discussed in several threads, but I would like to better understand the rationale behind using MPI_Comm_dup when wrapping the MPI communicator in dolfinx.

A student of mine and I are finally (it was overdue) porting hIPPYlib (https://hippylib.github.io/) to dolfinx, and we quickly ran into the "Too many communicators" issue.

Our code does not store any long list of dlx.fem.Function objects, but it does create quite a lot of temporary dlx.fem.Function and dlx.fem.form instances. All of these should be destroyed when the enclosing function returns, but that does not seem to be the case for us.

While this may be a problem with garbage collection in Python, it is still an issue that affects the usability of the library.
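
To make the pattern concrete, here is a schematic illustration (all names are placeholders, not our actual code; the explicit gc.collect() is only there to rule out delayed garbage collection):

import gc

from mpi4py import MPI
import dolfinx


def assemble_something(V):
    # hypothetical helper: creates temporary Function objects and returns plain data
    u = dolfinx.fem.Function(V)
    # ... use u (and temporary forms) to compute whatever is needed ...
    return u.x.array.copy()


msh = dolfinx.mesh.create_unit_interval(MPI.COMM_WORLD, 10)
V = dolfinx.fem.functionspace(msh, ("Lagrange", 1))

for i in range(100000):
    data = assemble_something(V)
    # the temporaries created inside assemble_something should be destroyed on return;
    # forcing a collection here would rule out reference cycles keeping them
    # (and their communicators) alive longer than expected
    gc.collect()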

So my questions are:

  1. Is duplication of the MPI communicator truly needed?
  2. Could an option not to duplicate the communicator be exposed to the end user, without having to recompile the library?