firedrakeproject / firedrake

Firedrake is an automated system for the portable solution of partial differential equations using the finite element method (FEM)
https://firedrakeproject.org

Seg fault in TransferManager.restrict() for NonNestedHierarchy / bug in petsc multigrid #2156

Open · bensworth opened 3 years ago

bensworth commented 3 years ago

This reports two related seg faults/errors involving a NonNestedHierarchy and Firedrake multigrid. I am trying to restrict a residual from one FEM mesh to a coarser, non-nested mesh, but I get the seg fault

[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
application called MPI_Abort(MPI_COMM_WORLD, 50176059) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=50176059

The problem seems unique to NonNestedHierarchy: if I replace it with a MeshHierarchy in otherwise identical code, everything runs fine. Tracking through pdb, the seg fault seems to arise in

transfer = TransferManager(use_averaging=True)
transfer.restrict(u, vc)
-> return self._native_transfer(source_element, Op.RESTRICT)(gf, gc)
--> return self.native_transfers.setdefault(element, ops)[op]
---> @PETSc.Log.EventDecorator()
----> fine_to_coarse = utils.fine_node_to_coarse_node_map(Vf, Vc)
-----> fine_to_coarse_nodes = impl.fine_to_coarse_nodes(Vf, Vc, fine_to_coarse)
------> def cell_node_map(self)
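
For context, here is a minimal sketch of the kind of setup that triggers this; the mesh sizes and the CG1 spaces are illustrative guesses, not taken from the attached MWE:

from firedrake import *

# Illustrative reproduction of the reported setup (sizes and spaces are assumptions).
mesh = UnitSquareMesh(20, 20)                # fine mesh
cmesh = UnitSquareMesh(11, 11)               # coarser, non-nested mesh
hierarchy = NonNestedHierarchy(cmesh, mesh)
mesh, cmesh = hierarchy[-1], hierarchy[0]

V = FunctionSpace(mesh, "CG", 1)
Vc = FunctionSpace(cmesh, "CG", 1)

u = Function(V)    # fine-mesh residual; its contents are irrelevant to the crash
vc = Function(Vc)  # coarse-mesh target

transfer = TransferManager(use_averaging=True)
transfer.restrict(u, vc)  # seg faults with NonNestedHierarchy, runs fine with MeshHierarchy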

A separate issue: while debugging this, @jandrej also found a bug in the PETSc multigrid interface. If we define both meshes at the top of the script, before calling solve (previously I was defining the coarse mesh after calling solve with PETSc multigrid), e.g.,

mesh = UnitSquareMesh(nx, ny)
cmesh = UnitSquareMesh(ncx, ncy)
hierarchy = NonNestedHierarchy(cmesh, mesh)       # This does not work
# hierarchy = MeshHierarchy(cmesh, 1)             # This works
mesh = hierarchy[-1]
cmesh = hierarchy[0]

Then we get the error Assertion failed: (!PyErr_Occurred()), function __Pyx_PyCFunction_FastCall, file src/petsc4py.PETSc.c, line 348918. @jandrej tracked this down to Firedrake failing while trying to pass information about the spaces to PETSc. When he added PETSc.Sys.popErrorHandler() at the top of the script, we are back to the seg fault mentioned above.
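
For anyone reproducing this, the workaround mentioned above is just a one-liner near the top of the script; a sketch (using the petsc4py bundled with Firedrake):

from firedrake import *
from firedrake.petsc import PETSc

# Pop PETSc's error handler so the underlying Python traceback is shown
# instead of the __Pyx_PyCFunction_FastCall assertion failure.
PETSc.Sys.popErrorHandler()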

Attached is an MWE Python script, saved as a .txt because GitHub won't allow attaching .py files: firedrake_poisson_cg_debug.txt

wence- commented 3 years ago

> This reports two related seg faults/errors involving a NonNestedHierarchy and Firedrake multigrid. I am trying to restrict a residual from one FEM mesh to a coarser, non-nested mesh, but I get the seg fault
>
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
> [0]PETSC ERROR: to get more information on the crash.
> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
> application called MPI_Abort(MPI_COMM_WORLD, 50176059) - process 0
> [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=50176059
>
> The problem seems unique to NonNestedHierarchy: if I replace it with a MeshHierarchy in otherwise identical code, everything runs fine. Tracking through pdb, the seg fault seems to arise in
>
> transfer = TransferManager(use_averaging=True)
> transfer.restrict(u, vc)
> -> return self._native_transfer(source_element, Op.RESTRICT)(gf, gc)
> --> return self.native_transfers.setdefault(element, ops)[op]
> ---> @PETSc.Log.EventDecorator()
> ----> fine_to_coarse = utils.fine_node_to_coarse_node_map(Vf, Vc)
> -----> fine_to_coarse_nodes = impl.fine_to_coarse_nodes(Vf, Vc, fine_to_coarse)
> ------> def cell_node_map(self)
>   • I seemed to step directly from fine_to_coarse_nodes to cell_node_map
>     • Stepping through, it seems to loop over this function successfully a few times, and then seg faults at
>           /Users/southworth/Software/firedrake/src/PyOP2/pyop2/utils.py(63)__get__()->array([[  0, ..., dtype=int32)
>           -> return result
>
> The seg fault occurs for use_averaging=True or False, and even if I set the two meshes to be identical, e.g., the same UnitSquareMesh.
>
> A separate issue: while debugging this, @jandrej also found a bug in the PETSc multigrid interface. If we define both meshes at the top of the script, before calling solve (previously I was defining the coarse mesh after calling solve with PETSc multigrid), e.g.,
>
> mesh = UnitSquareMesh(nx, ny)
> cmesh = UnitSquareMesh(ncx, ncy)
> hierarchy = NonNestedHierarchy(cmesh, mesh)       # This does not work
> # hierarchy = MeshHierarchy(cmesh, 1)             # This works
> mesh = hierarchy[-1]
> cmesh = hierarchy[0]
>
> Then we get the error Assertion failed: (!PyErr_Occurred()), function __Pyx_PyCFunction_FastCall, file src/petsc4py.PETSc.c, line 348918. @jandrej tracked this down to Firedrake failing while trying to pass information about the spaces to PETSc. When he added PETSc.Sys.popErrorHandler() at the top of the script, we are back to the seg fault mentioned above.

If I do this then I get the following error:

    self.op(uf, uc, transfer_op=Op.INJECT)
  File "PETSc/Log.pyx", line 151, in petsc4py.PETSc.Log.EventDecorator.decorator.wrapped_func
  File "PETSc/Log.pyx", line 152, in petsc4py.PETSc.Log.EventDecorator.decorator.wrapped_func
  File "/Users/vtdb72/Documents/work/src/firedrake/src/firedrake/firedrake/mg/embedded.py", line 229, in op
    return self._native_transfer(source_element, transfer_op)(source, target)
  File "PETSc/Log.pyx", line 151, in petsc4py.PETSc.Log.EventDecorator.decorator.wrapped_func
  File "PETSc/Log.pyx", line 152, in petsc4py.PETSc.Log.EventDecorator.decorator.wrapped_func
  File "/Users/vtdb72/Documents/work/src/firedrake/src/firedrake/firedrake/mg/interface.py", line 181, in inject
    kernel, dg = kernels.inject_kernel(Vf, Vc)
  File "/Users/vtdb72/Documents/work/src/firedrake/src/firedrake/firedrake/mg/kernels.py", line 403, in inject_kernel
    ncandidate = hierarchy.coarse_to_fine_cells[level].shape[1] * level_ratio
AttributeError: 'NoneType' object has no attribute 'shape'

In the NonNestedHierarchy we don't have any a priori relationship between the cells.

It turns out a lot of the multigrid code assumes there is such a relationship.
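
To make that concrete, here is a hedged illustration of what the traceback above runs into: a nested MeshHierarchy carries coarse-to-fine cell maps on the hierarchy, while a NonNestedHierarchy has nothing to put there (the exact container layout is an internal detail and may differ between versions):

from firedrake import *

nested = MeshHierarchy(UnitSquareMesh(4, 4), 1)
non_nested = NonNestedHierarchy(UnitSquareMesh(4, 4), UnitSquareMesh(6, 6))

# The attribute that inject_kernel indexes in the traceback above.
print(nested.coarse_to_fine_cells)      # populated coarse-to-fine cell maps
print(non_nested.coarse_to_fine_cells)  # no usable maps, hence the NoneType error downstream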

In theory one could abuse the supermesh stuff we have to provide the oracle mapping from coarse to fine cells in this case. It needs a bit of work. Part of the problem is that I've kind of hard-coded an assumption that in an MG hierarchy each coarse cell overlaps with the same number of fine cells. Clearly this isn't necessarily the case for arbitrary non-nested hierarchies.

If all you actually want to do is transfer primal quantities between non-matching meshes, then you can do this with the supermesh projection that is already implemented. In this case you should not make a mesh hierarchy; instead do:

cmesh = ...
mesh = ...

u = Function(V_on_mesh)
uc = Function(V_on_cmesh)

project(u, uc) # or vice-versa. This does an L2 projection using the supermesh to compute the integrals.
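
For concreteness, a self-contained version of that suggestion might look like the following; the mesh sizes, the CG1 spaces, and the sample function are illustrative:

from firedrake import *

cmesh = UnitSquareMesh(10, 10)   # coarse, non-matching mesh
mesh = UnitSquareMesh(16, 16)    # fine mesh; note that no hierarchy is built

Vc = FunctionSpace(cmesh, "CG", 1)
V = FunctionSpace(mesh, "CG", 1)

u = Function(V)
u.interpolate(sin(pi * SpatialCoordinate(mesh)[0]))  # some fine-mesh function

uc = Function(Vc)
project(u, uc)  # L2 projection via the supermesh; project(uc, u) goes the other way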

tl;dr: it turns out not as much works right now as hoped.

> Attached is an MWE Python script, saved as a .txt because GitHub won't allow attaching .py files: firedrake_poisson_cg_debug.txt

Thank