Point evaluation failure in parallel

jrmaddison commented 1 year ago

The following example

from firedrake import *

N_x, N_y, N_z = (2, 2, 2)

mesh = UnitCubeMesh(N_x, N_y, N_z)
X = SpatialCoordinate(mesh)
space = FunctionSpace(mesh, "Lagrange", 1)

F = Function(space, name="F")
F.interpolate(exp(X[0]) * exp(X[1]))

F((0.2, 0.3, 0.4))

leads to a failure when run on 4 processes

Traceback (most recent call last):
  File "[...]/test.py", line 12, in <module>
Traceback (most recent call last):
  File "[...]/test.py", line 12, in <module>
Traceback (most recent call last):
  File "[...]/test.py", line 12, in <module>
Traceback (most recent call last):
    F((0.2, 0.3, 0.4))
    F((0.2, 0.3, 0.4))
  File "[...]/build/firedrake/firedrake/src/ufl/ufl/exproperators.py", line 330, in _call
  File "[...]/test.py", line 12, in <module>
  File "[...]/build/firedrake/firedrake/src/ufl/ufl/exproperators.py", line 330, in _call
    F((0.2, 0.3, 0.4))
  File "[...]/build/firedrake/firedrake/src/ufl/ufl/exproperators.py", line 330, in _call
    return _eval(self, arg, mapping, component)
    return _eval(self, arg, mapping, component)
  File "[...]/build/firedrake/firedrake/src/ufl/ufl/exproperators.py", line 320, in _eval
  File "[...]/build/firedrake/firedrake/src/ufl/ufl/exproperators.py", line 320, in _eval
    return _eval(self, arg, mapping, component)
  File "[...]/build/firedrake/firedrake/src/ufl/ufl/exproperators.py", line 320, in _eval
    return f.evaluate(coord, mapping, component, index_values)
    return f.evaluate(coord, mapping, component, index_values)
  File "[...]/build/firedrake/firedrake/src/firedrake/firedrake/function.py", line 488, in evaluate
  File "[...]/build/firedrake/firedrake/src/firedrake/firedrake/function.py", line 488, in evaluate
    return f.evaluate(coord, mapping, component, index_values)
  File "[...]/build/firedrake/firedrake/src/firedrake/firedrake/function.py", line 488, in evaluate
    return self.at(coord)
  File "PETSc/Log.pyx", line 115, in petsc4py.PETSc.Log.EventDecorator.decorator.wrapped_func
    return self.at(coord)
  File "PETSc/Log.pyx", line 115, in petsc4py.PETSc.Log.EventDecorator.decorator.wrapped_func
  File "PETSc/Log.pyx", line 116, in petsc4py.PETSc.Log.EventDecorator.decorator.wrapped_func
  File "PETSc/Log.pyx", line 116, in petsc4py.PETSc.Log.EventDecorator.decorator.wrapped_func
  File "[...]/build/firedrake/firedrake/src/firedrake/firedrake/function.py", line 600, in at
  File "[...]/build/firedrake/firedrake/src/firedrake/firedrake/function.py", line 600, in at
    return self.at(coord)
    raise RuntimeError("Point evaluation gave different results across processes.")
  File "PETSc/Log.pyx", line 115, in petsc4py.PETSc.Log.EventDecorator.decorator.wrapped_func
    raise RuntimeError("Point evaluation gave different results across processes.")
RuntimeError: Point evaluation gave different results across processes.
application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 3
RuntimeError: Point evaluation gave different results across processes.
application called MPI_Abort(PYOP2_COMM_WORLD, 1) - process 2
  File "PETSc/Log.pyx", line 116, in petsc4py.PETSc.Log.EventDecorator.decorator.wrapped_func

There is no error if the expression is changed to

F.interpolate(X[0] + X[1])

suggesting that one or more processes is extrapolating outside of its partition.

jrmaddison commented 1 year ago

A serial example, where evaluation at an out-of-bounds point returns a value.

from firedrake import *

mesh = UnitTriangleMesh()
space = FunctionSpace(mesh, "Lagrange", 1)
X = SpatialCoordinate(mesh)

print(interpolate(X, VectorFunctionSpace(mesh, "Lagrange", 1)).dat.data_ro)

F = Function(space, name="F")
F.interpolate(X[1])
print(F((0.9, 0.9)))  # Does not raise an error

jrmaddison commented 1 year ago

I think this originates from #2662. Since libspatialindex indexes items by their bounding box, the update finds elements whose bounding box contains the evaluation point, even if the point lies within the partition for a different process.

dham commented 1 year ago

This is clearly a bug, and it's quite possibly a result of the FIAT change you point to, however that change is not wrong in and of itself. The change among other things switches to default tolerances for being outside cells which are relative to cell size. This is the right thing to do in order to deal with discretisation error at domain boundaries (and you can override it by user argument if you feel the need). The changes you found are also needed to prevent points getting lost in the boundaries between cells, which has been an ongoing problem.

I think what is going on here is that @ReubenHill has fixed the parallel implications of this via a voting algorithm for VertexOnlyMesh, but this has not been applied to the legacy at functionality. I think the right answer is probably to modify at so that it just creates at throw-away VertexOnlyMesh. Can you verify if the same parallel issue occurs using a one-point VertexOnlyMesh?

jrmaddison commented 1 year ago

Can you verify if the same parallel issue occurs using a one-point VertexOnlyMesh?

There is no error when using a VertexOnlyMesh, the following works.

from firedrake import *

import numpy as np

N_x, N_y, N_z = (2, 2, 2)

mesh = UnitCubeMesh(N_x, N_y, N_z)
vmesh = VertexOnlyMesh(mesh, np.array([[0.2, 0.3, 0.4]]))
X = SpatialCoordinate(mesh)
space = FunctionSpace(mesh, "Lagrange", 1)
vspace = FunctionSpace(vmesh, "Discontinuous Lagrange", 0)

F = Function(space, name="F")
F.interpolate(exp(X[0]) * exp(X[1]))

v = Function(vspace, name="value")
v.interpolate(F)

print(v.dat.data_ro)