FEniCS / dolfinx

Next generation FEniCS problem solving environment
https://fenicsproject.org
GNU Lesser General Public License v3.0

[BUG]: Interior facet integrals give differing values in serial and in parallel #3501

Closed: bpachev closed this issue 3 weeks ago

bpachev commented 3 weeks ago

Summarize the issue

When assembling a vector from an interior facet integral, the vector sum and norm change with the number of MPI processes. While mesh partitioning does affect the order of degrees of freedom, it shouldn't change these aggregate metrics.

How to reproduce the bug

Within the Docker image dolfinx/dolfinx:stable, I ran the MWE in parallel with varying numbers of processes:

mpiexec -np 1 python3 mwe_interior_facet_bug.py
mpiexec -np 2 python3 mwe_interior_facet_bug.py
mpiexec -np 3 python3 mwe_interior_facet_bug.py

Minimal Example (Python)

from mpi4py import MPI
from dolfinx import mesh, fem as fe
from dolfinx.fem import petsc as fe_petsc
from petsc4py import PETSc
import ufl
import numpy as np

domain = mesh.create_unit_square(
    MPI.COMM_WORLD, 19, 27, mesh.CellType.triangle
)

V = fe.functionspace(domain, ("P", 1))
V_dg = fe.functionspace(domain, ("DG", 1))
p = ufl.TestFunction(V)
p_dg = ufl.TestFunction(V_dg)
n = ufl.FacetNormal(domain)
u_dg = fe.Function(V_dg)
u_dg.interpolate(lambda x: x[0]**2 + x[1])
kappa = fe.Function(V)
kappa.interpolate(lambda x: np.sin(x[0])*np.cos(x[1]))
L = ufl.avg(p_dg) * ufl.avg(kappa) * ufl.dot(ufl.avg(ufl.grad(u_dg)), n("+")) * ufl.dS
L = fe.form(L)

b = fe_petsc.create_vector(L)
b.zeroEntries()

fe_petsc.assemble_vector(b, L)
b.ghostUpdate(addv=PETSc.InsertMode.ADD, mode=PETSc.ScatterMode.REVERSE)

# now compute sum across all entries
# show prints for all ranks to prove that the values are global, not local
print(f"With {MPI.COMM_WORLD.size} processes, sum={b.sum()}, norm={b.norm(PETSc.NormType.NORM_2)}")

Output (Python)

With 1 processes, sum=1.9735490402112, norm=0.2900086489812219
With 2 processes, sum=-1.1928489757529805, norm=0.2906011302255919
With 2 processes, sum=-1.1928489757529805, norm=0.2906011302255919
With 3 processes, sum=2.753330018608379, norm=0.3319789413877082
With 3 processes, sum=2.753330018608379, norm=0.3319789413877082
With 3 processes, sum=2.753330018608379, norm=0.3319789413877082

Version

0.9.0

DOLFINx git commit

4116cca8cf91f1a7e877d38039519349b3e08620

Installation

Used the docker image dolfinx/dolfinx:stable

Additional information

Manual inspection of some of the vector entries indicates that the absolute value is correct, but the sign is flipped. However, as the mismatch in norm values indicates, that's not the only issue. This issue doesn't appear for exterior facet or cell integral types. I suspect the issue may have something to do with how the ordering of cells corresponding to a given facet changes in parallel, but am not sure yet.
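
For reference, a minimal sketch of how such an entry-by-entry check might be done (assuming the MWE above has just been run, so b and MPI.COMM_WORLD are in scope): gather the owned entries on rank 0 and compare sorted magnitudes between runs with different process counts.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
# Owned entries only; ghost contributions were already accumulated by the
# ghostUpdate call in the MWE above.
owned = b.array.copy()
gathered = comm.gather(owned, root=0)
if comm.rank == 0:
    abs_sorted = np.sort(np.abs(np.concatenate(gathered)))
    # One file per process count; diffing these shows which magnitudes agree
    # between the serial and parallel runs and which do not.
    np.save(f"abs_entries_np{comm.size}.npy", abs_sorted)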

francesco-ballarin commented 3 weeks ago

You need to change the ghost_mode to shared_facet, see for instance https://jsdokken.com/dolfinx_docs/meshes.html#ghosting or grep in this repository for unit tests.

Something like

from mpi4py import MPI
import dolfinx

mesh = dolfinx.mesh.create_unit_square(
    MPI.COMM_WORLD, 10, 10, ghost_mode=dolfinx.mesh.GhostMode.shared_facet)

will fix the issue.

bpachev commented 3 weeks ago

Setting the ghost mode to shared_facet doesn't change the results of the above script. In fact, shared_facet is the default ghost mode for create_unit_square, as listed in the docs.

francesco-ballarin commented 3 weeks ago

Well, then you need something similar to https://fenicsproject.discourse.group/t/consistent-side-of-interior-boundaries-with-dolfinx/15968/2.

garth-wells commented 3 weeks ago

The n("+") term looks fishy to me. There are no guarantees on which side is "+" and which side is "-". DG schemes are normally written such that it doesn't matter which side is +/-.

bpachev commented 3 weeks ago

So in light of that, we're OK if the results of an integral involving an n("+") term change with the number of MPI processes? I guess that makes sense, given this wouldn't affect an actual DG scheme (which this is not). That would make this not a bug, because we give no guarantees about the choice between +/- or about that choice remaining stable across different numbers of processes.

Might be worth including some sort of warning about this in the docs.

jorgensd commented 3 weeks ago

> So in light of that, we're OK if the results of an integral involving an n("+") term change with the number of MPI processes? I guess that makes sense, given this wouldn't affect an actual DG scheme (which this is not). That would make this not a bug, because we give no guarantees about the choice between +/- or about that choice remaining stable across different numbers of processes.
>
> Might be worth including some sort of warning about this in the docs.

I have written up demos that illustrate how to do custom orientation: for one-sided integration at https://github.com/jorgensd/dolfinx-tutorial/issues/158 and https://fenicsproject.discourse.group/t/wrong-facetnormal-vector-on-internal-boundaries/12887/2, and for jumps at https://gist.github.com/jorgensd/5a8b60121a705bb9896eb1465be8e111 (there for a bifurcation mesh, but it works in general).