Same issue on main with 3 processes and:
```python
import numpy as np
from mpi4py import MPI
import dolfinx

mesh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, 2, 2)
V = dolfinx.fem.functionspace(mesh, ("DG", 1, (2, )))
u_vec = dolfinx.fem.Function(V, name="u")
u_vec.interpolate(lambda x: np.stack((x[0], x[1])))
u_vec.x.scatter_forward()

u1 = u_vec.sub(0).collapse()
u1.x.scatter_forward()
u1.name = "u1"

# Fails in parallel with odd ghosting issues
with dolfinx.io.VTXWriter(
        mesh.comm, "solution.bp", [u_vec, u1], engine="BP4") as vtx:
    vtx.write(0.0)

# Works fine
with dolfinx.io.VTXWriter(
        mesh.comm, "solution_u1.bp", [u1], engine="BP4") as vtx:
    vtx.write(0.0)
```
Seems like a switch in cells (see the attached ParaView screenshot from 2023-12-13).
Also fails with 3 processes and Lagrange:
```python
import numpy as np
from mpi4py import MPI
import dolfinx

mesh = dolfinx.mesh.create_unit_square(
    MPI.COMM_WORLD, 2, 2, ghost_mode=dolfinx.mesh.GhostMode.none)
V = dolfinx.fem.functionspace(mesh, ("Lagrange", 1, (2, )))
u_vec = dolfinx.fem.Function(V, name="u")
u_vec.interpolate(lambda x: np.stack((x[0], x[1])))
u_vec.x.scatter_forward()

u1 = u_vec.sub(0).collapse()
u1.x.scatter_forward()
u1.name = "u1"

# Fails in parallel with odd ghosting issues
with dolfinx.io.VTXWriter(
        mesh.comm, "solution.bp", [u_vec, u1], engine="BP4") as vtx:
    vtx.write(0.0)

# Works fine
with dolfinx.io.VTXWriter(
        mesh.comm, "solution_u1.bp", [u1], engine="BP4") as vtx:
    vtx.write(0.0)
```
To me it looks like it relates to the fact that the ghosts are ordered differently in the collapsed space. Is this on purpose, @garth-wells @chrisrichardson @jpdean any comments?
```python
import numpy as np
from mpi4py import MPI
import dolfinx

mesh = dolfinx.mesh.create_unit_square(
    MPI.COMM_WORLD, 2, 2, ghost_mode=dolfinx.mesh.GhostMode.none)
V = dolfinx.fem.functionspace(mesh, ("Lagrange", 1, (2, )))
V0, _ = V.sub(0).collapse()
print(V.dofmap.index_map.ghosts,
      V0.dofmap.index_map.ghosts)
```
Output (one line per rank):
```
[1] [1]
[5 7 6 0] [0 5 7 6]
[6] [6]
```
From @jorgensd: "To me it looks like it relates to the fact that the ghosts are ordered differently in the collapsed space. Is this on purpose, @garth-wells @chrisrichardson @jpdean, any comments?"

I think we agreed in https://github.com/FEniCS/dolfinx/issues/2881 that that was on purpose.
I agree that in the case of using `create_sub_map`, it doesn't make sense to have that special handling. However, I find it strange that collapsing a function space reorders the dofs, when we explicitly do not want to re-order all indices when creating such a submap.
https://github.com/FEniCS/dolfinx/blob/main/cpp/dolfinx/fem/DofMap.cpp#L26-L83
The function states that it doesn't reorder the dofmap; however, that isn't true, as the ghost indices are reordered.
To me, this suggests that calling `create_sub_map` is way overkill for collapsing a blocked function space (as you then retain all ghost degrees of freedom). I guess that for proper mixed spaces it will be trickier to find the subset of indices and ghosts, and some reordering would be expected. I still wonder whether, even in that case, we could do this without any communication, since all local dofs are retained in the sub-space and, similarly, all ghosts are kept (in the sub-space).
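As a quick sanity check of that claim, here is a sketch based on the snippets above and consistent with the 3-process output printed earlier: the ghosts of the collapsed index map are the same global indices as the parent's block-wise ghosts, just in a different order.

```python
import numpy as np
from mpi4py import MPI
import dolfinx

mesh = dolfinx.mesh.create_unit_square(
    MPI.COMM_WORLD, 2, 2, ghost_mode=dolfinx.mesh.GhostMode.none)
V = dolfinx.fem.functionspace(mesh, ("Lagrange", 1, (2, )))
V0, _ = V.sub(0).collapse()

parent_ghosts = V.dofmap.index_map.ghosts  # block-wise ghosts of the parent space
sub_ghosts = V0.dofmap.index_map.ghosts    # ghosts of the collapsed space

# Same set of global indices, possibly permuted (cf. the printed output above)
assert np.array_equal(np.sort(parent_ghosts), np.sort(sub_ghosts))
print(MPI.COMM_WORLD.rank, parent_ghosts, sub_ghosts)
```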
It's not clear to me that the ghost ordering is the problem. Re-mapping takes place when copying entries from the vector at https://github.com/FEniCS/dolfinx/blob/227f1d3cdd52a7b08a5b01c37f395ecb0d9fdb7d/cpp/dolfinx/fem/Function.h#L126.
As the issue shows, they are copied over in the correct order, i.e. if you only write the collapsed function to file, the ordering is correct. However, if you write the parent space and the collapsed sub-space to the same file, we see that the dofmaps are not the same, even though the finite elements are the same (up to block size) and the dof layout is the same for both spaces.
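To illustrate the first point, here is a sketch under the same assumptions as the MWEs above: the entries of the collapsed function match a direct interpolation into the collapsed space, so the values are re-mapped correctly even though the ghost ordering of the index maps differs.

```python
import numpy as np
from mpi4py import MPI
import dolfinx

mesh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, 2, 2)
V = dolfinx.fem.functionspace(mesh, ("Lagrange", 1, (2, )))
u_vec = dolfinx.fem.Function(V)
u_vec.interpolate(lambda x: np.stack((x[0], x[1])))
u_vec.x.scatter_forward()

# Collapse the first component
u1 = u_vec.sub(0).collapse()
u1.x.scatter_forward()

# Interpolate the same expression directly into the collapsed space
ref = dolfinx.fem.Function(u1.function_space)
ref.interpolate(lambda x: x[0])
ref.x.scatter_forward()

# Expected to print True on every rank, consistent with the "works fine"
# case above where only the collapsed function is written to file
print(MPI.COMM_WORLD.rank, np.allclose(u1.x.array, ref.x.array))
```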
You can also see this with the PR that I have made, as it resolves the issues described here.
The root problem still hasn't been properly identified. Is it that `VTXWriter` requires all functions to have the same dofmap, and in the MWE the maps are not the same?
Yes, `VTXWriter` assumes that the dofmaps are the same (as there is a 1-1 map between the mesh geometry dofs and the function dofs). This was previously the case for blocked spaces, but it changed when `create_submap` was introduced in `dofmap.collapse()`.
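A small check of this claim (a sketch, assuming `dofmap.list` returns the cell-wise dof indices as in the DOLFINx Python API used above): comparing the parent's block-wise dofmap with the collapsed dofmap shows they differ in their ghost entries on some ranks.

```python
import numpy as np
from mpi4py import MPI
import dolfinx

mesh = dolfinx.mesh.create_unit_square(
    MPI.COMM_WORLD, 2, 2, ghost_mode=dolfinx.mesh.GhostMode.none)
V = dolfinx.fem.functionspace(mesh, ("Lagrange", 1, (2, )))
V0, _ = V.sub(0).collapse()

# Cell-wise dofs: block indices for the parent space, plain indices for the
# collapsed space. With identical ghost ordering these arrays would be equal;
# with 3 processes, at least one rank is expected to print False
# (cf. the ghost output earlier in the thread).
same = np.array_equal(V.dofmap.list, V0.dofmap.list)
print(f"rank {MPI.COMM_WORLD.rank}: dofmaps identical: {same}")
```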
There are three ways of "fixing" this:

Option 1: Change `create_submap`, as I've done in #2930.
Option 2: Do not allow sub-spaces of blocked spaces in the same `VTXWriter` (this has the drawback that one would have to save the mesh N times instead of once).
Option 3: Remove `create_submap` from `dofmap.collapse()` and go back to a less general implementation that keeps the order of the ghosts.
The documentation specifies that the elements for the different functions must be the same, whereas the implementation requires that the element and the dofmap are the same. Immediate issues to be fixed are:

- Raise an error in `VTXWriter` if the elements and/or dofmaps are not the same.

A second step could be:

- Support cases with the same element but different dofmaps.

It is best if this class imposes as few requirements as reasonably possible on the arguments.
This means writing the mesh twice, or computing the permutation/mapping from one dofmap to the other, which I am not sure is worthwhile complexity.
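For reference, a sketch of what such a permutation could look like, assuming only what the snippets above already use (both spaces share the cell-wise dof layout, so the two cell dofmaps determine the map between the numberings); this is illustrative, not a proposed implementation:

```python
import numpy as np
from mpi4py import MPI
import dolfinx

mesh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, 2, 2)
V = dolfinx.fem.functionspace(mesh, ("Lagrange", 1, (2, )))
V0, _ = V.sub(0).collapse()

imap0 = V0.dofmap.index_map
perm = np.empty(imap0.size_local + imap0.num_ghosts, dtype=np.int32)
# Every dof (owned and ghost) appears in at least one local cell, so the
# cell-wise dofmaps pin down the map from collapsed indices to parent block indices
perm[V0.dofmap.list.flatten()] = V.dofmap.list.flatten()
print(MPI.COMM_WORLD.rank, perm)
```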
I still think this is something that should be fixed in the collapsed function space code. It is silly that we keep the same numbering for the local dofs, but not for the ghosts.
Summarize the issue
Using `VTXWriter` to output a DG function and one of its collapsed sub-functions yields incorrect data in parallel.
How to reproduce the bug
The following MWE and attached ParaView rendering illustrate the issue. The image is produced after running with 5 MPI processes.
Minimal Example (Python)
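A reproducer consistent with the summary above (essentially the DG snippet from the top of the thread); run with 5 MPI processes, e.g. `mpirun -n 5`:

```python
import numpy as np
from mpi4py import MPI
import dolfinx

mesh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, 2, 2)
V = dolfinx.fem.functionspace(mesh, ("DG", 1, (2, )))
u_vec = dolfinx.fem.Function(V, name="u")
u_vec.interpolate(lambda x: np.stack((x[0], x[1])))
u_vec.x.scatter_forward()

u1 = u_vec.sub(0).collapse()
u1.x.scatter_forward()
u1.name = "u1"

# Writing the parent function and its collapsed sub-function together
# produces incorrect data in parallel
with dolfinx.io.VTXWriter(
        mesh.comm, "solution.bp", [u_vec, u1], engine="BP4") as vtx:
    vtx.write(0.0)
```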
Output (Python)
No response
Version
0.7.2
DOLFINx git commit
No response
Installation
From source
Additional information