Open nabw opened 2 years ago
A = assemble(a)
b = assemble(L)
solve(A, sol, b, solver_parameters={'ksp_monitor':None, 'pc_type': 'nn', 'mat_type':'is'})
The first thing I immediately notice is that assemble(a)
will give you back a matrix with a set mat_type
. You should try
A = assemble(a, mat_type="is")
to force the right one.
whoops, you are right. The error is now the same (thanks!):
Traceback (most recent call last):
File "demos/helmholtz/helmholtz2.py", line 13, in <module>
A = assemble(a, mat_type='is')
File "PETSc/Log.pyx", line 115, in petsc4py.PETSc.Log.EventDecorator.decorator.wrapped_func
File "PETSc/Log.pyx", line 116, in petsc4py.PETSc.Log.EventDecorator.decorator.wrapped_func
File "/home/barnafi/firedrake/src/firedrake/firedrake/adjoint/assembly.py", line 19, in wrapper
output = assemble(*args, **kwargs)
File "/home/barnafi/firedrake/src/firedrake/firedrake/assemble.py", line 101, in assemble
return _assemble_form(expr, *args, **kwargs)
File "/home/barnafi/firedrake/src/firedrake/firedrake/assemble.py", line 290, in _assemble_form
return assembler.assemble()
File "/home/barnafi/firedrake/src/firedrake/firedrake/assemble.py", line 406, in assemble
return self.result
File "/home/barnafi/firedrake/src/firedrake/firedrake/assemble.py", line 583, in result
self._tensor.M.assemble()
File "/home/barnafi/firedrake/src/PyOP2/pyop2/types/mat.py", line 896, in assemble
self.handle.assemble()
File "PETSc/Mat.pyx", line 1178, in petsc4py.PETSc.Mat.assemble
petsc4py.PETSc.Error: error code 73
[0] MatAssemblyBegin() at /home/barnafi/firedrake/src/petsc/src/mat/interface/matrix.c:5525
[0] MatAssemblyBegin_IS() at /home/barnafi/firedrake/src/petsc/src/mat/impls/is/matis.c:2744
[0] MatAssemblyBegin() at /home/barnafi/firedrake/src/petsc/src/mat/interface/matrix.c:5516
[0] Object is in wrong state
[0] Must call MatXXXSetPreallocation(), MatSetUp() or the matrix has not yet been factored on argument 1 "mat" before MatAssemblyBegin()
I've done a fair bit of digging and it looks like the matrix is put into an invalid state whenever PyOP2 executes a parloop. Specifically I've found that applying the following diff makes the error go away:
diff --git a/pyop2/parloop.py b/pyop2/parloop.py
index 8384268c..bd409b5a 100644
--- a/pyop2/parloop.py
+++ b/pyop2/parloop.py
@@ -238,13 +238,6 @@ class Parloop:
for m in pl_arg.data:
m.change_assembly_state(new_state)
pl_arg.data.change_assembly_state(new_state)
-
- if pl_arg.lgmaps is not None:
- olgmaps = []
- for m, lgmaps in zip(pl_arg.data, pl_arg.lgmaps):
- olgmaps.append(m.handle.getLGMap())
- m.handle.setLGMap(*lgmaps)
- orig_lgmaps.append(olgmaps)
return tuple(orig_lgmaps)
def restore_lgmaps(self, orig_lgmaps):
@@ -252,12 +245,6 @@ class Parloop:
if not self._has_mats:
return
- orig_lgmaps = list(orig_lgmaps)
- for arg, d in reversed(list(zip(self.global_kernel.arguments, self.arguments))):
- if isinstance(arg, (MatKernelArg, MixedMatKernelArg)) and d.lgmaps is not None:
- for m, lgmaps in zip(d.data, orig_lgmaps.pop()):
- m.handle.setLGMap(*lgmaps)
-
@cached_property
def _has_mats(self):
return any(isinstance(a, (MatParloopArg, MixedMatParloopArg)) for a in self.arguments)
Interestingly I found that just running the code below was enough to reproduce the error:
mat.handle.setLGMap(*mat.handle.getLGMap())
mat.handle.assemble()
Hope this helps.
Thanks, there is certainly something going on with the construction of these maps. It now runs in serial, and instead in parallel it gives me this:
[0] MatAssemblyEnd() at /home/barnafi/firedrake/src/petsc/src/mat/interface/matrix.c:5610
[0] MatAssemblyEnd_IS() at /home/barnafi/firedrake/src/petsc/src/mat/impls/is/matis.c:2754
[0] MatAssemblyEnd() at /home/barnafi/firedrake/src/petsc/src/mat/interface/matrix.c:5614
[0] MatAssemblyEnd_SeqAIJ() at /home/barnafi/firedrake/src/petsc/src/mat/impls/aij/seq/aij.c:1157
[0] Petsc has generated inconsistent data
[0] Unused space detected in matrix: 76 X 76, 86 unneeded
To avoid this, I tried commenting pyop2/types/mat.py:787 here:
# mat.setOption(mat.Option.UNUSED_NONZERO_LOCATION_ERR, True)
and instead now I get some missing entries:
[0] Object is in wrong state
[0] Matrix is missing diagonal entry 55
I suspect there is something weird going on with the ghost nodes. IS matrices handle a local matrix which is mapped to the global one through the LGmap, where this matrix includes all ghost dofs. To be more clear, if the problem has vertex-based dofs in the following mesh:
x ----- x ----- x | | | x ----- x ----- x
where each of the 2 processors has one (square) element, then each local matrix will be of size 4x4. I can try looking at this if you point me to where these maps are constructed. Also, thank you for your help!
The lgmaps are constructed here.
I've done a fair bit of digging and it looks like the matrix is put into an invalid state whenever PyOP2 executes a parloop. Specifically I've found that applying the following diff makes the error go away:
diff --git a/pyop2/parloop.py b/pyop2/parloop.py index 8384268c..bd409b5a 100644 --- a/pyop2/parloop.py +++ b/pyop2/parloop.py @@ -238,13 +238,6 @@ class Parloop: for m in pl_arg.data: m.change_assembly_state(new_state) pl_arg.data.change_assembly_state(new_state) - - if pl_arg.lgmaps is not None: - olgmaps = [] - for m, lgmaps in zip(pl_arg.data, pl_arg.lgmaps): - olgmaps.append(m.handle.getLGMap()) - m.handle.setLGMap(*lgmaps) - orig_lgmaps.append(olgmaps) return tuple(orig_lgmaps) def restore_lgmaps(self, orig_lgmaps): @@ -252,12 +245,6 @@ class Parloop: if not self._has_mats: return - orig_lgmaps = list(orig_lgmaps) - for arg, d in reversed(list(zip(self.global_kernel.arguments, self.arguments))): - if isinstance(arg, (MatKernelArg, MixedMatKernelArg)) and d.lgmaps is not None: - for m, lgmaps in zip(d.data, orig_lgmaps.pop()): - m.handle.setLGMap(*lgmaps) - @cached_property def _has_mats(self): return any(isinstance(a, (MatParloopArg, MixedMatParloopArg)) for a in self.arguments)
Interestingly I found that just running the code below was enough to reproduce the error:
mat.handle.setLGMap(*mat.handle.getLGMap()) mat.handle.assemble()
Hope this helps.
@wence- What is the reason behind changing the l2gmaps here? is it for a specific matrix type? or what? when you change the l2gmaps, MATIS
destroys the old local matrices and recreates new ones, thus throwing away any preallocation done at init
time.
@wence- What is the reason behind changing the l2gmaps here? is it for a specific matrix type? or what? when you change the l2gmaps,
MATIS
destroys the old local matrices and recreates new ones, thus throwing away any preallocation done atinit
time.
We swap out the lgmaps for the case where we apply boundary conditions and want to mask certain nodes from being written to. I think there is a bug in (see #2487).assemble
where we always do this swap even if there are no BCs (so the old and new lgmaps are the same)
@wence- What is the reason behind changing the l2gmaps here? is it for a specific matrix type? or what? when you change the l2gmaps,
MATIS
destroys the old local matrices and recreates new ones, thus throwing away any preallocation done atinit
time.We swap out the lgmaps for the case where we apply boundary conditions and want to mask certain nodes from being written to. I think there is a bug in
assemble
where we always do this swap even if there are no BCs (so the old and new lgmaps are the same).
masking as skipping values insertion? You could use negative entries for that. If bc locations are not dynamic (i.e. indices do not change during subsequent assemblies) then you should be fine.
If I'm reading the code correctly then that is exactly what we do. Possibly the issue is that if we apply boundary conditions then we actually create a new lgmap and swap it in rather than modifying the one already in use.
Yes, that's right. TBH, I think a neater way to make the matis structure would be to intercept the matrix construction and make seqaij neumann matrices on each process, then assemble the global dirichlet matrix as normal before and make the matis you need by hand.
as @connorjward says we modify the lgmaps to apply boundary conditions, but this doesn't play well with matis (which takes change of lgmap as a sign that everything is invalidated). When I looked at this about 3 years ago I didn't manage to figure out a good solution
@wence- What is the reason behind changing the l2gmaps here? is it for a specific matrix type? or what? when you change the l2gmaps,
MATIS
destroys the old local matrices and recreates new ones, thus throwing away any preallocation done atinit
time.We swap out the lgmaps for the case where we apply boundary conditions and want to mask certain nodes from being written to. I think there is a bug in
assemble
where we always do this swap even if there are no BCs (so the old and new lgmaps are the same).masking as skipping values insertion? You could use negative entries for that. If bc locations are not dynamic (i.e. indices do not change during subsequent assemblies) then you should be fine.
yes, negative values. but we don't condense out dirichlet rows, so afterwards we must insert identity on the diagonal, which is what i recall used to cause problems with matis
Although it was a brutally different context, one way I have seen this problem solved is to put a 1/#numOwningProcs in each Dirichlet dof. This will result in an identity upon assembly, and allows for considering dirichlet bcs in MATIS. We would only require this type of info for the dofs, which perhaps you already do.
Hi firedrakers,
I have been working with my colleagues on an extension of Firedrake for the BDDC preconditioner, for which MATIS matrices are required. For this, the allocation procedure in PyOP2 has been changed into the following:
in both init_monolithic() and init_block() functions. Although the code seems to yield a correct type and allocation, some things look strange. For instance, I have modified the Helmholtz demo as a test into the following:
which gives the following error (PETSc compiled with debug):
For a clearer picture, I added a line in the PyOP2 mat.py file to get the matrix type, and it actually prints it twice, so that in fact _assemble_jac() is being called twice, and the error is given the second time. I have then modified the script to use only assembled objects in order to avoid the many hooks and calls from the nonlinear solver, so that the solution procedure now looks like this:
I use the 'nn' preconditioner because things seemed like they worked, but the matrix type wasn't being used. After this, the error is the following:
so that in fact the mat_type is not being read. At this point I see two issues:
i ) The interfaces for solving a linear problem are not consistent, and ii) The assemble_jac routine is being called twice, and in the meantime it is doing something to the matrix that breaks its allocation (... apparently)
I'm not sure how to follow, as the modification we have included looks too innocent, but it is not being propagated correctly. Any ideas are welcome, thanks in advance!
Best, NB