boutproject / BOUT-dev

BOUT++: Plasma fluid finite-difference simulation code in curvilinear coordinate systems
http://boutproject.github.io/
GNU Lesser General Public License v3.0
183 stars 95 forks source link

Bug in LaplacePCR #2350

Open ZedThree opened 3 years ago

ZedThree commented 3 years ago

In #2336 I've added an alternative to #2333: https://github.com/boutproject/BOUT-dev/blob/93454ecaded94667725d48c3cb9b991c23f239bc/src/invert/laplace/invert_laplace.cxx#L106-L126

Basically check that we can use PCR as the default Laplacian, otherwise fall back to cyclic. This essentially makes PCR the new default in most cases.

Unfortunately (or possibly fortunately?) this throws up a few issues with PCR in the following tests:

Test 1: PETSc 2nd order Magnitude of maximum absolute error is -1 KSPConvergedReason is -5 BoutException occured in invert->solve(b1): petsc_laplace: inversion failed to converge.

Test 2: PETSc 4th order Magnitude of maximum absolute error is -1 KSPConvergedReason is -5 BoutException occured in invert->solve(b3): petsc_laplace: inversion failed to converge.

Test 3: with coefficients constant in z, PETSc 2nd order Magnitude of maximum absolute error is -1

Test 4: with coefficients constant in z, default solver Magnitude of maximum absolute error is 0.999817 KSPConvergedReason is -5 BoutException occured in invert->solve(b5): petsc_laplace: inversion failed to converge.

Test 5: different profiles, PETSc 2nd order Magnitude of maximum absolute error is -1 KSPConvergedReason is -5 BoutException occured in invert->solve(b6): Laplacian inversion failed to converge (probably)

Test 6: different profiles, PETSc 4th order Magnitude of maximum absolute error is -1 KSPConvergedReason is -5 BoutException occured in invert->solve(b7): petsc_laplace: inversion failed to converge.

Test 7: different profiles, with coefficients constant in z, PETSc 2nd order Magnitude of maximum absolute error is -1

Test 8: different profiles, with coefficients constant in z, default solver Magnitude of maximum absolute error is 1.00577



Given that `test-laplace` is checked with both `cyclic` and `pcr` there must be something else going on. The common factor seems to be the outer boundary flags:

- test-laplace checks only `INVERT_SET` in the outer boundary
- test-gyro uses `INVERT_BNDRY_ONE + INVERT_RHS`
- test-petsc_laplace uses `INVERT_AC_GRAD` and `INVERT_AC_GRAD | INVERT_DC_GRAD`
- test-petsc_laplace_MAST-grid uses `INVERT_AC_GRAD`

My suspicions are that `LaplacePCR ::eliminate_boundary_rows` only accounts for one boundary, when there could be up to `MXG` boundary rows that need eliminating, depending on the inversion flags.
JosephThomasParker commented 3 years ago

Thanks Peter! The purpose of eliminate boundary rows is just to break the linear dependence between the boundary rows and the interior rows, so it should only require the one boundary row. Though it does assume that the structure is

                            boundary. | interior
boundary ->               0  x  x  0  |. 0. 0. 0  
boundary ->               0. 0. x. x. |  0  0. 0
boundary ->.              0. 0. 0  x. |  x. 0. 0
interior ->               0. 0. 0. x. |. x. x. 0
interior ->.              0. 0. 0. 0. |  x. x. x

becomes

                            boundary. | interior
boundary ->               0  x  x  0  |. 0. 0. 0  
boundary ->               0. 0. x. x. |  0  0. 0
boundary ->.              0. 0. 0  x. |  x. 0. 0
interior ->               0. 0. 0. 0. |. Y. x. 0
interior ->.              0. 0. 0. 0. |  x. x. x

Are there cases where this won't be the structure?

There might also be places where this assumes there are two guard cells, so I'll check that too.

ZedThree commented 3 years ago

Ah I see! I was thinking that might be suspicious just because I was tracing through where the a/b/c from Laplacian::tridagMatrix are used. That deals with INVERT_AC_GRAD for both PCR and cyclic, so presumably the issue is how they are used.

So if it's not eliminate_boundary_rows, it must be somewhere else in cr_pcr_solver?

Is the issue that aa, bb, cc only contain one boundary/guard cell? For cyclic, a, b, c don't use any guard cells, but do include all the boundaries.

dschwoerer commented 3 years ago

Might it be related to the failure in #2207 ?

That is also failing if rebased to next ...

JosephThomasParker commented 3 years ago

Just to quickly note a few things: There is a problem in eliminate_boundary_rows: in the test-gyro case the wrong rows are eliminated (the second and penultimate internal rows, instead of the first and final internal rows). This is confusing as the indexing is determined by localmesh->xstart. Does the meaning of xstart change depending on the boundary conditions?

But in any case, fixing the indexing still gets the wrong results, with the error in \|Ax - b\| only really being significant in the few rows near the boundaries.

Still working on this...

JosephThomasParker commented 3 years ago

Ok, found the problem: the code requires the number of internal rows to be a power of 2, and checks this by doing:

  // Number of x points must be a power of 2
  if (!is_pow2(localmesh->GlobalNxNoBoundaries)) {
    throw BoutException("LaplacePCR error: GlobalNx must be a power of 2");
  }

where GlobalNxNoBoundaries = nx - 2*MXG; is a power of two in both test-laplace and test-gyro. However, in test-gyro, the flag INVERT_BNDRY_ONE is set, which means only one boundary cell is used, even though MXG=2. So in test-gyro we're actually doing the inversion on 68 - 2*1 = 66 points, not 68 - 2*2 = 64 points, so that's why the solver fails.

I'll think about whether it's possible to tweak the row elimination to get this to work. The "real" problem though is the inconsistency over how many guard cells we are using.

ZedThree commented 3 years ago

Is this a "fundamental" problem that PCF is incompatible with INVERT_BNDRY_ONE? Or it would work, if the number of interior points were different such that we'd still use a power of two?

If the latter, then we could adjust the precondition to:

const int global_nx = localmesh->GlobalNxNoBoundaries;
const int global_nx_maybe_with_boundary = using_invert_bndry_one ? global_nx + 1 : global_nx;

if (not is_pow2(global_nx_maybe_with_boundary)) {
  ...

(terrible names not withstanding). Then in #2336 we'd neatly fall back to cyclic if that weren't met.

If we can handle this in the row elimination and don't need to adjust the precondition, that's probably the better solution though?

johnomotani commented 3 years ago

I'd be tempted to suggest just deprecating/removing INVERT_BNDRY_ONE. I can't see when it would ever be useful, and I'm not sure why it's there at all - maybe @bendudson knows?

bendudson commented 3 years ago

Don't know why INVERT_BNDRY_ONE exists really. I suspect it was an attempt to deal with cases where the boundary to be applied isn't clear, such as with gyro-averaging: What boundary condition should be put on gyro-averaged density, to be consistent with density? By moving the boundary away from the domain, it may give that bit of additional freedom at the actual boundary. I'm not sure if it actually makes any difference though.

ZedThree commented 3 years ago

The plan is now to make solve throw if INVERT_AC_GRAD is set. This means we won't use PCR as the default solver.