GEOS-DEV / GEOS

GEOS Simulation Framework
GNU Lesser General Public License v2.1
207 stars 83 forks source link

Direct solver produces incorrect result #225

Closed klevzoff closed 5 years ago

klevzoff commented 5 years ago

Describe the bug When called with a 15x15 matrix (a 3-cell problem with 5 DoF per cell), the Trilinos direct solver outputs an incorrect result with some values blowing up. The same matrix/rhs when printed out and imported in MATLAB give a different, more or less meaningful, result.

To Reproduce Steps to reproduce the behavior:

  1. Check out feature/klevzoff/compositional-flow-solver commit 29e5283efec149a7b81cea01a83728795c99f35e
  2. Build GEOSX (debug or release)
  3. Run with src/coreComponents/unitTests/CompositionalMultiphaseFlow/1D_3cell.xml as input
  4. The solver will stop due to lack of Newton convergence
  5. Check the very first matrix/rhs/solution that are printed out (or in fact any of them)
  6. Import into MATLAB/Octave/etc and verify the solution is wrong

Expected behavior A direct solver should more or less match matlab's backslash. The matrix is non-singular.

Platform (please complete the following information):

Additional context Matrix/vectors in comments below As a note, iterative solver just fails during ILU with a zero pivot error. Matlab has no problem with ILU for this matrix.

klevzoff commented 5 years ago

Matrix:

   (1,1)     -2.1957e-08
   (6,1)     -2.2586e-11
   (7,1)     -1.0892e-11
   (8,1)     -2.1782e-11
   (9,1)     -3.6321e-14
   (2,2)     -3.3451e-04
   (3,3)     -4.0670e-03
   (4,4)     -1.3463e-03
   (5,5)     -2.4060e-05
   (6,6)     -2.1934e-08
   (7,6)     -4.1230e-07
   (8,6)     -8.2461e-07
   (9,6)     -1.3742e-09
  (10,6)      9.6148e-09
   (6,7)     -6.3128e-05
   (7,7)     -3.3451e-04
   (8,7)     -4.4912e-04
   (9,7)     -2.2028e-04
  (10,7)     -2.4763e-05
   (6,8)     -4.4687e-04
   (7,8)     -2.3667e-03
   (8,8)     -4.0670e-03
   (9,8)     -6.7204e-04
  (10,8)     -2.6206e-05
   (6,9)     -1.0196e-03
   (7,9)     -5.3997e-03
   (8,9)     -9.4667e-03
   (9,9)     -1.3463e-03
  (10,9)     -2.8359e-05
   (6,10)     1.2397e-04
   (7,10)     6.5628e-04
   (8,10)     1.3148e-03
   (9,10)    -2.6294e-08
  (10,10)    -2.4060e-05
   (6,11)    -2.2586e-11
   (7,11)    -1.0892e-11
   (8,11)    -2.1782e-11
   (9,11)    -3.6321e-14
  (11,11)    -2.1957e-08
  (12,12)    -3.3451e-04
  (13,13)    -4.0670e-03
  (14,14)    -1.3463e-03
  (15,15)    -2.4060e-05

The only difference: the matrix in GEOSX has a bunch of additional zeros explicitly stored (that result from upwinding and/or BC application)

Looks like this: (Cells 1 (rows 1-5) and 3 (rows 11-15) have Dirichlet BC applied to them, hence diagonal) matrix

klevzoff commented 5 years ago

rhs:

  -1.0978e-01
  -3.6317e-01
  -2.7076e+00
  -5.2500e-02
  -2.1763e-01
            0
            0
            0
            0
   1.2204e-18
   1.0759e-01
   6.7701e-02
   2.4943e+00
   1.6514e+00
   4.9186e-05
klevzoff commented 5 years ago

solution vectors:

Solutions for cells 1 and 3 match, for cell 2 completely different, with geosx (Trilinos) solution being nonsensical

klevzoff commented 5 years ago

Basically, I'm just looking for advice on what to check, if anyone has experience dealing with this sort of issues

MatthiasCremon commented 5 years ago

Are you using KLU from Amesos? I don't remember it blowing up on the super easy (Laplace operator) test cases we ran but I could take a closer look.

klevzoff commented 5 years ago

I spent a few hours debugging ILU factorization in Aztec. It does indeed encounter a zero pivot with the row/col ordering it uses. Pivoting is not implemented in incomplete factorizations, as per the user guide. Setting a nonzero drop tolerance helps with the zero pivot issue (it gets replaced by row norm scaled by drop tolerance) but still leads to invalid results. Matlab notably uses pivoting and permutes the rows differently and thus does not see the issue.

I could not yet figure a combination of parameters to make Aztec use a different row/col ordering. I suspect KLU suffers from the same issue (and so does Amesos_Lapack, which I tried), though both KLU and the code behind Matlab's backslash are supposedly written by the same guy (Tim Davis), so there might be a set of options to KLU to make it work.

Then again, I don't even know if my matrix is 100% correct (w.r.t. the problem being solved). But if at least one solver can handle it without issues, there should be a way to proceed.

joshua-white commented 5 years ago

I am going to close this issue, as AmesosKLU is out of our direct control. We have a workaround, which is just to use SuperLU as the direct solver.