boutproject / BOUT-dev

BOUT++: Plasma fluid finite-difference simulation code in curvilinear coordinate systems
http://boutproject.github.io/
GNU Lesser General Public License v3.0

PETSc 3.13.1 differences #2034

Closed dschwoerer closed 4 years ago

dschwoerer commented 4 years ago

With PETSc 3.13.1, I am observing differences in the unit tests that exceed the tolerance:

[ RUN      ] LaplacePetsc3dAmgTest/Petsc3dAmgTest.TestSolve3D/1
invert/laplace/test_laplace_petsc3damg.cxx:358: Failure
The difference between expected[i] and actual[i] is 1.6394707520229774e-08, which exceeds tol, where
expected[i] evaluates to 9.5038182179502777e-06,
actual[i] evaluates to 9.4874235104300479e-06, and
tol evaluates to 1e-08.
invert/laplace/test_laplace_petsc3damg.cxx:358: Failure
The difference between expected[i] and actual[i] is 1.2710584896864915e-08, which exceeds tol, where
expected[i] evaluates to 3.7669102759471052e-05,
actual[i] evaluates to 3.7681813344367917e-05, and
tol evaluates to 1e-08.
invert/laplace/test_laplace_petsc3damg.cxx:358: Failure
The difference between expected[i] and actual[i] is 1.3173412445578129e-08, which exceeds tol, where
expected[i] evaluates to 0.00017995085068106214,
actual[i] evaluates to 0.00017996402409350772, and
tol evaluates to 1e-08.
invert/laplace/test_laplace_petsc3damg.cxx:358: Failure
The difference between expected[i] and actual[i] is 1.4495775929692408e-08, which exceeds tol, where
expected[i] evaluates to 3.7669102759471052e-05,
actual[i] evaluates to 3.765460698354136e-05, and
tol evaluates to 1e-08.
invert/laplace/test_laplace_petsc3damg.cxx:358: Failure
The difference between expected[i] and actual[i] is 1.452321204964413e-08, which exceeds tol, where
expected[i] evaluates to 0.15632795975851177,
actual[i] evaluates to 0.15632797428172382, and
tol evaluates to 1e-08.
invert/laplace/test_laplace_petsc3damg.cxx:358: Failure
The difference between expected[i] and actual[i] is 1.0769530632346702e-08, which exceeds tol, where
expected[i] evaluates to 1.2951180023848552e-05,
actual[i] evaluates to 1.2961949554480899e-05, and
tol evaluates to 1e-08.
invert/laplace/test_laplace_petsc3damg.cxx:358: Failure
The difference between expected[i] and actual[i] is 1.2938825777724511e-08, which exceeds tol, where
expected[i] evaluates to 0.00066493591255396053,
actual[i] evaluates to 0.0006649229737281828, and
tol evaluates to 1e-08.
invert/laplace/test_laplace_petsc3damg.cxx:358: Failure
The difference between expected[i] and actual[i] is 1.2182394648618811e-08, which exceeds tol, where
expected[i] evaluates to 5.6457022233948591,
actual[i] evaluates to 5.6457022112124644, and
tol evaluates to 1e-08.
invert/laplace/test_laplace_petsc3damg.cxx:358: Failure
The difference between expected[i] and actual[i] is 1.1941694501305111e-08, which exceeds tol, where
expected[i] evaluates to 0.040474230687202815,
actual[i] evaluates to 0.040474218745508314, and
tol evaluates to 1e-08.
invert/laplace/test_laplace_petsc3damg.cxx:358: Failure
The difference between expected[i] and actual[i] is 1.4039712364910528e-08, which exceeds tol, where
expected[i] evaluates to 0.15632795975851177,
actual[i] evaluates to 0.1563279457187994, and
tol evaluates to 1e-08.
[  FAILED  ] LaplacePetsc3dAmgTest/Petsc3dAmgTest.TestSolve3D/1, where GetParam() = (false, false, false, true) (10 ms)
[ RUN      ] LaplacePetsc3dAmgTest/Petsc3dAmgTest.TestSolve3D/3
invert/laplace/test_laplace_petsc3damg.cxx:358: Failure
The difference between expected[i] and actual[i] is 1.4713936943740169e-08, which exceeds tol, where
expected[i] evaluates to 3.7669102759471052e-05,
actual[i] evaluates to 3.7654388822527312e-05, and
tol evaluates to 1e-08.
[  FAILED  ] LaplacePetsc3dAmgTest/Petsc3dAmgTest.TestSolve3D/3, where GetParam() = (false, true, false, false) (5 ms)
[ RUN      ] LaplacePetsc3dAmgTest/Petsc3dAmgTest.TestSolve3DGuess/1
invert/laplace/test_laplace_petsc3damg.cxx:366: Failure
The difference between expected[i] and actual[i] is 1.462702414359286e-08, which exceeds tol, where
expected[i] evaluates to 2.5061215558300253e-06,
actual[i] evaluates to 2.4914945316864325e-06, and
tol evaluates to 1e-08.
invert/laplace/test_laplace_petsc3damg.cxx:366: Failure
The difference between expected[i] and actual[i] is 1.7976235022264215e-08, which exceeds tol, where
expected[i] evaluates to 1.2951180023848552e-05,
actual[i] evaluates to 1.2969156258870817e-05, and
tol evaluates to 1e-08.
invert/laplace/test_laplace_petsc3damg.cxx:366: Failure
The difference between expected[i] and actual[i] is 1.8891486989872953e-08, which exceeds tol, where
expected[i] evaluates to 0.00066493591255396053,
actual[i] evaluates to 0.0006649548040409504, and
tol evaluates to 1e-08.
invert/laplace/test_laplace_petsc3damg.cxx:366: Failure
The difference between expected[i] and actual[i] is 1.7346453286493308e-08, which exceeds tol, where
expected[i] evaluates to 0.00012235251729085351,
actual[i] evaluates to 0.00012233517083756702, and
tol evaluates to 1e-08.
[  FAILED  ] LaplacePetsc3dAmgTest/Petsc3dAmgTest.TestSolve3DGuess/1, where GetParam() = (false, false, false, true) (5 ms)
[ RUN      ] LaplacePetsc3dAmgTest/Petsc3dAmgTest.TestSolve3DGuess/3
invert/laplace/test_laplace_petsc3damg.cxx:366: Failure
The difference between expected[i] and actual[i] is 1.1687501189072691e-08, which exceeds tol, where
expected[i] evaluates to 0.72266014117750454,
actual[i] evaluates to 0.72266015286500573, and
tol evaluates to 1e-08.
[  FAILED  ] LaplacePetsc3dAmgTest/Petsc3dAmgTest.TestSolve3DGuess/3, where GetParam() = (false, true, false, false) (5 ms)

The difference is present in 3.13.1, 3.12.5 and 3.11.4. Only 3.10.5 did not fail.
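For reference, the failure message format matches GoogleTest's `EXPECT_NEAR`. That check passes only when |expected − actual| ≤ tol, so it applies an absolute tolerance. The sketch below recomputes absolute and relative differences for three of the logged pairs. Its helper names are hypothetical and are not taken from `test_laplace_petsc3damg.cxx`. Every absolute difference in the log falls between about 1.1e-8 and 1.9e-8. The relative errors, however, range from roughly 2e-9 for the value 5.6457... to roughly 2e-3 for the value 9.50e-6.

```cpp
#include <cassert>
#include <cmath>

// Absolute check with the same semantics as GoogleTest's EXPECT_NEAR:
// passes iff |expected - actual| <= tol.
bool withinAbsTol(double expected, double actual, double tol) {
  return std::fabs(expected - actual) <= tol;
}

// Difference scaled by the magnitude of the expected value.
double relativeError(double expected, double actual) {
  assert(expected != 0.0);  // every expected value in the log is non-zero
  return std::fabs(expected - actual) / std::fabs(expected);
}

// Three (expected, actual) pairs copied from the TestSolve3D/1 failures.
struct Pair {
  double expected;
  double actual;
};
constexpr Pair kLoggedPairs[] = {
    {9.5038182179502777e-06, 9.4874235104300479e-06},
    {5.6457022233948591, 5.6457022112124644},
    {0.15632795975851177, 0.15632797428172382},
};
```

For example, `relativeError` gives about 1.7e-3 for `kLoggedPairs[0]` and about 2.2e-9 for `kLoggedPairs[1]`. Meanwhile, `withinAbsTol(..., 1e-8)` is false for all three pairs.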

dschwoerer commented 4 years ago

Strangely, testing against the packaged PETSc (3.13.1) in Fedora works...

ZedThree commented 4 years ago

Very odd! How did you build PETSc? I'm sure I was testing with the PETSc master branch not so long ago, but it's possible this test wasn't in at that point.

This is tickling some faint memory of a PETSc option that might not have played very well, but it's not coming to me just yet.

Could you maybe have a play with the PETSc AMG solver options, e.g. ksptype?

ZedThree commented 4 years ago

Looks like I was checking a slightly different test. I'm bisecting PETSc with these tests to see if I can pinpoint the change in PETSc that breaks this.

dschwoerer commented 4 years ago

I haven't gotten around to checking on this any further; this is the script I used to install PETSc:

```bash
unset PETSC_DIR
unset PETSC_ARCH

function fail() {
    echo "Failure building $ver"
    for f in best.*.log
    do
        echo "$f"
        echo
        cat "$f"
        echo
    done
    echo "$@"
    exit 2
}

for V in 13 #{9..13}
do
    ver=3.$V
    wget -N http://ftp.mcs.anl.gov/pub/petsc/petsc-$ver.tar.gz 2>&1 | grep 'Not Modified'
    ex=$?
    # grep fails when wget did (re)download the tarball, so rebuild in that case
    if test $ex -gt 0
    then
        rm -rf petsc-$ver.[0-9]*
        tar -xf petsc-$ver.tar.gz
        # [0-9]* also matches two-digit patch releases such as 3.13.10
        full=$(ls -d petsc-$ver.[0-9]*)
        test -n "$full" || fail "Cannot find version"
        (cd "$full" || fail "Failed to cd"
         module purge
         module load mpi/mpich-x86_64 || fail "Failed to load mpi"
         if test "$ver" = "3.4"
         then
             opts=--download-f-blas-lapack=1
         else
             opts=--download-fblaslapack=1
         fi
         #--with-clanguage=cxx
         opts="$opts
  --with-mpi=yes
  --with-precision=double
  --with-scalar-type=real
  --with-pic=1"
         #  --with-shared-libraries=0"
         export MAKEFLAGS=-j2
         ./configure $opts > best.config.log || fail "configure failed"
         # $PWD is the extracted source tree, so the build works from any directory
         make PETSC_DIR="$PWD" PETSC_ARCH=arch-linux2-c-debug all > best.make.log || fail "make failed"
         make PETSC_DIR="$PWD" PETSC_ARCH=arch-linux2-c-debug test > best.test.log || fail "make test failed"
        ) || exit
    fi
done
```

ZedThree commented 4 years ago

Looks like it was this change in 3.11:

Previously the KSP Chebyshev implementation always did one more iteration than requested. For example -ksp_max_it 2 resulted in 3 Chebyshev iterations. This has been corrected. Due to this correction your solver may seem to converge more slowly than it previously did. Note that the multigrid solvers (PCMG, PCGAMG, PCML) used by default Chebyshev (with 3 actual steps) in their smoother, thus the multigrid solvers will now have seemingly different convergence rates since they will now use only 2 actual steps. To reproduce previous behavior change the number of smoother iterations to match the previous actual amount, this can be done with for example -mg_levels_ksp_max_it 3 (or -prefix_mg_levels_ksp_max_it 3 if the KSP object has a prefix).
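According to the changelog, before 3.11 a request for 2 Chebyshev smoothing steps effectively ran 3, and since 3.11 it runs 2. The toy sketch below uses the textbook three-term Chebyshev iteration in plain C++. It is not PETSc's implementation, but it shows that running exactly 2 steps and exactly 3 steps gives different iterates. The test matrix (the 1D Laplacian tridiag(-1, 2, -1)) and the damped interval [0.4, 4.0] are assumptions chosen for illustration. PETSc chooses its own eigenvalue bounds for its smoothers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// y = A x for the 1D Laplacian A = tridiag(-1, 2, -1) with zero Dirichlet ends.
// Its eigenvalues lie in (0, 4).
Vec applyLaplacian(const Vec& x) {
  const std::size_t n = x.size();
  Vec y(n);
  for (std::size_t i = 0; i < n; ++i) {
    const double left = (i > 0) ? x[i - 1] : 0.0;
    const double right = (i + 1 < n) ? x[i + 1] : 0.0;
    y[i] = 2.0 * x[i] - left - right;
  }
  return y;
}

// Euclidean distance between two vectors of equal length.
double distance(const Vec& a, const Vec& b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += (a[i] - b[i]) * (a[i] - b[i]);
  return std::sqrt(sum);
}

// Runs exactly `steps` Chebyshev iterations on A x = b, starting from x,
// with the polynomial tuned to damp eigenvalues in [lmin, lmax].
Vec chebyshev(const Vec& b, Vec x, double lmin, double lmax, int steps) {
  assert(0.0 < lmin && lmin < lmax);
  const double theta = 0.5 * (lmax + lmin);  // centre of the interval
  const double delta = 0.5 * (lmax - lmin);  // half-width of the interval
  const double sigma = theta / delta;
  Vec r = b;  // residual r = b - A x
  const Vec ax = applyLaplacian(x);
  for (std::size_t i = 0; i < r.size(); ++i) r[i] -= ax[i];
  double rho = 1.0 / sigma;
  Vec d(r.size());  // first update d = r / theta
  for (std::size_t i = 0; i < r.size(); ++i) d[i] = r[i] / theta;
  for (int k = 0; k < steps; ++k) {
    const Vec ad = applyLaplacian(d);
    for (std::size_t i = 0; i < x.size(); ++i) {
      x[i] += d[i];
      r[i] -= ad[i];
    }
    const double rhoNext = 1.0 / (2.0 * sigma - rho);
    for (std::size_t i = 0; i < d.size(); ++i)
      d[i] = rhoNext * rho * d[i] + (2.0 * rhoNext / delta) * r[i];
    rho = rhoNext;
  }
  return x;
}
```

Both `chebyshev(b, x0, 0.4, 4.0, 2)` and `chebyshev(b, x0, 0.4, 4.0, 3)` reduce the error, but they return different vectors. Inside a multigrid cycle, a shift like this in the smoother output changes the final solution slightly. That is the kind of small change the tests above detected.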

I initially dismissed this because we're not using Chebyshev for the solver, but the change actually affects the smoother. I'm currently trying to work out how to set the option it suggests, but they don't make it easy.
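The changelog's recipe reduces to a single option name, which changes when the solver's KSP has an options prefix. The minimal sketch below shows how that name is formed. The prefix `laplace_` mentioned in its comment is a hypothetical example, and this is not necessarily how #2051 sets the option. With PETSc available, the prefix could be read with `KSPGetOptionsPrefix`. The resulting name could then be passed to `PetscOptionsSetValue(nullptr, name.c_str(), "3")` before `KSPSetFromOptions` is called on that KSP.

```cpp
#include <cassert>
#include <string>

// Builds the PETSc option controlling the number of multigrid smoother
// iterations, following the changelog:
//   -mg_levels_ksp_max_it          (KSP without an options prefix)
//   -<prefix>mg_levels_ksp_max_it  (KSP with a prefix, e.g. hypothetical "laplace_")
// PETSc does not insert an underscore itself, so the prefix normally ends
// with one. Pass "" when KSPGetOptionsPrefix reports no prefix (null).
std::string smootherMaxItOption(const std::string& prefix) {
  assert(prefix.empty() || prefix[0] != '-');  // the leading dash is added here
  return "-" + prefix + "mg_levels_ksp_max_it";
}
```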

ZedThree commented 4 years ago

Fixed in #2051