easybuilders / easybuild-easyconfigs

A collection of easyconfig files that describe which software to build using which build options with EasyBuild.
https://easybuild.io
GNU General Public License v2.0

GROMACS 2019 test failure on Epyc #7826

Open verdurin opened 5 years ago

verdurin commented 5 years ago

One of the 57 test cases in MdrunNonIntegratorTests (ctest #37) failed when building GROMACS 2019 with foss-2018b on an AMD Epyc node:

[ RUN      ] NormalMdrunIsReproduced/MdrunRerunTest.WithinTolerances/6

NOTE 1 [file /dev/shm/GROMACS/2019/foss-2018b/easybuild_obj/src/programs/mdrun/tests/Testing/Temporary/NormalMdrunIsReproduced_MdrunRerunTest_WithinTolerances_6_input.mdp, line 28]:
  /dev/shm/GROMACS/2019/foss-2018b/easybuild_obj/src/programs/mdrun/tests/Testing/Temporary/NormalMdrunIsReproduced_MdrunRerunTest_WithinTolerances_6_input.mdp did not specify a value for the .mdp option "cutoff-scheme". Probably it
  was first intended for use with GROMACS before 4.6. In 4.6, the Verlet
  scheme was introduced, but the group scheme was still the default. The
  default is now the Verlet scheme, so you will observe different behaviour.

NOTE 2 [file /dev/shm/GROMACS/2019/foss-2018b/easybuild_obj/src/programs/mdrun/tests/Testing/Temporary/NormalMdrunIsReproduced_MdrunRerunTest_WithinTolerances_6_input.mdp]:
  With Verlet lists the optimal nstlist is >= 10, with GPUs >= 20. Note
  that with the Verlet scheme, nstlist has no effect on the accuracy of
  your simulation.

NOTE 3 [file /dev/shm/GROMACS/2019/foss-2018b/easybuild_obj/src/programs/mdrun/tests/Testing/Temporary/NormalMdrunIsReproduced_MdrunRerunTest_WithinTolerances_6_input.mdp]:
  Setting nstcalcenergy (100) equal to nstenergy (4)

Generated 330891 of the 330891 non-bonded parameter combinations
Generating 1-4 interactions: fudge = 0.5
Generated 330891 of the 330891 1-4 parameter combinations
Excluding 2 bonded neighbours molecule type 'SOL'
Removing all charge groups because cutoff-scheme=Verlet
Number of degrees of freedom in T-Coupling group System is 27.00

NOTE 4 [file /dev/shm/GROMACS/2019/foss-2018b/easybuild_obj/src/programs/mdrun/tests/Testing/Temporary/NormalMdrunIsReproduced_MdrunRerunTest_WithinTolerances_6_input.mdp]:
  You are using a plain Coulomb cut-off, which might produce artifacts.
  You might want to consider using PME electrostatics.

There were 4 notes
Reading file /dev/shm/GROMACS/2019/foss-2018b/easybuild_obj/src/programs/mdrun/tests/Testing/Temporary/NormalMdrunIsReproduced_MdrunRerunTest_WithinTolerances_6.tpr, VERSION 2019 (single precision)
Changing nstlist from 8 to 100, rlist from 0.733 to 0.824

Using 1 MPI thread
Using 2 OpenMP threads

NOTE: The number of threads is not equal to the number of (logical) cores
      and the -pin option is set to auto: will not pin threads to cores.
      This can lead to significant performance degradation.
      Consider using -pin on (and -pinoffset in case you run multiple jobs).
starting mdrun 'spc2'
16 steps,      0.0 ps.
Determining Verlet buffer for a tolerance of 1e-06 kJ/mol/ps at 298 K
Calculated rlist for 1x1 atom pair-list as 0.735 nm, buffer size 0.035 nm
Set rlist, assuming 4x4 atom pair-list, to 0.733 nm, buffer size 0.033 nm
Note that mdrun will redetermine rlist based on the actual pair-list setup
This run will generate roughly 0 Mb of data

Writing final coordinates.

               Core t (s)   Wall t (s)        (%)
       Time:        0.003        0.002      194.1
                 (ns/day)    (hour/ns)
Performance:      915.937        0.026
Reading file /dev/shm/GROMACS/2019/foss-2018b/easybuild_obj/src/programs/mdrun/tests/Testing/Temporary/NormalMdrunIsReproduced_MdrunRerunTest_WithinTolerances_6.tpr, VERSION 2019 (single precision)
Changing nstlist from 8 to 100, rlist from 0.733 to 0.824

Using 1 MPI thread
Using 2 OpenMP threads

NOTE: The number of threads is not equal to the number of (logical) cores
      and the -pin option is set to auto: will not pin threads to cores.
      This can lead to significant performance degradation.
      Consider using -pin on (and -pinoffset in case you run multiple jobs).
starting md rerun 'spc2', reading coordinates from input trajectory '/dev/shm/GROMACS/2019/foss-2018b/easybuild_obj/src/programs/mdrun/tests/Testing/Temporary/NormalMdrunIsReproduced_MdrunRerunTest_WithinTolerances_6_normal.trr'

Reading frame       0 time    0.000
Reading frame       1 time    0.004
Reading frame       2 time    0.008
Reading frame       3 time    0.012
Reading frame       4 time    0.016
Last frame          4 time    0.016

NOTE: 33 % of the run time was spent in pair search,
      you might want to increase nstlist (this has no effect on accuracy)

               Core t (s)   Wall t (s)        (%)
       Time:        0.001        0.001      188.6
                 (ns/day)    (hour/ns)
Performance:     2244.296        0.011
Opened /dev/shm/GROMACS/2019/foss-2018b/easybuild_obj/src/programs/mdrun/tests/Testing/Temporary/NormalMdrunIsReproduced_MdrunRerunTest_WithinTolerances_6_rerun.edr as single precision energy file
Opened /dev/shm/GROMACS/2019/foss-2018b/easybuild_obj/src/programs/mdrun/tests/Testing/Temporary/NormalMdrunIsReproduced_MdrunRerunTest_WithinTolerances_6_normal.edr as single precision energy file
Reading energy frame      0 time    0.000
Reading energy frame      0 time    0.000
Reading energy frame      1 time    0.004
Reading energy frame      1 time    0.004
Reading energy frame      2 time    0.008
Reading energy frame      2 time    0.008
Reading energy frame      3 time    0.012
Reading energy frame      3 time    0.012
Reading energy frame      4 time    0.016
Reading energy frame      4 time    0.016
/dev/shm/GROMACS/2019/foss-2018b/gromacs-2019/src/programs/mdrun/tests/energycomparison.cpp:79: Failure
  Value of: energyValueInTest
    Actual: -17.620214462280273
  Expected: energyValueInReference
  Which is: -17.620309829711914
Difference: 9.53674e-05 (50 single-prec. ULPs, rel. 5.41e-06)
 Tolerance: abs. 5.72205e-05, 48 ULPs
Google Test trace:
/dev/shm/GROMACS/2019/foss-2018b/gromacs-2019/src/programs/mdrun/tests/energycomparison.cpp:73: Comparing Potential between frames
/dev/shm/GROMACS/2019/foss-2018b/gromacs-2019/src/programs/mdrun/tests/mdruncomparison.h:111: Comparing frames from two runs 'Time 0.016000 Step 16' and 'Time 0.016000 Step 16'
/dev/shm/GROMACS/2019/foss-2018b/gromacs-2019/src/programs/mdrun/tests/rerun.cpp:238: Comparing normal and rerun of simulation 'spc5' with integrator 'bd'
Last energy frame read 4 time    0.016
Last energy frame read 4 time    0.016

[  FAILED  ] NormalMdrunIsReproduced/MdrunRerunTest.WithinTolerances/6, where GetParam() = ("spc5", "bd") (162 ms)
[       OK ] MdrunIsReproduced/MdrunRerunFreeEnergyTest.WithinTolerances/32 (126 ms)
[----------] 33 tests from MdrunIsReproduced/MdrunRerunFreeEnergyTest (3951 ms total)

[----------] Global test environment tear-down
[==========] 57 tests from 4 test cases ran. (6491 ms total)
[  PASSED  ] 56 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] NormalMdrunIsReproduced/MdrunRerunTest.WithinTolerances/6, where GetParam() = ("spc5", "bd")

 1 FAILED TEST

      Start 38: LegacyGroupSchemeMdrunTests
38/39 Test #38: LegacyGroupSchemeMdrunTests ......   Passed    0.28 sec
      Start 39: MdrunMpiTests
39/39 Test #39: MdrunMpiTests ....................   Passed    1.83 sec

97% tests passed, 1 tests failed out of 39

Label Time Summary:
GTest              =  23.60 sec*proc (39 tests)
IntegrationTest    =  10.85 sec*proc (5 tests)
MpiTest            =   0.04 sec*proc (3 tests)
SlowTest           =  10.84 sec*proc (1 test)
UnitTest           =   1.92 sec*proc (33 tests)

Total Test time (real) =  23.64 sec

The following tests FAILED:
         37 - MdrunNonIntegratorTests (Failed)
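
To see how narrowly the comparison in the log above missed, the figures from the failure message can be checked by hand. The following is a minimal Python sketch, not GROMACS code: it rounds both potential energies to single precision, counts the float32 ULPs between them, and applies the two thresholds printed in the log (abs. 5.72205e-05 and 48 ULPs). The helper names `float32_key`, `ulp_distance` and `within_tolerance` are hypothetical, and the rule that satisfying either criterion is enough is an assumption about how the test's tolerance is evaluated.

```python
import struct


def float32_key(x: float) -> int:
    """Round x to float32 and map its bit pattern to an integer that
    increases monotonically with the value (hypothetical helper)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    magnitude = bits & 0x7FFFFFFF
    # Negative floats sort in reverse bit order, so mirror them below zero.
    return -magnitude if bits & 0x80000000 else magnitude


def ulp_distance(a: float, b: float) -> int:
    """Number of representable float32 steps between a and b."""
    return abs(float32_key(a) - float32_key(b))


def within_tolerance(actual: float, expected: float,
                     abs_tol: float, ulp_tol: int) -> bool:
    # Assumption: the comparison passes if EITHER criterion is met.
    return (abs(actual - expected) <= abs_tol
            or ulp_distance(actual, expected) <= ulp_tol)


# Values copied from the failure message in the log.
actual = -17.620214462280273     # energyValueInTest
expected = -17.620309829711914   # energyValueInReference
abs_tol = 5.72205e-05
ulp_tol = 48

abs_diff = abs(actual - expected)
rel_diff = abs_diff / abs(expected)
ulps = ulp_distance(actual, expected)
passed = within_tolerance(actual, expected, abs_tol, ulp_tol)

print(f"abs diff: {abs_diff:.6g}")
print(f"rel diff: {rel_diff:.3g}")
print(f"ULPs:     {ulps}")
print(f"passes:   {passed}")
```

With these inputs the sketch reproduces the 50 ULP difference reported in the log, which is 2 ULPs over the 48 ULP limit and also above the absolute tolerance. The miss is small enough that a difference in floating-point evaluation order between builds or CPUs could plausibly account for it, although the log alone does not establish the cause.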

verdurin commented 5 years ago

GROMACS 2018.3 builds and tests fine on the same node.