owainkenwayucl closed 6 years ago
Flags look the same for plumed 2.4.0 as for 2.3.1, so starting a test build.
VERSION=2.4.0 SHA1=25242eb66f4a8fbb4bff66745fbf927b7f4cd32e ./plumed-2.3.1_install
+ ERROR in test analysis/rt-pca/
+ check file analysis/rt-pca/report.txt for more information
+ ERROR in test basic/rt63c/
+ check file basic/rt63c/report.txt for more information
+ ERROR in test basic/rt63c-mpi/
+ check file basic/rt63c-mpi/report.txt for more information
+ ERROR in test basic/rt63d/
+ check file basic/rt63d/report.txt for more information
+ ERROR in test basic/rt64-pca/
+ check file basic/rt64-pca/report.txt for more information
+ ERROR in test basic/rt65-rmsd2/
+ check file basic/rt65-rmsd2/report.txt for more information
+ ERROR in test basic/rt-close-structure/
+ check file basic/rt-close-structure/report.txt for more information
+ ERROR in test basic/rt-multi-1/
+ check file basic/rt-multi-1/report.txt for more information
+ ERROR in test crystallization/rt-sean-marks/
+ check file crystallization/rt-sean-marks/report.txt for more information
+ ERROR in test isdb/rt-emmi/
+ check file isdb/rt-emmi/report.txt for more information
+ ERROR in test isdb/rt-jcouplings/
+ check file isdb/rt-jcouplings/report.txt for more information
+ ERROR in test isdb/rt-jcouplings-mi/
+ check file isdb/rt-jcouplings-mi/report.txt for more information
+ ERROR in test mapping/rt-pathtools-2/
+ check file mapping/rt-pathtools-2/report.txt for more information
+ ERROR in test mapping/rt-pathtools-3/
+ check file mapping/rt-pathtools-3/report.txt for more information
+ ERROR in test mapping/rt-pca/
+ check file mapping/rt-pca/report.txt for more information
+ ERROR in test mapping/rt-pca-multi/
+ check file mapping/rt-pca-multi/report.txt for more information
+ ERROR in test mapping/rt-tpath/
+ check file mapping/rt-tpath/report.txt for more information
+++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Final report:
+ 279 tests performed, 122 tests not applicable
+ 17 errors found
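To compare runs later, the failing test names can be pulled out of a saved copy of this summary. A rough sketch only: `failed_tests` and the log file name are made up here and are not part of the PLUMED regtest tooling. It assumes the summary lines look exactly like the `+ ERROR in test <dir>/<name>/` lines above.

```shell
# List the failing test names from a saved regtest summary, one per line, sorted.
# Assumes lines of the form "+ ERROR in test <dir>/<name>/" (hypothetical helper).
failed_tests() {
    # In a basic regex the leading "+" is a literal character.
    sed -n 's|^+ ERROR in test \(.*\)/$|\1|p' "$1" | sort
}
```

Usage would be along the lines of `failed_tests regtest-summary.log > failed.txt`, where `regtest-summary.log` is whatever file the output above was saved to.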
Most of the errors were smallish numeric ones, but this one had NaNs.
Thu 18 Jan 12:14:20 GMT 2018
Running regtest in /dev/shm/tmp.e44QyjOPnG/plumed2-2.4.0/regtest/mapping/rt-tpath
++ Test type: driver
++ Arguments: --plumed plumed.dat --trajectory-stride 50 --timestep 0.005 --ixyz diala_traj_nm.xyz --dump-forces forces --dump-forces-fmt=%10.6f
++ Processors: 0
/dev/shm/tmp.e44QyjOPnG/plumed2-2.4.0/regtest/mapping/rt-tpath/tmp
FAILURE
Diff for colvar:
2,547c2,547
< 0.000000 21.4988 0.0807 1.9191 0.0115
< 0.250000 20.9438 0.0226 0.2895 0.0084
< 0.500000 20.6238 0.0050 0.7852 0.0072
< 0.750000 20.5725 0.0067 0.6334 0.0051
< 1.000000 21.6318 0.0343 1.3683 0.0083
...
< 135.750000 39.9775 0.0226 41.7694 0.0055
< 136.000000 40.1872 0.0060 42.4231 0.0080
< 136.250000 39.2643 0.0889 42.6489 0.0073
---
> 0.000000 21.4988 0.0807 1.5201 -nan
> 0.250000 20.9438 0.0226 1.4708 -nan
> 0.500000 20.6238 0.0050 -2.5346 -nan
> 0.750000 20.5725 0.0067 -0.4449 -nan
> 1.000000 21.6318 0.0343 1.4984 -nan
...
> 135.750000 39.9775 0.0226 41.5095 -nan
> 136.000000 40.1872 0.0060 50.2911 -nan
> 136.250000 39.2643 0.0889 52.6817 -nan
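To see which of the failing tests have NaNs rather than small numeric differences, the report files can be searched directly. A minimal sketch, assuming GNU grep (for `--include`) and that it is pointed at the `regtest` directory of the build; `reports_with_nan` is a made-up name.

```shell
# Print the report.txt files under a directory that contain "nan" as a whole
# word, case-insensitively, so "-nan" and "NaN" both match (hypothetical helper).
reports_with_nan() {
    grep -rliw --include=report.txt 'nan' "$1" | sort
}
```

For example, `reports_with_nan /path/to/plumed2-2.4.0/regtest`, where the path is wherever the build tree was unpacked.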
Will start by adding -fp-model strict.
That only fixed 4 errors (17 down to 13).
+ ERROR in test analysis/rt-pca/
+ check file analysis/rt-pca/report.txt for more information
+ ERROR in test basic/rt63c/
+ check file basic/rt63c/report.txt for more information
+ ERROR in test basic/rt63c-mpi/
+ check file basic/rt63c-mpi/report.txt for more information
+ ERROR in test basic/rt63d/
+ check file basic/rt63d/report.txt for more information
+ ERROR in test basic/rt64-pca/
+ check file basic/rt64-pca/report.txt for more information
+ ERROR in test basic/rt65-rmsd2/
+ check file basic/rt65-rmsd2/report.txt for more information
+ ERROR in test basic/rt-close-structure/
+ check file basic/rt-close-structure/report.txt for more information
+ ERROR in test isdb/rt-emmi/
+ check file isdb/rt-emmi/report.txt for more information
+ ERROR in test mapping/rt-pathtools-2/
+ check file mapping/rt-pathtools-2/report.txt for more information
+ ERROR in test mapping/rt-pathtools-3/
+ check file mapping/rt-pathtools-3/report.txt for more information
+ ERROR in test mapping/rt-pca/
+ check file mapping/rt-pca/report.txt for more information
+ ERROR in test mapping/rt-pca-multi/
+ check file mapping/rt-pca-multi/report.txt for more information
+ ERROR in test mapping/rt-tpath/
+ check file mapping/rt-tpath/report.txt for more information
+++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Final report:
+ 279 tests performed, 122 tests not applicable
+ 13 errors found
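To see exactly which tests a change of flags cleared, two lists of failing test names (one name per line) can be compared. A sketch under that assumption; `fixed_tests` and the file names in the usage line are made up for illustration.

```shell
# Print tests that appear in the first list but not in the second, i.e. tests
# that failed before and no longer fail. Inputs are sorted into temporary
# files first because comm requires sorted input (hypothetical helper).
fixed_tests() {
    before=$(mktemp)
    after=$(mktemp)
    sort "$1" > "$before"
    sort "$2" > "$after"
    comm -23 "$before" "$after"
    rm -f "$before" "$after"
}
```

For example, `fixed_tests failed-default.txt failed-strict.txt` would list the tests that only failed without -fp-model strict.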
mapping/rt-tpath/report.txt still has NaNs.
2.4.0 uses c++11. I'm wondering if our gcc is too old.
It thinks gcc 4.8.1 and intel 15 and up should be sufficient.
Out of curiosity, I'm trying it on Grace to see if the results are the same (AVX).
Also trying one on Legion with compilers/intel/2017/update1 and mpi/intel/2017/update1/intel, as that's what plumed 2.3.1 was built with.
Same 13 failed on Grace.
A little way into the tests with the first Intel 2017 on Legion, still got NaNs. (And the same 13 tests failed.)
There are newer versions now - try those.
Ah, this is the problem I was getting in the tests: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/plumed-users/pXU_PGjkF1I
This is just to notify that there is a bug in the calculation of RMSD and all RMSD derived quantities in PLUMED v2.4.0, appearing when using recent intel compiler (and perhaps other recent compilers). The bug basically appears when using SIMD instructions, which are enabled by compilers implementing OpenMP 4.0
https://github.com/plumed/plumed2/pull/343
I discovered the problem running the regression tests on a machine where I compiled with a recent intel compiler. Unfortunately (or fortunately?) at SISSA we have very old compilers, so I had no way to detect the problem in the past. Now I have access to a machine in CINECA where I compiled PLUMED 2.4. There is a collection of tests that reproducibly give incorrect results with intel 17. Notice that when using intel 17 several tests give slightly incorrect results (just because we store too many digits in the reference files), this is not a problem. But some regtests were reporting nan or other strange values. I noticed that all of them were involving some form of alignment, tracked down the problem to RMSD calculation, had a look at the simd instructions that we added recently, and discovered the bug.
Notice that not all the tests using RMSD are failing (actually, only a minority of them). In particular, those that crashed were using one of the following keywords: PCA FIT_TO_TEMPLATE PATH (no problem detected with PATHMSD)
E.g., test basic/rt63c computes the RMSD correctly, then apply FIT_TO_TEMPLATE and computes again the RMSD that is now incorrect. None of the test using PATHMSD is reporting errors.
Fixed in 2.4.1, so I'll try that.
+ ERROR in test basic/rt-multi-1/
+ check file basic/rt-multi-1/report.txt for more information
+ ERROR in test crystallization/rt-sean-marks/
+ check file crystallization/rt-sean-marks/report.txt for more information
+ ERROR in test isdb/rt-emmi/
+ check file isdb/rt-emmi/report.txt for more information
+ ERROR in test isdb/rt-jcouplings/
+ check file isdb/rt-jcouplings/report.txt for more information
+ ERROR in test isdb/rt-jcouplings-mi/
+ check file isdb/rt-jcouplings-mi/report.txt for more information
+++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Final report:
+ 279 tests performed, 128 tests not applicable
+ 5 errors found
Checked isdb/rt-emmi/report.txt - those are last-decimal-place differences all the way through until the last bit, where they compound. Will see if -fp-model strict clears them all up.
Checked basic/rt-multi-1/report.txt - these look like the storing of too many digits mentioned above...
FAILURE
Diff for ff.0:
8,16c8,16
< 6 726.794919 0.000000
< 7 726.794919 0.000000
< 8 726.794919 0.000000
< 9 726.794919 0.000000
< 10 726.794919 0.000000
< 11 726.794919 0.000000
< 12 726.794919 0.000000
< 13 726.794919 0.000000
< 14 726.794919 0.000000
---
> 6 726.794919 0.000000
> 7 726.794919 0.000000
> 8 726.794919 0.000000
> 9 726.794919 0.000000
> 10 726.794919 0.000000
> 11 726.794919 0.000000
> 12 726.794919 0.000000
> 13 726.794919 0.000000
> 14 726.794919 0.000000
The other tests are either single decimal differences or appear the same as the above.
isdb/rt-jcouplings-mi/report.txt has a sign flip.
Diff for force.new:
473c473
< 0.012000 66 0.0018309683
---
> 0.012000 66 0.0018309682
476c476
< 0.012000 69 -0.0018309683
---
> 0.012000 69 -0.0018309682
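Differences like the ones above are easier to triage with a comparison that allows a tolerance, rather than the exact text diff the regtests use. A rough sketch with awk: `numdiff_tol` is a made-up name, the tolerance is absolute with an arbitrary default of 1e-9, and any field containing "nan" is always flagged.

```shell
# Compare two whitespace-separated numeric files field by field.
# Prints each line number where a field differs by more than the absolute
# tolerance (third argument, default 1e-9), where either side contains "nan",
# or where the field counts differ. Returns non-zero if any line is flagged.
numdiff_tol() {
    awk -v tol="${3:-1e-9}" '
        NR == FNR { ref[FNR] = $0; next }   # first file: remember each line
        {
            n = split(ref[FNR], r, " ")
            bad = (n != NF)
            for (i = 1; i <= NF && !bad; i++) {
                if ($i ~ /nan/ || r[i] ~ /nan/) { bad = 1; break }
                d = $i - r[i]
                if (d < 0) d = -d
                if (d > tol + 0) bad = 1
            }
            if (bad) { print FNR; failed = 1 }
        }
        END { exit (failed ? 1 : 0) }
    ' "$1" "$2"
}
```

For example, `numdiff_tol force.reference force.new 1e-9` (file names illustrative) stays quiet for last-digit differences like those above, but reports the line numbers of anything larger.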
With -fp-model strict: only one error left.
+ ERROR in test isdb/rt-emmi/
+ check file isdb/rt-emmi/report.txt for more information
+++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Final report:
+ 279 tests performed, 128 tests not applicable
+ 1 errors found
All the differences are in the last digit until it ends with:
1813,1815c1813,1815
< 0.000000 1811 374.7142 374.7253
< 0.000000 1812 433.4478 433.4106
< 0.000000 1813 1873.6269 1873.5840
---
> 0.000000 1811 374.7142 374.7192
> 0.000000 1812 433.4478 433.3984
> 0.000000 1813 1873.6269 1873.5901
plumed 2.4.1 installs:
Tell you what, the results are consistent across machines, even from the old Legion node.
When it comes to GROMACS, 2018.1 was released March 21 and there isn't a patch for it in a release of PLUMED yet (there is in github master, alongside the patch for 2016.5).
It may make sense to wait until after Easter for this one, and either patch 2018.1 or if there is no new plumed release, then patch 2016.4.
No new plumed, so building gromacs 2016.4 patched with plumed 2.4.1 (containing hrex).
module unload compilers mpi
module load compilers/intel/2017/update4
module load mpi/intel/2017/update3/intel
module load libmatheval
module load flex
module load openblas/0.2.14/intel-2015-update2
module load plumed/2.4.1/intel-2017-update4
module load gromacs/2016.4/plumed/intel-2017
Running gmx_mpi_d mdrun -h shows at the bottom:
Other options:
-deffnm <string>
Set the default filename for all file options
...
-[no]hrex (no)
Enable hamiltonian replica exchange
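The same check can be scripted so that it can be repeated on each machine after loading the modules. A minimal sketch: `has_hrex` is a made-up helper that just looks for the `-[no]hrex` option text in whatever help output is piped into it.

```shell
# Succeeds if the help text read from stdin lists the -[no]hrex option.
# -F treats the pattern as a fixed string; -e stops the leading "-" from
# being parsed as a grep option (hypothetical helper).
has_hrex() {
    grep -qF -e '-[no]hrex'
}
```

For example: `gmx_mpi_d mdrun -h 2>&1 | has_hrex && echo "hrex available"`.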
Informed IN:02912232 and 02766079.
A user has requested an install of the new version of PLUMED (2.4.0) and a new version of GROMACS (2016.4) on Legion, Grace, Thomas.