evaleev / libint

Libint: high-performance library for computing Gaussian integrals in quantum mechanics

test suite failures on i686/armhf #196

Open mbanck opened 3 years ago

mbanck commented 3 years ago

As I mentioned elsewhere, the test suite fails on most 32-bit architectures (but not on alpha or powerpc; those have a 16-byte long double as opposed to 8 or 12 bytes for the rest, which is the only apparent difference I could spot).

I made the eri tests print their output even when nothing fails, in order to compare x86-64 against x86-32. Even where the reference and libint numbers match on 32-bit, they are way off the corresponding 64-bit values. Should those numbers in principle be identical, or are they machine-dependent? The mismatch could be due to a wrong fix I applied to make the eri test build (see #194), but the other tests (including the Hartree-Fock tests) fail as well, see e.g. https://buildd.debian.org/status/fetch.php?pkg=libint2&arch=i386&ver=2.6.0-4&stamp=1602502875&raw=0

One example:

32bit:

Testing (s|s)  deriv order = 1
Elem 0 di= 0 v=0 : ref = -130.531 libint = -130.531 relabs_error = 0
Elem 0 di= 1 v=0 : ref = -103.863 libint = -103.863 relabs_error = 2.73647e-16
Elem 0 di= 2 v=0 : ref = -77.9148 libint = -77.9148 relabs_error = 3.64779e-16
Elem 0 di= 3 v=0 : ref = 130.531 libint = -130.531 relabs_error = 2
Elem 0 di= 4 v=0 : ref = 103.863 libint = -103.863 relabs_error = 2
Elem 0 di= 5 v=0 : ref = 77.9148 libint = -77.9148 relabs_error = 2
failed
Testing (s|p)  deriv order = 1
Elem 0 di= 0 v=0 : ref = 36.4571 libint = 36.4571 relabs_error = 0
Elem 0 di= 1 v=0 : ref = -12.9406 libint = -12.9406 relabs_error = 0
Elem 0 di= 2 v=0 : ref = 2.53014 libint = 2.53014 relabs_error = 0
Elem 0 di= 3 v=0 : ref = -36.4571 libint = -36.4571 relabs_error = 1.55919e-15
Elem 0 di= 4 v=0 : ref = 12.9406 libint = 12.9406 relabs_error = 0
Elem 0 di= 5 v=0 : ref = -2.53014 libint = -2.53014 relabs_error = 1.7552e-16
Elem 1 di= 0 v=0 : ref = -12.9406 libint = -12.9406 relabs_error = 2.74541e-16
Elem 1 di= 1 v=0 : ref = -14.3173 libint = -14.3173 relabs_error = 6.20351e-16
Elem 1 di= 2 v=0 : ref = 10.5351 libint = 10.5351 relabs_error = 1.68614e-16
Elem 1 di= 3 v=0 : ref = 12.9406 libint = 12.9406 relabs_error = 2.74541e-16
Elem 1 di= 4 v=0 : ref = 14.3173 libint = 14.3173 relabs_error = 3.10176e-15
Elem 1 di= 5 v=0 : ref = -10.5351 libint = -10.5351 relabs_error = 0
Elem 2 di= 0 v=0 : ref = 2.53014 libint = 2.53014 relabs_error = 3.51039e-16
Elem 2 di= 1 v=0 : ref = 10.5351 libint = 10.5351 relabs_error = 3.37227e-16
Elem 2 di= 2 v=0 : ref = 37.5052 libint = 37.5052 relabs_error = 1.89452e-16
Elem 2 di= 3 v=0 : ref = -2.53014 libint = -2.53014 relabs_error = 1.7552e-16
Elem 2 di= 4 v=0 : ref = -10.5351 libint = -10.5351 relabs_error = 1.68614e-16
Elem 2 di= 5 v=0 : ref = -37.5052 libint = -37.5052 relabs_error = 9.4726e-16
ok

64bit:

Testing (s|s)  deriv order = 1
Elem 0 di= 0 v=0 : ref = -173.336 libint = -173.336 relabs_error = 1.63969e-16
Elem 0 di= 1 v=0 : ref = -81.8765 libint = -81.8765 relabs_error = 3.47129e-16
Elem 0 di= 2 v=0 : ref = 55.7602 libint = 55.7602 relabs_error = 5.09713e-16
Elem 0 di= 3 v=0 : ref = 173.336 libint = 173.336 relabs_error = 1.63969e-16
Elem 0 di= 4 v=0 : ref = 81.8765 libint = 81.8765 relabs_error = 1.73565e-16
Elem 0 di= 5 v=0 : ref = -55.7602 libint = -55.7602 relabs_error = 5.09713e-16
ok
Testing (s|p)  deriv order = 1
Elem 0 di= 0 v=0 : ref = 6.59281 libint = 6.59281 relabs_error = 1.34719e-16
Elem 0 di= 1 v=0 : ref = 0.190251 libint = 0.190251 relabs_error = 2.91778e-15
Elem 0 di= 2 v=0 : ref = -1.23046 libint = -1.23046 relabs_error = 2.16549e-15
Elem 0 di= 3 v=0 : ref = -6.59281 libint = -6.59281 relabs_error = 8.08315e-16
Elem 0 di= 4 v=0 : ref = -0.190251 libint = -0.190251 relabs_error = 2.91778e-15
Elem 0 di= 5 v=0 : ref = 1.23046 libint = 1.23046 relabs_error = 2.16549e-15
Elem 1 di= 0 v=0 : ref = 0.190251 libint = 0.190251 relabs_error = 9.19102e-15
Elem 1 di= 1 v=0 : ref = 6.33288 libint = 6.33288 relabs_error = 1.40249e-16
Elem 1 di= 2 v=0 : ref = 2.33071 libint = 2.33071 relabs_error = 5.71614e-16
Elem 1 di= 3 v=0 : ref = -0.190251 libint = -0.190251 relabs_error = 7.29446e-16
Elem 1 di= 4 v=0 : ref = -6.33288 libint = -6.33288 relabs_error = 1.40249e-15
Elem 1 di= 5 v=0 : ref = -2.33071 libint = -2.33071 relabs_error = 3.81076e-16
Elem 2 di= 0 v=0 : ref = -1.23046 libint = -1.23046 relabs_error = 9.74469e-15
Elem 2 di= 1 v=0 : ref = 2.33071 libint = 2.33071 relabs_error = 7.62151e-16
Elem 2 di= 2 v=0 : ref = -8.38073 libint = -8.38073 relabs_error = 2.11957e-16
Elem 2 di= 3 v=0 : ref = 1.23046 libint = 1.23046 relabs_error = 1.80457e-16
Elem 2 di= 4 v=0 : ref = -2.33071 libint = -2.33071 relabs_error = 9.52689e-16
Elem 2 di= 5 v=0 : ref = 8.38073 libint = 8.38073 relabs_error = 2.11957e-16
ok

The three 32-bit failures in (s|s) have the right magnitude but the wrong sign:

Elem 0 di= 3 v=0 : ref = 130.531 libint = -130.531 relabs_error = 2
Elem 0 di= 4 v=0 : ref = 103.863 libint = -103.863 relabs_error = 2
Elem 0 di= 5 v=0 : ref = 77.9148 libint = -77.9148 relabs_error = 2
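
A relabs_error of exactly 2 is what a pure sign flip would produce if the test computes |ref - libint| / |ref|. That formula is an assumption inferred from the printed numbers, not copied from the libint test source, and the function names below are made up for illustration. A minimal sketch:

```cpp
#include <cassert>
#include <cmath>

// Assumed form of the test's error metric: |ref - value| / |ref|.
// This is inferred from the printed numbers, not copied from libint.
double relabs_error(double ref, double value) {
  return std::abs(ref - value) / std::abs(ref);
}

// A pure sign flip gives |ref + ref| / |ref| = 2 for any nonzero ref.
void check_sign_flip() {
  assert(relabs_error(130.531, -130.531) == 2.0);
  assert(relabs_error(103.863, -103.863) == 2.0);
  assert(relabs_error(77.9148, -77.9148) == 2.0);
}
```

Under the same assumed formula, an error that is neither near zero nor equal to 2 means the magnitude itself differs, not just the sign.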

In other parts of the test (probably because more operations are involved there), the sign is correct but some values are off.

32bit:

Testing (d|p) 
Elem 0 di= 0 v=0 : ref = 92.7058 libint = 112.767 relabs_error = 0.216392
Elem 1 di= 0 v=0 : ref = -227.271 libint = -227.271 relabs_error = 1.25056e-16
Elem 2 di= 0 v=0 : ref = 288.784 libint = 288.784 relabs_error = 1.96837e-16
Elem 3 di= 0 v=0 : ref = 31.1654 libint = 31.1654 relabs_error = 2.27991e-16
Elem 4 di= 0 v=0 : ref = -14.6554 libint = -14.6554 relabs_error = 0
Elem 5 di= 0 v=0 : ref = -6.86838 libint = -6.86838 relabs_error = 2.58628e-16
Elem 6 di= 0 v=0 : ref = -39.6006 libint = -39.6006 relabs_error = 3.58855e-16
Elem 7 di= 0 v=0 : ref = -6.86838 libint = -6.86838 relabs_error = 3.87942e-16
Elem 8 di= 0 v=0 : ref = -11.3334 libint = -11.3334 relabs_error = 1.56737e-16
Elem 9 di= 0 v=0 : ref = 136.386 libint = 136.386 relabs_error = 0
Elem 10 di= 0 v=0 : ref = -164.712 libint = -199.036 relabs_error = 0.208391
Elem 11 di= 0 v=0 : ref = 296.522 libint = 296.522 relabs_error = 0
Elem 12 di= 0 v=0 : ref = -6.86838 libint = -6.86838 relabs_error = 0
Elem 13 di= 0 v=0 : ref = -31.8628 libint = -31.8628 relabs_error = 0
Elem 14 di= 0 v=0 : ref = 19.3917 libint = 19.3917 relabs_error = 0
Elem 15 di= 0 v=0 : ref = 139.708 libint = 139.708 relabs_error = 0
Elem 16 di= 0 v=0 : ref = -239.045 libint = -239.045 relabs_error = 2.37794e-16
Elem 17 di= 0 v=0 : ref = 216.515 libint = 260.13 relabs_error = 0.20144
failed

64bit:

Testing (d|p) 
Elem 0 di= 0 v=0 : ref = -0.708763 libint = -0.708763 relabs_error = 9.39854e-16
Elem 1 di= 0 v=0 : ref = 3.88409 libint = 3.88409 relabs_error = 6.86013e-16
Elem 2 di= 0 v=0 : ref = 2.94575 libint = 2.94575 relabs_error = 1.05529e-15
Elem 3 di= 0 v=0 : ref = -0.851277 libint = -0.851277 relabs_error = 2.60837e-16
Elem 4 di= 0 v=0 : ref = -2.51181 libint = -2.51181 relabs_error = 1.2376e-15
Elem 5 di= 0 v=0 : ref = 0.645867 libint = 0.645867 relabs_error = 1.20328e-15
Elem 6 di= 0 v=0 : ref = -0.645621 libint = -0.645621 relabs_error = 3.43924e-16
Elem 7 di= 0 v=0 : ref = 0.645867 libint = 0.645867 relabs_error = 1.54707e-15
Elem 8 di= 0 v=0 : ref = -2.87358 libint = -2.87358 relabs_error = 9.27252e-16
Elem 9 di= 0 v=0 : ref = 4.82524 libint = 4.82524 relabs_error = 1.84069e-16
Elem 10 di= 0 v=0 : ref = -1.2273 libint = -1.2273 relabs_error = 1.99014e-15
Elem 11 di= 0 v=0 : ref = 2.36188 libint = 2.36188 relabs_error = 1.31617e-15
Elem 12 di= 0 v=0 : ref = 0.645867 libint = 0.645867 relabs_error = 5.1569e-16
Elem 13 di= 0 v=0 : ref = -1.22949 libint = -1.22949 relabs_error = 0
Elem 14 di= 0 v=0 : ref = -1.85462 libint = -1.85462 relabs_error = 7.1835e-16
Elem 15 di= 0 v=0 : ref = 4.46347 libint = 4.46347 relabs_error = 0
Elem 16 di= 0 v=0 : ref = 2.88074 libint = 2.88074 relabs_error = 1.07911e-15
Elem 17 di= 0 v=0 : ref = -1.10788 libint = -1.10788 relabs_error = 6.0127e-16
ok

mbanck commented 3 years ago

Some observations:

  1. Downloading the generated libint-cp2k tarball (https://github.com/cp2k/libint-cp2k/releases/download/v2.6.0/libint-v2.6.0-cp2k-lmax-4.tgz) and just running ./configure && make && make check in a minimal Debian unstable i586 chroot passes the tests, so it does not seem to be a general toolchain issue.
  2. The non-deriv eri test (the first ./test 0 2 test) seems to pass if --with-opt-am is lowered to 0 or 1. However, the deriv 1 tests (./test 1 1) already fail at the s test, so the failure is independent of lowering --with-eri-opt-am et al.:
    Testing  (ss|ss)  deriv order = 1: Elem 0 di= 6 v=0 : ref = -14.7577 libint = -185.892 relabs_error = 11.5962
  3. When I try to reproduce the above libint-cp2k tarball, I get some possibly relevant diffs even though I use the same configure flags for the compiler (and the tests still fail). This might be due to the environment (they use alpine/musl, not Debian GNU/Linux). For example:

    --- ../libint-v2.6.0-cp2k-lmax-4/src/CR_DerivGaussP0InBra_aB_d001__0__s__1___TwoPRep_s__0__s__1___Ab__up_0.cc   2019-08-05 13:36:39.000000000 +0000
    +++ libint-2.6.0/src/CR_DerivGaussP0InBra_aB_d001__0__s__1___TwoPRep_s__0__s__1___Ab__up_0.cc   2021-01-24 16:17:49.000000000 +0000
    @@ -33,42 +33,40 @@
    {
    const int vi = 0;
    LIBINT2_REALTYPE fp1;
    -fp1 = 2.0000000000000000e+00 * src1[((hsi*3+2)*1+lsi)*1];
    -LIBINT2_REALTYPE fp2;
    -fp2 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+9)*1+lsi)*1];
    +fp1 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+9)*1+lsi)*1];
    LIBINT2_REALTYPE fp0;
    -fp0 = fp2 - fp1;
    +fp0 = fp1 - src1[((hsi*3+2)*1+lsi)*1];
    target[((hsi*6+5)*1+lsi)*1] = fp0;
    +LIBINT2_REALTYPE fp3;
    +fp3 = 1.0000000000000000e+00 * src1[((hsi*3+1)*1+lsi)*1];
    LIBINT2_REALTYPE fp4;
    -fp4 = 1.0000000000000000e+00 * src1[((hsi*3+1)*1+lsi)*1];
    +fp4 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+8)*1+lsi)*1];
    +LIBINT2_REALTYPE fp2;
    +fp2 = fp4 - fp3;
    +target[((hsi*6+4)*1+lsi)*1] = fp2;
    LIBINT2_REALTYPE fp5;
    -fp5 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+8)*1+lsi)*1];
    -LIBINT2_REALTYPE fp3;
    -fp3 = fp5 - fp4;
    -target[((hsi*6+4)*1+lsi)*1] = fp3;
    -LIBINT2_REALTYPE fp6;
    -fp6 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+7)*1+lsi)*1];
    -target[((hsi*6+3)*1+lsi)*1] = fp6;
    +fp5 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+7)*1+lsi)*1];
    +target[((hsi*6+3)*1+lsi)*1] = fp5;
    +LIBINT2_REALTYPE fp7;
    +fp7 = 1.0000000000000000e+00 * src1[((hsi*3+0)*1+lsi)*1];
    LIBINT2_REALTYPE fp8;
    -fp8 = 1.0000000000000000e+00 * src1[((hsi*3+0)*1+lsi)*1];
    +fp8 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+5)*1+lsi)*1];
    +LIBINT2_REALTYPE fp6;
    +fp6 = fp8 - fp7;
    +target[((hsi*6+2)*1+lsi)*1] = fp6;
    LIBINT2_REALTYPE fp9;
    -fp9 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+5)*1+lsi)*1];
    -LIBINT2_REALTYPE fp7;
    -fp7 = fp9 - fp8;
    -target[((hsi*6+2)*1+lsi)*1] = fp7;
    +fp9 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+4)*1+lsi)*1];
    +target[((hsi*6+1)*1+lsi)*1] = fp9;
    LIBINT2_REALTYPE fp10;
    -fp10 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+4)*1+lsi)*1];
    -target[((hsi*6+1)*1+lsi)*1] = fp10;
    -LIBINT2_REALTYPE fp11;
    -fp11 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+2)*1+lsi)*1];
    -target[((hsi*6+0)*1+lsi)*1] = fp11;
    +fp10 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+2)*1+lsi)*1];
    +target[((hsi*6+0)*1+lsi)*1] = fp10;
    }
    }
    }
    const int hsi = 0;
    const int lsi = 0;
    const int vi = 0;
    -/** Number of flops = 12 */
    +/** Number of flops = 11 */
    }
    
    #ifdef __cplusplus

    The full diff of lmax=4 is here: https://people.debian.org/~mbanck/libint2.diff.gz

  4. Once I apply that diff and build, the eri test suite passes.

susilehtola commented 3 years ago

The interesting bit is that the failing values appear to be correct; they just have the wrong sign!

mbanck commented 3 years ago

> The interesting bit is that the failing values appear to be correct; they just have the wrong sign!

Only for some of the failures, but it could be that the others are just multiple sign flips adding up.

mbanck commented 3 years ago

> 3. when I try to reproduce the above libint-cp2k tarball, I get some possibly relevant diffs even if I use the same configure flags for the compiler (and the tests still fail), it might be due to the environment (they use alpine/musl, not Debian GNU/Linux), like:

Not sure why I didn't try this earlier, but I get identical output (that is, no diff against the libint-cp2k tarball) if I build the libint compiler on x86-64. So something on 32-bit leads to different code generation and subsequently to the test suite failures; it is again not an environment/toolchain issue.

mbanck commented 3 years ago

Only 478 files out of almost 4000 at lmax=4 are generated differently, by the way; they look like these:

CR_DerivGaussP[01]InBra_aB_[...]
CR_aB_[XYZ][01234]_0__Overlap_[..]
CR_aB_[spdf]__0__Kinetic_[..]
OSVRRElecPotIn{Bra,Ket}_[..]
OSVRRP[01]InBra_aB_[...]
OSVRRSMultipole_aB_[...]

mbanck commented 3 years ago

Using a libint that was generated on x86-64 but built for 32-bit makes the CP2K libint-related regtests pass.

StefanBruens commented 1 year ago

There is exactly one semantic difference in the diff above:

 fp2 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+9)*1+lsi)*1];
-target[((hsi*6+5)*1+lsi)*1] = fp2 - 2.0e+0 * src1[((hsi*3+2)*1+lsi)*1];
+target[((hsi*6+5)*1+lsi)*1] = fp2 - 1.0e+0 * src1[((hsi*3+2)*1+lsi)*1];
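
To make the effect concrete, here is a minimal sketch of the two expression shapes from the snippet above. The function names and the numeric inputs are made up for illustration; only the coefficients 2.0 and 1.0 come from the diff. Going by the diff headers earlier in the thread, the 2.0 form is taken to be the libint-cp2k tarball output and the 1.0 form the one generated on 32-bit.

```cpp
#include <cassert>

// Shape of the expression in the libint-cp2k tarball (coefficient 2.0).
double target_reference(double two_alpha, double src0, double src1) {
  return two_alpha * src0 - 2.0 * src1;
}

// Shape of the expression generated on 32-bit (coefficient 1.0).
double target_32bit(double two_alpha, double src0, double src1) {
  return two_alpha * src0 - 1.0 * src1;
}

// Hypothetical inputs: the two forms differ by exactly src1.
void check_coefficient_gap() {
  const double two_alpha = 3.0, src0 = 2.0, src1 = 0.5;
  assert(target_reference(two_alpha, src0, src1) == 5.0);
  assert(target_32bit(two_alpha, src0, src1) == 5.5);
}
```

So whenever src1 is nonzero, the two variants disagree by src1 for that target element, which would be a wrong value and not a rounding difference.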