Open mbanck opened 3 years ago
Some observations:
`./configure && make && make check` in a minimal Debian unstable i586 chroot passes the tests, so it does not seem to be a general toolchain issue.
The `./test 0 2` test seems to pass if `--with-opt-am` is lowered to 0 or 1. However, the deriv 1 tests (`./test 1 1`) already fail the `s` test, so this is independent of possibly lowering `--with-eri-opt-am` et al.:
Testing (ss|ss) deriv order = 1: Elem 0 di= 6 v=0 : ref = -14.7577 libint = -185.892 relabs_error = 11.5962
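The `relabs_error` figure in the line above is consistent with the usual definition |libint - ref| / |ref|. The sketch below assumes that definition; it is not taken from the libint test source, and the function name is hypothetical.

```cpp
#include <cmath>

// Hypothetical helper (not libint's API): relative absolute error of a
// computed value against a reference, assuming |value - ref| / |ref|.
double relabs_error(double ref, double value) {
    return std::fabs(value - ref) / std::fabs(ref);
}
```

With the values printed above, `relabs_error(-14.7577, -185.892)` comes out at about 11.596, which matches the reported figure to the precision shown. So the libint value is off by roughly an order of magnitude here, not merely by rounding.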
When I try to reproduce the above libint-cp2k tarball, I get some possibly relevant diffs even though I use the same configure flags for the compiler (and the tests still fail). It might be due to the environment (they use Alpine/musl, not Debian GNU/Linux). For example:
--- ../libint-v2.6.0-cp2k-lmax-4/src/CR_DerivGaussP0InBra_aB_d001__0__s__1___TwoPRep_s__0__s__1___Ab__up_0.cc 2019-08-05 13:36:39.000000000 +0000
+++ libint-2.6.0/src/CR_DerivGaussP0InBra_aB_d001__0__s__1___TwoPRep_s__0__s__1___Ab__up_0.cc 2021-01-24 16:17:49.000000000 +0000
@@ -33,42 +33,40 @@
{
const int vi = 0;
LIBINT2_REALTYPE fp1;
-fp1 = 2.0000000000000000e+00 * src1[((hsi*3+2)*1+lsi)*1];
-LIBINT2_REALTYPE fp2;
-fp2 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+9)*1+lsi)*1];
+fp1 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+9)*1+lsi)*1];
LIBINT2_REALTYPE fp0;
-fp0 = fp2 - fp1;
+fp0 = fp1 - src1[((hsi*3+2)*1+lsi)*1];
target[((hsi*6+5)*1+lsi)*1] = fp0;
+LIBINT2_REALTYPE fp3;
+fp3 = 1.0000000000000000e+00 * src1[((hsi*3+1)*1+lsi)*1];
LIBINT2_REALTYPE fp4;
-fp4 = 1.0000000000000000e+00 * src1[((hsi*3+1)*1+lsi)*1];
+fp4 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+8)*1+lsi)*1];
+LIBINT2_REALTYPE fp2;
+fp2 = fp4 - fp3;
+target[((hsi*6+4)*1+lsi)*1] = fp2;
LIBINT2_REALTYPE fp5;
-fp5 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+8)*1+lsi)*1];
-LIBINT2_REALTYPE fp3;
-fp3 = fp5 - fp4;
-target[((hsi*6+4)*1+lsi)*1] = fp3;
-LIBINT2_REALTYPE fp6;
-fp6 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+7)*1+lsi)*1];
-target[((hsi*6+3)*1+lsi)*1] = fp6;
+fp5 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+7)*1+lsi)*1];
+target[((hsi*6+3)*1+lsi)*1] = fp5;
+LIBINT2_REALTYPE fp7;
+fp7 = 1.0000000000000000e+00 * src1[((hsi*3+0)*1+lsi)*1];
LIBINT2_REALTYPE fp8;
-fp8 = 1.0000000000000000e+00 * src1[((hsi*3+0)*1+lsi)*1];
+fp8 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+5)*1+lsi)*1];
+LIBINT2_REALTYPE fp6;
+fp6 = fp8 - fp7;
+target[((hsi*6+2)*1+lsi)*1] = fp6;
LIBINT2_REALTYPE fp9;
-fp9 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+5)*1+lsi)*1];
-LIBINT2_REALTYPE fp7;
-fp7 = fp9 - fp8;
-target[((hsi*6+2)*1+lsi)*1] = fp7;
+fp9 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+4)*1+lsi)*1];
+target[((hsi*6+1)*1+lsi)*1] = fp9;
LIBINT2_REALTYPE fp10;
-fp10 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+4)*1+lsi)*1];
-target[((hsi*6+1)*1+lsi)*1] = fp10;
-LIBINT2_REALTYPE fp11;
-fp11 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+2)*1+lsi)*1];
-target[((hsi*6+0)*1+lsi)*1] = fp11;
+fp10 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+2)*1+lsi)*1];
+target[((hsi*6+0)*1+lsi)*1] = fp10;
}
}
}
const int hsi = 0;
const int lsi = 0;
const int vi = 0;
-/** Number of flops = 12 */
+/** Number of flops = 11 */
}
#ifdef __cplusplus
The full diff of lmax=4 is here: https://people.debian.org/~mbanck/libint2.diff.gz
The interesting bit is that the failing values appear to be correct; they just have the wrong sign!
> The interesting bit is that the failing values appear to be correct; they just have the wrong sign!
That holds only for some of the failures, but it could be that the others are just multiple sign flips adding up.
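One way to separate pure sign flips from other mismatches when going through the test output is a small predicate like the one below. This is only a sketch: the function name and the tolerance are assumptions, not part of libint's test harness.

```cpp
#include <cmath>

// Hypothetical helper: true if `value` equals `-ref` up to a relative
// tolerance, i.e. the magnitude is right and only the sign is wrong.
bool is_sign_flip(double ref, double value, double rel_tol = 1e-10) {
    if (ref == 0.0) {
        return false;  // a zero reference has no sign to flip
    }
    return std::fabs(value + ref) <= rel_tol * std::fabs(ref);
}
```

Failures for which this returns false (such as the `(ss|ss)` deriv 1 element quoted earlier, where ref = -14.7577 and libint = -185.892) would then be the candidates for several wrong terms adding up.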
> 1. when I try to reproduce the above libint-cp2k tarball, I get some possibly relevant diffs even if I use the same configure flags for the compiler (and the tests still fail), it might be due to the environment (they use alpine/musl, not Debian GNU/Linux), like:
Not sure why I didn't try this earlier, but I get the same diff (or rather, no diff) if I build the libint compiler under x86-64. So something at 32bit leads to the different code generation and subsequently the test suite failures. So it is again not an environment/toolchain issue.
By the way, only 478 files out of almost 4000 at lmax=4 are generated differently. Their names follow patterns like these:
CR_DerivGaussP[01]InBra_aB_[...]
CR_aB_[XYZ][01234]_0__Overlap_[..]
CR_aB_[spdf]__0__Kinetic_[..]
OSVRRElecPotIn{Bra,Ket}_[..]
OSVRRP[01]InBra_aB_[...]
OSVRRSMultipole_aB_[...]
Using the libint that was generated on x86-64 but built for 32bit makes the CP2K libint-related regtests pass.
There is exactly one semantic difference in the diff above:
fp2 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+9)*1+lsi)*1];
-target[((hsi*6+5)*1+lsi)*1] = fp2 - 2.0e+0 * src1[((hsi*3+2)*1+lsi)*1];
+target[((hsi*6+5)*1+lsi)*1] = fp2 - 1.0e+0 * src1[((hsi*3+2)*1+lsi)*1];
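To make the effect of that one coefficient concrete, here is a small sketch. It rests on my reading of the generated code, which is an assumption: the six targets are the Cartesian d components (xx, xy, xz, yy, yz, zz), the derivative is taken along z with respect to the Gaussian centre, and the textbook relation gives `two_alpha * [nz+1 term] - nz * [nz-1 term]`. Under that reading the zz component (index 5) has nz = 2, so the coefficient 2.0 from the reference tarball is the expected one. All function names and input values are hypothetical.

```cpp
// Textbook derivative of a Cartesian Gaussian with z-exponent nz with
// respect to its centre: two_alpha * (nz+1 term) - nz * (nz-1 term).
double deriv_z_term(double two_alpha, double higher, double lower, int nz) {
    return two_alpha * higher - static_cast<double>(nz) * lower;
}

// zz target as emitted in the reference tarball (coefficient 2.0).
double target_zz_reference(double two_alpha, double src0_zzz, double src1_z) {
    return two_alpha * src0_zzz - 2.0 * src1_z;
}

// zz target as emitted by the compiler built on 32bit (coefficient 1.0).
double target_zz_32bit(double two_alpha, double src0_zzz, double src1_z) {
    return two_alpha * src0_zzz - 1.0 * src1_z;
}
```

For made-up inputs `two_alpha = 3.0`, `src0_zzz = 0.5`, `src1_z = 0.25`, the reference form and `deriv_z_term(..., 2)` both give 1.0, while the 32bit form gives 1.25. The yz and xz targets (nz = 1) carry a coefficient of 1.0 in both versions of the diff, which fits the same relation.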
So as I mentioned elsewhere, I see the test suite failing on most 32bit architectures (but not alpha or powerpc; those have a 16-byte `long double` as opposed to 8 or 12 bytes for the rest, which is the only apparent difference I could spot).
I made the eri tests print their output even without a failure in order to compare x86-64 and x86-32. Even where the 32bit reference and libint numbers match each other, they are way off the corresponding 64bit values. Should those numbers in principle be identical, or are they machine-dependent? It could be due to a wrong fix in order to make the eri test build (see #194), but the other tests (including the Hartree-Fock tests) fail as well, see e.g. https://buildd.debian.org/status/fetch.php?pkg=libint2&arch=i386&ver=2.6.0-4&stamp=1602502875&raw=0
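On the `long double` sizes mentioned above: they can be checked on each architecture with a few lines like the following. This is just a sketch with a hypothetical function name. Note that `sizeof` reports the storage size including padding (for example, GCC on Linux typically stores the x87 80-bit extended format in 12 bytes on i386 and 16 bytes on x86-64), whereas `LDBL_MANT_DIG` gives the actual significand width.

```cpp
#include <cfloat>
#include <cstdio>

// Print storage size and significand width of double and long double,
// so the values can be compared between build architectures.
void print_long_double_layout() {
    std::printf("sizeof(double)      = %zu bytes, DBL_MANT_DIG  = %d bits\n",
                sizeof(double), DBL_MANT_DIG);
    std::printf("sizeof(long double) = %zu bytes, LDBL_MANT_DIG = %d bits\n",
                sizeof(long double), LDBL_MANT_DIG);
}
```

Running this on the build host of the libint compiler and on a passing architecture would show whether the two actually differ in precision or only in padding.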
One example:
32bit:
64bit:
The three 32bit failures have the wrong sign:
For other parts of the test (probably more operations are done there), the sign is correct but the values are off.
32bit:
64bit: