UCL-RITS / rcps-buildscripts

Scripts to automate package builds on RC Platforms
MIT License

Install Request: Intel builds of LAMMPS 29Sep2021 update2 #476

Open heatherkellyucl opened 2 years ago

heatherkellyucl commented 2 years ago

Follow-on from #468

Do the Intel-optimised builds, inc GPU.

heatherkellyucl commented 2 years ago

Packages not included in https://github.com/lammps/lammps/blob/develop/cmake/presets/most.cmake which we built last time:
- mpiio "has become unreliable" https://github.com/lammps/lammps/issues/3066
- snap - renamed to ML-SNAP (included)
- user-reaxc - renamed to REAXFF (included)
- user-meamc - renamed to MEAM (included)

External libraries we needed last time; the build should now handle these automatically:
- meam (included)
- poems (included)
- reax - think it was removed in 2019
- voronoi (included)
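The renames noted in this comment can be kept as a small lookup when porting an older package list to this release. This is only a sketch: `translate_packages` and the `DROPPED` set are hypothetical helpers, it covers only the names mentioned here, and it assumes mpiio is simply left out.

```python
# Renames noted in this thread (old package name -> name in 29Sep2021).
RENAMED = {
    "snap": "ML-SNAP",
    "user-reaxc": "REAXFF",
    "user-meamc": "MEAM",
}
# Assumption: mpiio is left out because upstream reports it as unreliable.
DROPPED = {"mpiio"}


def translate_packages(old_names):
    """Map an old package list to current names, skipping dropped packages."""
    current = []
    for name in old_names:
        key = name.lower()
        if key in DROPPED:
            continue
        # Names without a known rename are passed through unchanged.
        current.append(RENAMED.get(key, name))
    return current


print(translate_packages(["snap", "user-reaxc", "mpiio", "user-meamc"]))
```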

heatherkellyucl commented 2 years ago

Test build has begun.

heatherkellyucl commented 2 years ago

Eigen build error:

[ 18%] Building CXX object CMakeFiles/lammps.dir/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/force.cpp.o
/home/cceahke/lammps/lammps-stable_29Sep2021_update2/build/Eigen3_build-prefix/src/Eigen3_build/Eigen/src/Core/MathFunctions.h(327): error: no instance of overloaded function "sqrt" matches the argument list
            argument types are: (const Eigen::internal::Packet8d)
      return sqrt(x);
             ^
/home/cceahke/lammps/lammps-stable_29Sep2021_update2/build/Eigen3_build-prefix/src/Eigen3_build/Eigen/src/Core/MathFunctions.h(326): note: this candidate was rejected because arguments do not match
      EIGEN_USING_STD(sqrt);
      ^
/home/cceahke/lammps/lammps-stable_29Sep2021_update2/build/Eigen3_build-prefix/src/Eigen3_build/Eigen/src/Core/MathFunctions.h(326): note: this candidate was rejected because arguments do not match
      EIGEN_USING_STD(sqrt);
      ^
/home/cceahke/lammps/lammps-stable_29Sep2021_update2/build/Eigen3_build-prefix/src/Eigen3_build/Eigen/src/Core/MathFunctions.h(326): note: this candidate was rejected because arguments do not match
      EIGEN_USING_STD(sqrt);
      ^
/home/cceahke/lammps/lammps-stable_29Sep2021_update2/build/Eigen3_build-prefix/src/Eigen3_build/Eigen/src/Core/MathFunctions.h(326): note: this candidate was rejected because at least one template argument could not be deduced
      EIGEN_USING_STD(sqrt);
      ^
/home/cceahke/lammps/lammps-stable_29Sep2021_update2/build/Eigen3_build-prefix/src/Eigen3_build/Eigen/src/Core/MathFunctions.h(326): note: this candidate was rejected because at least one template argument could not be deduced
      EIGEN_USING_STD(sqrt);
      ^
/usr/include/bits/mathcalls.h(157): note: this candidate was rejected because arguments do not match
  __MATHCALL (sqrt,, (_Mdouble_ __x));
  ^
          detected during:
            instantiation of "Scalar Eigen::internal::sqrt_impl<Scalar>::run(const Scalar &) [with Scalar=Eigen::internal::Packet8d]" at line 1467
            instantiation of "Eigen::internal::sqrt_retval<Eigen::internal::global_math_functions_filtering_base<Scalar, void>::type>::type Eigen::numext::sqrt(const Scalar &) [with Scalar=Eigen::internal::Packet8d]" at line 815 of "/home/cceahke/lammps/lammps-stable_29Sep2021_update2/build/Eigen3_build-prefix/src/Eigen3_build/Eigen/src/Core/GenericPacketMath.h"
            instantiation of "Packet Eigen::internal::psqrt(const Packet &) [with Packet=Eigen::internal::Packet8d]" at line 813 of "/home/cceahke/lammps/lammps-stable_29Sep2021_update2/build/Eigen3_build-prefix/src/Eigen3_build/Eigen/src/Core/arch/Default/GenericPacketMathFunctions.h"
            instantiation of "Packet Eigen::internal::psqrt_complex(const Packet &) [with Packet=Eigen::internal::Packet4cd]" at line 412 of "/home/cceahke/lammps/lammps-stable_29Sep2021_update2/build/Eigen3_build-prefix/src/Eigen3_build/Eigen/src/Core/arch/AVX512/Complex.h"

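This is a C++ templating problem: the Intel compiler finds no sqrt overload that accepts Eigen's AVX512 packet type (Packet8d). (The LAMMPS build is trying to build Eigen 3.4.0.)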

heatherkellyucl commented 2 years ago

The thing to try next is building Eigen standalone and seeing if that works.

heatherkellyucl commented 2 years ago

I have a standalone Eigen install and am trying make check to see what happens there.

heatherkellyucl commented 2 years ago

With the Intel + gcc-libs/4.9.3 combo, I can confirm the standalone build also has test failures towards the end, again for templating reasons. Now seeing what the tests do with the GNU compiler and no Intel.

/home/cceahke/eigen/eigen-3.4.0/unsupported/test/../Eigen/src/SpecialFunctions/SpecialFunctionsImpl.h(786): error: no instance of function template "Eigen::internal::main_igamma_term" matches the argument list
            argument types are: (float, float)
      Scalar ax = main_igamma_term<Scalar>(a, x);
                  ^
/home/cceahke/eigen/eigen-3.4.0/unsupported/test/../Eigen/src/SpecialFunctions/SpecialFunctionsImpl.h(733): note: this candidate was rejected because function is not visible
  static EIGEN_STRONG_INLINE Scalar main_igamma_term(Scalar a, Scalar x) {
                                    ^
          detected during:
            instantiation of "Scalar Eigen::internal::igammac_cf_impl<Scalar, mode>::run(Scalar, Scalar) [with Scalar=float, mode=Eigen::internal::VALUE]" at line 1091
            instantiation of "Scalar Eigen::internal::igamma_generic_impl<Scalar, mode>::run(Scalar, Scalar) [with Scalar=float, mode=Eigen::internal::VALUE]" at line 2015
            instantiation of "Eigen::internal::igamma_retval<Eigen::internal::global_math_functions_filtering_base<Scalar, void>::type>::type Eigen::numext::igamma(const Scalar &, const Scalar &) [with Scalar=float]" at line 37 of "/home/cceahke/eigen/eigen-3.4.0/unsupported/test/../Eigen/src/SpecialFunctions/SpecialFunctionsBFloat16.h"

Also trying the LAMMPS install with an already-available Eigen test module, ~/testmodules/eigen-3.4.0, so it doesn't try to build one itself. The question is whether only the syntax of Eigen's tests is the issue with Intel (the order of header includes is important), or whether LAMMPS itself will also have a problem.

heatherkellyucl commented 2 years ago

Ok, same lammps build issue when using my already-built eigen.

We can try using a newer gcc-libs to go with this Intel compiler.

heatherkellyucl commented 2 years ago

No, we're back at a failure:

[ 19%] Building CXX object CMakeFiles/lammps.dir/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/force.cpp.o
/shared/ucl/apps/intel/2019.Update5/compilers_and_libraries_2019.5.281/linux/bin/intel64/icpc -DFFT_KISS -DLAMMPS_EXCEPTIONS -DLAMMPS_GZIP -DLAMMPS_JPEG -DLAMMPS_MEMALIGN=64 -DLAMMPS_OMP_COMPAT=4 -DLAMMPS_PNG -DLAMMPS_SMALLBIG -DLMP_OPENMP -DLMP_PLUGIN -DLMP_PYTHON -DMLIAP_PYTHON -DMPICH_SKIP_MPICXX -DOMPI_SKIP_MPICXX -D_MPICC_H -Dlammps_EXPORTS -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/build3/cython -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/ASPHERE -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/BOCS -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/BODY -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/BROWNIAN -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/CG-DNA -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/CG-SDK -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/CLASS2 -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/COLLOID -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/COLVARS -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/COMPRESS -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/DIELECTRIC -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/DIFFRACTION -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/DIPOLE -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/DPD-BASIC -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/DPD-MESO -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/DPD-REACT -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/DPD-SMOOTH -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/DRUDE -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/EFF -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/EXTRA-COMPUTE -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/EXTRA-DUMP -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/EXTRA-FIX -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/EXTRA-MOLECULE -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/EXTRA-PAIR -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/FEP -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/GRANULAR -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/INTERLAYER -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/KSPACE -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/MACHDYN -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/MANYBODY -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/MC -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/MEAM -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/MISC -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/ML-IAP -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/ML-SNAP -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/MOFFF -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/MOLECULE -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/ORIENT -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/PERI -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/PHONON -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/PLUGIN -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/POEMS -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/PYTHON -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/QEQ -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/REACTION -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/REAXFF -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/REPLICA -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/RIGID -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/SHOCK -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/SPH -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/SPIN -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/SRD -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/TALLY -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/UEF -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/VORONOI -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/YAFF -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/CORESHELL -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/OPENMP -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/OPT -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/build3/styles -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/lib/colvars -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/lib/colvars/lepton/include -I/home/cceahke/lammps/lammps-stable_29Sep2021_update2/lib/poems -isystem /lustre/shared/ucl/apps/python/3.9.6/gnu-10.2.0/include/python3.9 -isystem /home/cceahke/lammps/lammps-stable_29Sep2021_update2/build3/voro_build-prefix/src/voro_build/src -isystem /home/cceahke/eigen/eigen-3.4.0 -restrict -O2 -g -DNDEBUG -fPIC -xHost -qopenmp -std=c++11 -MD -MT CMakeFiles/lammps.dir/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/force.cpp.o -MF CMakeFiles/lammps.dir/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/force.cpp.o.d -o CMakeFiles/lammps.dir/home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/force.cpp.o -c /home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/force.cpp

In file included from /lustre/shared/ucl/apps/intel/2019.Update5/compilers_and_libraries_2019.5.281/linux/compiler/include/atomic(326),
                 from /home/cceahke/eigen/eigen-3.4.0/Eigen/src/Core/products/Parallelizer.h(14),
                 from /home/cceahke/eigen/eigen-3.4.0/Eigen/Core(331),
                 from /home/cceahke/eigen/eigen-3.4.0/Eigen/Dense(1),
                 from /home/cceahke/eigen/eigen-3.4.0/Eigen/Eigen(1),
                 from /home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/MACHDYN/pair_smd_tlsph.h(35),
                 from /home/cceahke/lammps/lammps-stable_29Sep2021_update2/build3/styles/style_pair.h(303),
                 from /home/cceahke/lammps/lammps-stable_29Sep2021_update2/src/force.cpp(22):
/lustre/shared/ucl/apps/intel/2019.Update5/compilers_and_libraries_2019.5.281/linux/compiler/include/stdatomic.h(168): error: invalid redefinition of enum "std::memory_order" (declared at line 74 of "/lustre/shared/ucl/apps/gcc/10.2.0-p95889/bin/../include/c++/10.2.0/bits/atomic_base.h")
  typedef enum memory_order {
               ^

heatherkellyucl commented 2 years ago

Going to do a quick try with Intel 2020. EasyBuild uses Eigen 3.3.7 rather than 3.4.0 with LAMMPS and Intel 2020, so if the build still fails I will try an older Eigen next.

(They also have notes that USER-INTEL produces the wrong results for Intel 2019 and have a test for that, so Intel 2020 sounds like the better way to go, see https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/l/LAMMPS/LAMMPS-3Mar2020-intel-2020a-Python-3.8.2-kokkos.eb).

heatherkellyucl commented 2 years ago

Still fails with Intel 2020. Time to try a different Eigen version.

heatherkellyucl commented 2 years ago

LAMMPS test build succeeded with Eigen 3.3.9! Next step: do a central install of Eigen 3.3.9 and build against that module.
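Since the outcome here depends on which Eigen the build picks up, it can help to confirm the version of a given header tree before starting. A minimal sketch, assuming the usual Eigen 3 layout where the version macros live in Eigen/src/Core/util/Macros.h; `eigen_version` is a hypothetical helper, not part of LAMMPS or the buildscripts.

```python
import re
from pathlib import Path


def eigen_version(eigen_root):
    """Return the Eigen version string found under eigen_root, or None."""
    macros = Path(eigen_root) / "Eigen" / "src" / "Core" / "util" / "Macros.h"
    if not macros.is_file():
        return None
    text = macros.read_text()
    parts = []
    # Eigen 3 defines EIGEN_WORLD_VERSION, EIGEN_MAJOR_VERSION, EIGEN_MINOR_VERSION.
    for name in ("WORLD", "MAJOR", "MINOR"):
        match = re.search(rf"#define\s+EIGEN_{name}_VERSION\s+(\d+)", text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)
```

Call it with the directory that contains the Eigen/ header folder, for example the path passed to the compiler with -isystem.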

heatherkellyucl commented 2 years ago

LAMMPS installs, basic:

heatherkellyucl commented 2 years ago

LAMMPS installs, INTEL

LAMMPS installs, GPU

heatherkellyucl commented 2 years ago

INTEL test build is going.

heatherkellyucl commented 2 years ago

The basic installs and INTEL test build got interrupted on Friday evening (couldn't reach download.lammps.org). Rerunning.

heatherkellyucl commented 2 years ago

LAMMPS INTEL tests:

84% tests passed, 76 tests failed out of 482

         10 - AtomStyles (Failed)
         87 - MolPairStyle:coul_diel (Failed)
         95 - MolPairStyle:coul_shield (Failed)
        118 - MolPairStyle:lj_charmm_coul_long_soft (Failed)
        119 - MolPairStyle:lj_charmm_coul_msm (Failed)
        133 - MolPairStyle:lj_class2_soft (Failed)
        142 - MolPairStyle:lj_cut_coul_long_soft (Failed)
        150 - MolPairStyle:lj_cut_soft (Failed)
        156 - MolPairStyle:lj_expand_coul_long (Failed)
        169 - MolPairStyle:lj_sdk_coul_long (Failed)
        172 - MolPairStyle:lj_sdk_coul_table (Failed)
        176 - MolPairStyle:lj_switch3_coulgauss_long (Failed)
        199 - MolPairStyle:tip4p_long_soft (Failed)
        202 - MolPairStyle:wf_cut (Failed)
        210 - AtomicPairStyle:buck_coul_cut_qeq_point (Failed)
        211 - AtomicPairStyle:buck_coul_cut_qeq_shielded (Failed)
        228 - AtomicPairStyle:edip (Failed)
        231 - AtomicPairStyle:hybrid-eam (Failed)
        235 - AtomicPairStyle:meam (Failed)
        236 - AtomicPairStyle:meam_spline (Failed)
        237 - AtomicPairStyle:meam_sw_spline (Failed)
        240 - AtomicPairStyle:reaxff (Failed)
        241 - AtomicPairStyle:reaxff_lgvdw (Failed)
        242 - AtomicPairStyle:reaxff_noqeq (Failed)
        243 - AtomicPairStyle:reaxff_tabulate (Failed)
        253 - ManybodyPairStyle:bop (Failed)
        254 - ManybodyPairStyle:bop_save (Failed)
        255 - ManybodyPairStyle:comb (Failed)
        257 - ManybodyPairStyle:drip (Failed)
        258 - ManybodyPairStyle:drip_real (Failed)
        259 - ManybodyPairStyle:edip_multi (Failed)
        263 - ManybodyPairStyle:ilp-graphene-hbn (Failed)
        264 - ManybodyPairStyle:ilp-graphene-hbn_notaper (Failed)
        265 - ManybodyPairStyle:kolmogorov_crespi_full (Failed)
        268 - ManybodyPairStyle:lcbop (Failed)
        269 - ManybodyPairStyle:lebedeva_z (Failed)
        270 - ManybodyPairStyle:meam (Failed)
        275 - ManybodyPairStyle:mliap_so3 (Failed)
        276 - ManybodyPairStyle:nb3b_harmonic (Failed)
        279 - ManybodyPairStyle:polymorphic_sw (Failed)
        280 - ManybodyPairStyle:polymorphic_tersoff (Failed)
        293 - ManybodyPairStyle:tersoff_shift (Failed)
        294 - ManybodyPairStyle:tersoff_table (Failed)
        302 - BondStyle:gaussian (Failed)
        319 - AngleStyle:cosine_delta (Failed)
        321 - AngleStyle:cosine_shift (Failed)
        377 - FixTimestep:addforce_const (Failed)
        378 - FixTimestep:addforce_variable (Failed)
        379 - FixTimestep:addtorque_const (Failed)
        382 - FixTimestep:aveforce_variable (Failed)
        384 - FixTimestep:drag (Failed)
        388 - FixTimestep:heat (Failed)
        391 - FixTimestep:momentum (Failed)
        393 - FixTimestep:nph (Failed)
        394 - FixTimestep:nph_sphere (Failed)
        396 - FixTimestep:npt_iso (Failed)
        397 - FixTimestep:npt_sphere_aniso (Failed)
        398 - FixTimestep:npt_sphere_iso (Failed)
        399 - FixTimestep:npt_sphere_tri (Failed)
        407 - FixTimestep:nvt (Failed)
        409 - FixTimestep:oneway (Failed)
        422 - FixTimestep:rigid_npt_small (Failed)
        434 - FixTimestep:shake_angle (Failed)
        436 - FixTimestep:smd_couple (Failed)
        439 - FixTimestep:spring_couple (Failed)
        440 - FixTimestep:spring_rg (Failed)
        442 - FixTimestep:spring_tether (Failed)
        443 - FixTimestep:temp_berendsen (Failed)
        444 - FixTimestep:temp_csld (Failed)
        445 - FixTimestep:temp_csvr (Failed)
        446 - FixTimestep:temp_rescale (Failed)
        465 - DihedralStyle:table_cut_linear (Failed)
        467 - DihedralStyle:table_linear (Failed)
        468 - DihedralStyle:table_spline (Failed)
        476 - ImproperStyle:harmonic (Failed)
        478 - ImproperStyle:inversion_harmonic (Failed)
Errors while running CTest
Output from these tests are in: /home/cceahke/lammps/lammps-stable_29Sep2021_update2/build-INTEL/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

I ran ctest -LE unstable,slow in my build dir to try to exclude tests carrying those labels (known to be especially numerically unstable or to take a long time to run).
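Long failure lists like the one above are easier to reason about once grouped. The following sketch parses the "(Failed)" lines of a CTest summary and counts failures per category; `failed_tests` and `by_category` are hypothetical helpers, and other CTest result markers (timeouts, crashes) are deliberately ignored.

```python
import re

# Matches summary lines such as "         87 - MolPairStyle:coul_diel (Failed)".
FAILED_LINE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\(Failed\)\s*$")


def failed_tests(ctest_output):
    """Return {test number: test name} for every '(Failed)' summary line."""
    failed = {}
    for line in ctest_output.splitlines():
        match = FAILED_LINE.match(line)
        if match:
            failed[int(match.group(1))] = match.group(2)
    return failed


def by_category(failed):
    """Count failures per category, e.g. 'FixTimestep' for 'FixTimestep:nvt'."""
    counts = {}
    for name in failed.values():
        category = name.split(":", 1)[0]
        counts[category] = counts.get(category, 0) + 1
    return counts


# Four lines taken from the INTEL test summary in this thread.
sample = """\
         10 - AtomStyles (Failed)
         87 - MolPairStyle:coul_diel (Failed)
        407 - FixTimestep:nvt (Failed)
        409 - FixTimestep:oneway (Failed)
"""
print(by_category(failed_tests(sample)))
```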

heatherkellyucl commented 2 years ago

The LAMMPS unit tests aren't designed to be acceptance tests, so it is a bit tricky...

Test 10/482, AtomStyles, itself ran 15 tests and one failed (the differences are large, but we don't know whether they are meaningful):

/home/cceahke/lammps/lammps-stable_29Sep2021_update2/unittest/formats/test_atom_styles.cpp:2449: Failure
The difference between bonus[3].quat[0] and 0.62499650256800654 is 0.12889178907626098, which exceeds EPSILON, where
bonus[3].quat[0] evaluates to 0.49610471349174556,
0.62499650256800654 evaluates to 0.62499650256800654, and
EPSILON evaluates to 5.0000000000000002e-14.
/home/cceahke/lammps/lammps-stable_29Sep2021_update2/unittest/formats/test_atom_styles.cpp:2450: Failure
The difference between bonus[3].quat[1] and 0.47323774316465234 is 0.060486577442735057, which exceeds EPSILON, where
bonus[3].quat[1] evaluates to 0.53372432060738739,
0.47323774316465234 evaluates to 0.47323774316465234, and
EPSILON evaluates to 5.0000000000000002e-14.
/home/cceahke/lammps/lammps-stable_29Sep2021_update2/unittest/formats/test_atom_styles.cpp:2451: Failure
The difference between bonus[3].quat[2] and 0.33072552332373728 is 0.11056266786097774, which exceeds EPSILON, where
bonus[3].quat[2] evaluates to 0.22016285546275954,
0.33072552332373728 evaluates to 0.33072552332373728, and
EPSILON evaluates to 5.0000000000000002e-14.
/home/cceahke/lammps/lammps-stable_29Sep2021_update2/unittest/formats/test_atom_styles.cpp:2452: Failure
The difference between bonus[3].quat[3] and 0.52540083597613996 is 0.12309494652205033, which exceeds EPSILON, where
bonus[3].quat[3] evaluates to 0.64849578249819029,
0.52540083597613996 evaluates to 0.52540083597613996, and
EPSILON evaluates to 5.0000000000000002e-14.
[  FAILED  ] AtomStyleTest.body_nparticle (36 ms)

heatherkellyucl commented 2 years ago

Again, EasyBuild's sanity check was:

# run short test case to make sure installation doesn't produce blatently incorrect results;
# this catches a problem where having the USER-INTEL package enabled causes trouble when installing with intel/2019b
# (requires an MPI context for intel/2020a)
sanity_check_commands = ["cd %(builddir)s && %(mpi_cmd_prefix)s python lammps_vs_yaff_test_single_point_energy.py"]

heatherkellyucl commented 2 years ago

Unfortunately that sanity check requires us to have a version of mpi4py for this version of python, plus yaff... (Actual test is https://raw.githubusercontent.com/easybuilders/easybuild-easyconfigs/develop/easybuild/easyconfigs/l/LAMMPS/lammps_vs_yaff_test_single_point_energy.py).

heatherkellyucl commented 2 years ago

In comparison, the Intel-but-not-using-INTEL install still has some failures, but fewer:

89% tests passed, 53 tests failed out of 482

Total Test time (real) = 578.19 sec

The following tests FAILED:
         10 - AtomStyles (Failed)
         95 - MolPairStyle:coul_shield (Failed)
        118 - MolPairStyle:lj_charmm_coul_long_soft (Failed)
        119 - MolPairStyle:lj_charmm_coul_msm (Failed)
        142 - MolPairStyle:lj_cut_coul_long_soft (Failed)
        156 - MolPairStyle:lj_expand_coul_long (Failed)
        169 - MolPairStyle:lj_sdk_coul_long (Failed)
        172 - MolPairStyle:lj_sdk_coul_table (Failed)
        176 - MolPairStyle:lj_switch3_coulgauss_long (Failed)
        199 - MolPairStyle:tip4p_long_soft (Failed)
        210 - AtomicPairStyle:buck_coul_cut_qeq_point (Failed)
        211 - AtomicPairStyle:buck_coul_cut_qeq_shielded (Failed)
        228 - AtomicPairStyle:edip (Failed)
        236 - AtomicPairStyle:meam_spline (Failed)
        237 - AtomicPairStyle:meam_sw_spline (Failed)
        240 - AtomicPairStyle:reaxff (Failed)
        241 - AtomicPairStyle:reaxff_lgvdw (Failed)
        242 - AtomicPairStyle:reaxff_noqeq (Failed)
        243 - AtomicPairStyle:reaxff_tabulate (Failed)
        254 - ManybodyPairStyle:bop_save (Failed)
        257 - ManybodyPairStyle:drip (Failed)
        258 - ManybodyPairStyle:drip_real (Failed)
        259 - ManybodyPairStyle:edip_multi (Failed)
        263 - ManybodyPairStyle:ilp-graphene-hbn (Failed)
        264 - ManybodyPairStyle:ilp-graphene-hbn_notaper (Failed)
        265 - ManybodyPairStyle:kolmogorov_crespi_full (Failed)
        268 - ManybodyPairStyle:lcbop (Failed)
        269 - ManybodyPairStyle:lebedeva_z (Failed)
        270 - ManybodyPairStyle:meam (Failed)
        275 - ManybodyPairStyle:mliap_so3 (Failed)
        276 - ManybodyPairStyle:nb3b_harmonic (Failed)
        279 - ManybodyPairStyle:polymorphic_sw (Failed)
        280 - ManybodyPairStyle:polymorphic_tersoff (Failed)
        294 - ManybodyPairStyle:tersoff_table (Failed)
        302 - BondStyle:gaussian (Failed)
        379 - FixTimestep:addtorque_const (Failed)
        382 - FixTimestep:aveforce_variable (Failed)
        391 - FixTimestep:momentum (Failed)
        393 - FixTimestep:nph (Failed)
        394 - FixTimestep:nph_sphere (Failed)
        396 - FixTimestep:npt_iso (Failed)
        397 - FixTimestep:npt_sphere_aniso (Failed)
        398 - FixTimestep:npt_sphere_iso (Failed)
        399 - FixTimestep:npt_sphere_tri (Failed)
        422 - FixTimestep:rigid_npt_small (Failed)
        427 - FixTimestep:rigid_nvt (Failed)
        434 - FixTimestep:shake_angle (Failed)
        444 - FixTimestep:temp_csld (Failed)
        465 - DihedralStyle:table_cut_linear (Failed)
        467 - DihedralStyle:table_linear (Failed)
        468 - DihedralStyle:table_spline (Failed)
        476 - ImproperStyle:harmonic (Failed)
        478 - ImproperStyle:inversion_harmonic (Failed)

heatherkellyucl commented 2 years ago

diff Testing/intelfails Testing/intelINTELfails
1a2
>          87 - MolPairStyle:coul_diel (Failed)
4a6
>         133 - MolPairStyle:lj_class2_soft (Failed)
5a8
>         150 - MolPairStyle:lj_cut_soft (Failed)
10a14
>         202 - MolPairStyle:wf_cut (Failed)
13a18,19
>         231 - AtomicPairStyle:hybrid-eam (Failed)
>         235 - AtomicPairStyle:meam (Failed)
19a26
>         253 - ManybodyPairStyle:bop (Failed)
20a28
>         255 - ManybodyPairStyle:comb (Failed)
33a42
>         293 - ManybodyPairStyle:tersoff_shift (Failed)
35a45,48
>         319 - AngleStyle:cosine_delta (Failed)
>         321 - AngleStyle:cosine_shift (Failed)
>         377 - FixTimestep:addforce_const (Failed)
>         378 - FixTimestep:addforce_variable (Failed)
37a51,52
>         384 - FixTimestep:drag (Failed)
>         388 - FixTimestep:heat (Failed)
44a60,61
>         407 - FixTimestep:nvt (Failed)
>         409 - FixTimestep:oneway (Failed)
46d62
<         427 - FixTimestep:rigid_nvt (Failed)
47a64,68
>         436 - FixTimestep:smd_couple (Failed)
>         439 - FixTimestep:spring_couple (Failed)
>         440 - FixTimestep:spring_rg (Failed)
>         442 - FixTimestep:spring_tether (Failed)
>         443 - FixTimestep:temp_berendsen (Failed)
48a70,71
>         445 - FixTimestep:temp_csvr (Failed)
>         446 - FixTimestep:temp_rescale (Failed)

Test 427 (FixTimestep:rigid_nvt) succeeded with INTEL but not without it.
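The same comparison can be done on test numbers as a set difference, which makes the "only in one build" cases explicit. A small sketch with a hypothetical `compare_failures` helper; the two excerpts are a handful of test numbers copied from the lists in this thread, not the full lists.

```python
def compare_failures(basic, intel):
    """Return (only in basic, only in intel, in both) as sorted lists."""
    basic, intel = set(basic), set(intel)
    return sorted(basic - intel), sorted(intel - basic), sorted(basic & intel)


# Excerpts of failing test numbers: Intel without INTEL, and Intel with INTEL.
basic_excerpt = [10, 95, 422, 427, 434]
intel_excerpt = [10, 87, 95, 422, 434, 436]

only_basic, only_intel, both = compare_failures(basic_excerpt, intel_excerpt)
print("fails only without INTEL:", only_basic)
print("fails only with INTEL:", only_intel)
```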

heatherkellyucl commented 2 years ago

Ok, I have installed mpi4py for Intel in my home directory to run the sanity check.

heatherkellyucl commented 2 years ago

Less straightforward than I might have hoped!

Made a virtualenv and installed the things:

module use ~/testmodules
module load mpi4py-3.1.3-intel20 
virtualenv venv
source venv/bin/activate

pip install numpy
pip install Cython

# There is no external swap_noncovalent_lammps in the yaff in pip (1.4.x) so get source.
wget https://github.com/molmod/yaff/releases/download/1.6.0/yaff-1.6.0.tar.gz
tar -xvf yaff-1.6.0.tar.gz
cd yaff-1.6.0
# install into active venv
python setup.py install
cd ..

module load lammps-29Sep2021-INTEL-intel20 
gerun -np 8 python lammps_vs_yaff_test_single_point_energy.py

Had to add a symlink in the Intel LAMMPS builds: they only make liblammps_mpi.so, and not liblammps.so, which the Python module looks for. (The GNU basic build made both libraries.)
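The symlink step could be scripted so it is repeatable across installs. This is a sketch only: `ensure_liblammps_link` is a hypothetical helper, and the library directory to pass in (lib or lib64 under the install prefix) is an assumption that has to be checked per install.

```python
import os
from pathlib import Path


def ensure_liblammps_link(lib_dir, real_name="liblammps_mpi.so", link_name="liblammps.so"):
    """Create link_name -> real_name inside lib_dir if it is missing."""
    lib_dir = Path(lib_dir)
    target = lib_dir / real_name
    link = lib_dir / link_name
    if not target.exists():
        raise FileNotFoundError(f"{target} not found")
    if link.exists() or link.is_symlink():
        return link  # leave an existing file or link alone
    # Relative link, so the install tree can be moved without breaking it.
    os.symlink(real_name, link)
    return link
```

A relative link is used so that copying the install directory elsewhere keeps the link valid.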

Unfortunately the lammps run fails because of the input files:

 FFINIT Force field with 6 parts: valence, pair_ei, ewald_reci, ewald_cor,
 FFINIT                           ewald_neut, pair_mm3.
 FFINIT Neighborlist present: True
Traceback (most recent call last):
  File "/lustre/home/cceahke/lammps/lammps_vs_yaff_test_single_point_energy.py", line 1327, in <module>
    main()
    ff = swap_noncovalent_lammps(ff_yaff, fn_system='lammps.dat', fn_log="log.lammps",
  File "/lustre/home/cceahke/lammps/venv/lib/python3.9/site-packages/yaff-1.6.0-py3.9-linux-x86_64.egg/yaff/external/liblammps.py", line 371, in swap_noncovalent_lammps
    part_lammps = ForcePartLammps(ff, fn_system, **kwargs)
  File "/lustre/home/cceahke/lammps/venv/lib/python3.9/site-packages/yaff-1.6.0-py3.9-linux-x86_64.egg/yaff/external/liblammps.py", line 193, in __init__
    self.lammps.command("pair_coeff %d %d table %s %s" % (i+1,j+1,fn_table,name))
  File "/home/cceahke/lammps/lammps-29Sep2021_INTEL_install/lib/python3.9/site-packages/lammps/core.py", line 578, in command
    self.lib.lammps_command(self.lmp,cmd)
  File "/home/cceahke/lammps/lammps-29Sep2021_INTEL_install/lib/python3.9/site-packages/lammps/core.py", line 49, in __exit__
    raise self.lmp._lammps_exception
lammps.core.MPIAbortException: 'ERROR on proc 0: Did not find keyword in table file (src/pair_table.cpp:368)\nLast command: pair_coe'

Docs say:

Did not find keyword in table file: Keyword used in pair_coeff command was not found in table file.
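One way to narrow this down is to list the section keywords that are actually present in the table file and compare them with the name passed to pair_coeff. The sketch below is a heuristic, assuming the layout described in the pair_style table documentation where each section starts with a keyword line followed by a parameter line beginning with N; `table_keywords` and the sample table are hypothetical.

```python
def table_keywords(text):
    """Return section keywords found in a pair_style table file (heuristic)."""
    # Keep non-blank lines that are not '#' comments, split into tokens.
    lines = [ln.split() for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")]
    keywords = []
    for current, following in zip(lines, lines[1:]):
        # Assumption: a keyword line is directly followed by the "N ..." parameter line.
        if following[0] == "N" and current[0] != "N":
            keywords.append(current[0])
    return keywords


# Hypothetical table content, made up for illustration.
sample = """\
# example table
pair_0_1
N 3 R 1.0 3.0

1 1.0 0.5 0.1
2 2.0 0.2 0.05
3 3.0 0.0 0.0
"""
print(table_keywords(sample))
```

If the keyword used in the failing pair_coeff command is not in the returned list, the table file and the command disagree about the section name.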

heatherkellyucl commented 2 years ago

Some discussion of Intel and the tests failing at the bottom of https://github.com/lammps/lammps/issues/3132

Outside of the INTEL package code, there is little in LAMMPS that benefits from using Intel compilers over GCC or Clang, so it may be best to recommend not using such binaries except when using the INTEL package functionality and then making validation tests. It used to be different when there was more Fortran code in LAMMPS, but that is a thing of the past.

heatherkellyucl commented 2 years ago

Going ahead with the Intel + GPU build, so we at least have something we can compare with GNU + GPU for any benchmarking later.

heatherkellyucl commented 2 years ago

Intel GPU build completed on Myriad L node, running tests.

heatherkellyucl commented 2 years ago

Tests:

89% tests passed, 53 tests failed out of 482

Total Test time (real) = 946.89 sec

The following tests FAILED:
         10 - AtomStyles (Failed)
         95 - MolPairStyle:coul_shield (Failed)
        118 - MolPairStyle:lj_charmm_coul_long_soft (Failed)
        119 - MolPairStyle:lj_charmm_coul_msm (Failed)
        142 - MolPairStyle:lj_cut_coul_long_soft (Failed)
        156 - MolPairStyle:lj_expand_coul_long (Failed)
        169 - MolPairStyle:lj_sdk_coul_long (Failed)
        172 - MolPairStyle:lj_sdk_coul_table (Failed)
        176 - MolPairStyle:lj_switch3_coulgauss_long (Failed)
        199 - MolPairStyle:tip4p_long_soft (Failed)
        210 - AtomicPairStyle:buck_coul_cut_qeq_point (Failed)
        211 - AtomicPairStyle:buck_coul_cut_qeq_shielded (Failed)
        228 - AtomicPairStyle:edip (Failed)
        236 - AtomicPairStyle:meam_spline (Failed)
        237 - AtomicPairStyle:meam_sw_spline (Failed)
        240 - AtomicPairStyle:reaxff (Failed)
        241 - AtomicPairStyle:reaxff_lgvdw (Failed)
        242 - AtomicPairStyle:reaxff_noqeq (Failed)
        243 - AtomicPairStyle:reaxff_tabulate (Failed)
        254 - ManybodyPairStyle:bop_save (Failed)
        257 - ManybodyPairStyle:drip (Failed)
        258 - ManybodyPairStyle:drip_real (Failed)
        259 - ManybodyPairStyle:edip_multi (Failed)
        263 - ManybodyPairStyle:ilp-graphene-hbn (Failed)
        264 - ManybodyPairStyle:ilp-graphene-hbn_notaper (Failed)
        265 - ManybodyPairStyle:kolmogorov_crespi_full (Failed)
        268 - ManybodyPairStyle:lcbop (Failed)
        269 - ManybodyPairStyle:lebedeva_z (Failed)
        270 - ManybodyPairStyle:meam (Failed)
        275 - ManybodyPairStyle:mliap_so3 (Failed)
        276 - ManybodyPairStyle:nb3b_harmonic (Failed)
        279 - ManybodyPairStyle:polymorphic_sw (Failed)
        280 - ManybodyPairStyle:polymorphic_tersoff (Failed)
        294 - ManybodyPairStyle:tersoff_table (Failed)
        302 - BondStyle:gaussian (Failed)
        379 - FixTimestep:addtorque_const (Failed)
        382 - FixTimestep:aveforce_variable (Failed)
        391 - FixTimestep:momentum (Failed)
        393 - FixTimestep:nph (Failed)
        394 - FixTimestep:nph_sphere (Failed)
        396 - FixTimestep:npt_iso (Failed)
        397 - FixTimestep:npt_sphere_aniso (Failed)
        398 - FixTimestep:npt_sphere_iso (Failed)
        399 - FixTimestep:npt_sphere_tri (Failed)
        422 - FixTimestep:rigid_npt_small (Failed)
        427 - FixTimestep:rigid_nvt (Failed)
        434 - FixTimestep:shake_angle (Failed)
        444 - FixTimestep:temp_csld (Failed)
        465 - DihedralStyle:table_cut_linear (Failed)
        467 - DihedralStyle:table_linear (Failed)
        468 - DihedralStyle:table_spline (Failed)
        476 - ImproperStyle:harmonic (Failed)
        478 - ImproperStyle:inversion_harmonic (Failed)
Errors while running CTest
Output from these tests are in: /home/ccspapp/Scratch/lammps/29Sep2021_update2/gpumixed/tmp.9vcXDsdFUM/lammps-stable_29Sep2021_update2/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

Same failures as the basic Intel install. (Not sure whether any of these tests exercise the GPU part?)

heatherkellyucl commented 2 years ago

Versions all available.

Still need the extra AVX512 node builds on Michael in /shared/ucl/apps-overrides/avx512/lammps, and then to uncomment that path in the modulefiles so they can be found.

heatherkellyucl commented 2 years ago

Jobs submitted for the Michael AVX512 builds, from ccspapp. See scripts in /home/ccspapp/Scratch/lammps-avx512.

heatherkellyucl commented 2 years ago

The Michael installs and modulefile updates are complete.

balston commented 2 years ago

Running the GPU build on Young.

balston commented 2 years ago

The build has finished - no errors as far as I can tell.

Trying a test job next.

balston commented 2 years ago

I've just realised that the build didn't complete as the install didn't happen! Going back to find out what happened.

balston commented 2 years ago

Found the error (should have looked at the error log the first time):

No space left on device
FATAL ERROR: fwrite on file failed
compilation aborted for /home/ccspapp/Scratch/lammps/29Sep2021_update2/gpumixed/tmp.91MNsDoebk/lammps-stable_29Sep2021_update2/lib/colvars/colvar.cpp (code 1)
make[2]: *** [CMakeFiles/colvars.dir/home/ccspapp/Scratch/lammps/29Sep2021_update2/gpumixed/tmp.91MNsDoebk/lammps-stable_29Sep2021_update2/lib/colvars/colvar.cpp.o] Error 1
make[1]: *** [CMakeFiles/colvars.dir/all] Error 2

balston commented 2 years ago

This time the build finished after adding:

mkdir -p ~/Scratch/LAMMPS_build/tmp
export TMPDIR=~/Scratch/LAMMPS_build/tmp

to the job script.
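A pre-flight check along the same lines can catch a full temporary directory before a long build starts. A sketch assuming Python is available in the job; `prepare_tmpdir` is a hypothetical helper and the 5 GiB default threshold is an arbitrary illustration, not a measured requirement.

```python
import os
import shutil


def prepare_tmpdir(path, min_free_gib=5.0):
    """Create path, export it as TMPDIR, and check it has enough free space."""
    path = os.path.expanduser(path)
    os.makedirs(path, exist_ok=True)   # same effect as mkdir -p
    os.environ["TMPDIR"] = path        # inherited by processes started from here
    free_gib = shutil.disk_usage(path).free / 2**30
    if free_gib < min_free_gib:
        raise RuntimeError(f"only {free_gib:.1f} GiB free in {path}")
    return path, free_gib
```

For the case in this thread the call would be prepare_tmpdir("~/Scratch/LAMMPS_build/tmp"), run before the build is launched from the same process.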

balston commented 2 years ago

Example job submitted.

lawrence910426 commented 2 years ago

I also faced the same problem. I could build LAMMPS with GNU g++ with GPU on Taiwania (https://man.twcc.ai/@TWCC-III-manual/H1bEXeGcu), but I couldn't build LAMMPS with Intel icc, with or without GPU.

This is what I encountered:

MathFunctions.h(327): error: no instance of overloaded function "sqrt" matches the argument list
            argument types are: (const Eigen::internal::Packet8d)
      return sqrt(x);
             ^