UCL-RITS / rcps-buildscripts

Scripts to automate package builds on RC Platforms
MIT License

Install Request: LAMMPS 15th June 2023 release but doing 2nd August 2023 [IN06098486] #545

Open balston opened 1 year ago

balston commented 1 year ago

The 15th June 2023 release is the first to support outputting vector-style variables during a simulation run, which this research group needs.

It looks like the latest version in Spack is 8 Feb 2023.

https://www.lammps.org/download.html

heatherkellyucl commented 1 year ago

Ticket now IN:06149543.

balston commented 1 year ago

LAMMPS 2nd August 2023 is now the latest release so install this one.

balston commented 1 year ago

A build of LAMMPS 2nd August 2023 using GNU compilers and FFTW is now on Myriad and Young using our build scripts method:

module -f unload compilers mpi gcc-libs
module load beta-modules
./lammps-2Aug2023-basic-fftw-gnu_install 2>&1 | tee ~/Software/LAMMPS/lammps-2Aug2023-basic-fftw-gnu_install.log-1

Need to produce a module file and run some tests next.
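
A note on the pipeline above: with "| tee", the shell reports tee's exit status, so a failed build can look like a success until the log is read. A minimal sketch of one way round this, assuming a POSIX shell; run_logged is a hypothetical helper name and is not part of the build scripts:

```shell
#!/bin/sh
# Hypothetical helper (not part of rcps-buildscripts): run a command, copy its
# stdout and stderr to a log file via tee, and return the command's own exit
# status rather than tee's.
run_logged() {
    log=$1
    shift
    status_file=$(mktemp) || return 1
    {
        # Record the command's exit status before the pipe hides it.
        rc=0
        "$@" 2>&1 || rc=$?
        echo "$rc" > "$status_file"
    } | tee "$log"
    status=$(cat "$status_file")
    rm -f "$status_file"
    return "$status"
}

# Example usage with the build script named above:
# run_logged ~/Software/LAMMPS/lammps-2Aug2023-basic-fftw-gnu_install.log-1 \
#     ./lammps-2Aug2023-basic-fftw-gnu_install
```

The status is passed through a temporary file so that the sketch does not rely on shell-specific features such as pipefail.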

balston commented 1 year ago

I now have the module file on Myriad and Young and have submitted test jobs on both clusters.

balston commented 1 year ago

Both the test jobs on Myriad and Young worked. Modules needed for basic GNU FFTW version are:

Myriad

module -f unload compilers mpi gcc-libs
module load beta-modules
module load gcc-libs/10.2.0
module load compilers/gnu/10.2.0
module load numactl/2.0.12
module load binutils/2.36.1/gnu-10.2.0
module load ucx/1.9.0/gnu-10.2.0
module load mpi/openmpi/4.0.5/gnu-10.2.0
module load python3/3.9-gnu-10.2.0
module load fftw/3.3.9/gnu-10.2.0
module load lammps/2aug23/basic-fftw/gnu-10.2.0 

Young

module -f unload compilers mpi gcc-libs
module load beta-modules
module load gcc-libs/10.2.0
module load compilers/gnu/10.2.0
module load mpi/openmpi/4.0.5/gnu-10.2.0
module load python3/3.9-gnu-10.2.0
module load fftw/3.3.9/gnu-10.2.0
module load lammps/2aug23/basic-fftw/gnu-10.2.0
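
The two lists differ only in the numactl, binutils and ucx modules that Myriad needs. A small sketch that makes the difference explicit, assuming a POSIX shell; the function name and the lowercase cluster names are illustrative, while the module names are copied from the lists above:

```shell
#!/bin/sh
# Hypothetical helper: print the module commands listed above for the basic
# GNU + FFTW build on the given cluster (myriad or young).
lammps_gnu_fftw_modules() {
    case $1 in
        myriad|young) ;;
        *) echo "unknown cluster: $1" >&2; return 1 ;;
    esac
    echo "module -f unload compilers mpi gcc-libs"
    echo "module load beta-modules"
    echo "module load gcc-libs/10.2.0"
    echo "module load compilers/gnu/10.2.0"
    if [ "$1" = myriad ]; then
        # Only the Myriad list includes these three modules.
        echo "module load numactl/2.0.12"
        echo "module load binutils/2.36.1/gnu-10.2.0"
        echo "module load ucx/1.9.0/gnu-10.2.0"
    fi
    echo "module load mpi/openmpi/4.0.5/gnu-10.2.0"
    echo "module load python3/3.9-gnu-10.2.0"
    echo "module load fftw/3.3.9/gnu-10.2.0"
    echo "module load lammps/2aug23/basic-fftw/gnu-10.2.0"
}

# Possible use in a job script: eval "$(lammps_gnu_fftw_modules young)"
```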

balston commented 1 year ago

Now working on the GNU + GPU build.

Build script updated and pulled to Young. Needs to be built on a GPU node so job submitted to build LAMMPS 2nd August 2023 GNU+GPU on Young. Build script:

lammps-2Aug2023-gpu-gnu_install

balston commented 1 year ago

Build job for LAMMPS 2nd August 2023 GNU+GPU submitted on Myriad as well.

balston commented 1 year ago

Both jobs are running.

balston commented 1 year ago

CPU build done on Kathleen and test job submitted.

balston commented 1 year ago

I've only had time today to check the output from the test job on Kathleen. It looks like it has worked ok.

balston commented 1 year ago

Look at:

/home/ccspapp/Software/LAMMPS/tmp.2P0AWpwdjR/lammps-2Aug2023/cmake/presets/most.cmake

for the list of LAMMPS packages enabled in our default CPU builds.
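
A rough sketch for pulling the package names out of that preset, assuming the file declares them in a single set(ALL_PACKAGES ...) block with no comments inside it; list_preset_packages is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print one package name per line from a LAMMPS CMake
# preset, assuming the names sit between "set(ALL_PACKAGES" and the first ")".
list_preset_packages() {
    awk '
        /set\(ALL_PACKAGES/ { inside = 1; sub(/.*set\(ALL_PACKAGES/, "") }
        inside {
            last = ($0 ~ /\)/)          # closing parenthesis ends the list
            gsub(/[()]/, " ")
            for (i = 1; i <= NF; i++) print $i
            if (last) exit
        }
    ' "$1"
}

# Example, using the path given above:
# list_preset_packages /home/ccspapp/Software/LAMMPS/tmp.2P0AWpwdjR/lammps-2Aug2023/cmake/presets/most.cmake
```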

balston commented 1 year ago

I had to redo the GPU builds on Myriad and Young as I had missed out the FFTW module.

The Myriad build has completed and a job running the GPU unit tests has been submitted.

Young build job is still waiting.

balston commented 1 year ago

Test jobs for the GPU build have been submitted on Myriad and Young.

balston commented 1 year ago

I've also been trying a build of the basic Intel version but this is failing during compilation:

/dev/shm/ccspapp/lammps/tmp.v9Hhxi8WAq/lammps-stable_2Aug2023/build/_deps/googletest-src/googletest/include/gtest/gtest-matchers.h(434): error: namespace "std" has no member "is_trivially_copy_constructible"
             std::is_trivially_copy_constructible<M>::value &&
                  ^
          detected during:
            processing of template argument list for "testing::internal::MatcherBase<T>::ValuePolicy [with T=const std::string &]" based on template argument <MM> at line 483
            instantiation of "void testing::internal::MatcherBase<T>::Init(M &&) [with T=const std::string &, M=const testing::MatcherInterface<const std::string &> *&]" at line 312
            instantiation of "testing::internal::MatcherBase<T>::MatcherBase(const testing::MatcherInterface<U> *) [with T=const std::string &, U=const std::string &]" at line 536

/dev/shm/ccspapp/lammps/tmp.v9Hhxi8WAq/lammps-stable_2Aug2023/build/_deps/googletest-src/googletest/include/gtest/gtest-matchers.h(434): error: type name is not allowed
             std::is_trivially_copy_constructible<M>::value &&
                                                  ^
          detected during:
            processing of template argument list for "testing::internal::MatcherBase<T>::ValuePolicy [with T=const std::string &]" based on template argument <MM> at line 483
            instantiation of "void testing::internal::MatcherBase<T>::Init(M &&) [with T=const std::string &, M=const testing::MatcherInterface<const std::string &> *&]" at line 312
            instantiation of "testing::internal::MatcherBase<T>::MatcherBase(const testing::MatcherInterface<U> *) [with T=const std::string &, U=const std::string &]" at line 536

/dev/shm/ccspapp/lammps/tmp.v9Hhxi8WAq/lammps-stable_2Aug2023/build/_deps/googletest-src/googletest/include/gtest/gtest-matchers.h(434): error: the global scope has no "value"
             std::is_trivially_copy_constructible<M>::value &&
                                                      ^
          detected during:
            processing of template argument list for "testing::internal::MatcherBase<T>::ValuePolicy [with T=const std::string &]" based on template argument <MM> at line 483
            instantiation of "void testing::internal::MatcherBase<T>::Init(M &&) [with T=const std::string &, M=const testing::MatcherInterface<const std::string &> *&]" at line 312
            instantiation of "testing::internal::MatcherBase<T>::MatcherBase(const testing::MatcherInterface<U> *) [with T=const std::string &, U=const std::string &]" at line 536

compilation aborted for /dev/shm/ccspapp/lammps/tmp.v9Hhxi8WAq/lammps-stable_2Aug2023/build/_deps/googletest-src/googletest/src/gtest-all.cc (code 2)
make[2]: *** [_deps/googletest-build/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o] Error 2
make[2]: Leaving directory `/dev/shm/ccspapp/lammps/tmp.v9Hhxi8WAq/lammps-stable_2Aug2023/build'
make[1]: *** [_deps/googletest-build/googletest/CMakeFiles/gtest.dir/all] Error 2
make[1]: Leaving directory `/dev/shm/ccspapp/lammps/tmp.v9Hhxi8WAq/lammps-stable_2Aug2023/build'
make: *** [all] Error 2

Using Intel 2020 compilers.

heatherkellyucl commented 1 year ago

I would use compilers/intel/2022.2 and not 2020 for anything (because of newer gcc underneath).
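
For context, the classic Intel C++ compiler takes its C++ standard library headers from the GCC it finds, and std::is_trivially_copy_constructible is, as far as I know, only present in libstdc++ from GCC 5 onwards. That would explain the googletest errors if the 2020 build picked up an older GCC, which is an assumption here. A minimal sketch of a version check, with a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical check: is this GCC version new enough for its libstdc++ to
# provide std::is_trivially_copy_constructible?
# Assumption: the trait first shipped with GCC 5.
gcc_has_trivially_copy_constructible() {
    major=${1%%.*}        # "10.2.0" -> "10"
    [ "$major" -ge 5 ]
}

# Possible use, with the modules for the build already loaded:
# gcc_has_trivially_copy_constructible "$(gcc -dumpversion)" \
#     || echo "GCC underneath looks too old for this googletest"
```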

balston commented 1 year ago

Updated Intel build to use gcc-libs/10.2.0 and Intel 2022.2:

module -f unload compilers mpi gcc-libs
module load beta-modules
BUILD_UNIT_TESTS=yes ./lammps-2Aug2023-basic_install 2>&1 | tee ~/Software/LAMMPS/lammps-2Aug2023-basic_install.log-2

balston commented 1 year ago

The test jobs for the GNU + GPU version have run successfully on Myriad and Young.

balston commented 1 year ago

The basic Intel build on Myriad completed without errors using Intel 2022.2 compilers. It will need testing now.

balston commented 1 year ago

I've submitted a test job for the basic Intel version on Myriad.

balston commented 1 year ago

The LAMMPS 2nd August 2023 basic Intel version test job runs on Myriad. I'm now going to build this version on Kathleen and Young.

balston commented 1 year ago

The builds on Kathleen and Young have finished. Will now need to check for errors and run a 2 node or bigger test job.

balston commented 1 year ago

Two node test job for the basic Intel version submitted on Kathleen.

balston commented 1 year ago

Two node test job for the basic Intel version submitted on Young.

balston commented 1 year ago

The Kathleen job has been running for 6 hours (set for about 12). The Young one is still queueing.

balston commented 1 year ago

Both jobs finished overnight and look OK. The Kathleen one was a bigger job and did 20,000 steps in about 8 hours; the smaller Young one did 2,000 steps in 48 minutes. I'll upload a module file for the basic Intel version.

balston commented 1 year ago

Module file updated and loaded onto Kathleen, Myriad and Young.

balston commented 1 year ago

To use the basic Intel build of LAMMPS 2nd August 2023 you need the following modules:

module -f unload compilers mpi gcc-libs
module load beta-modules
module load gcc-libs/10.2.0
module load compilers/intel/2022.2
module load mpi/intel/2019/update6/intel
module load python/3.9.10
module load lammps/2aug23/basic/intel-2022.2

balston commented 12 months ago

Doing the build with the INTEL package next. On Kathleen first:

module -f unload compilers mpi gcc-libs
module load beta-modules
./lammps-2Aug2023-INTEL_install 2>&1 | tee ~/Software/LAMMPS/lammps-2Aug2023-INTEL_instal.log

balston commented 12 months ago

The INTEL build on Kathleen has completed without errors.

balston commented 12 months ago

I have a test job submitted for the INTEL build on Kathleen.

balston commented 12 months ago

It has started to run:

----------------------------------------------------------
Using INTEL Package without Coprocessor.
Compiler: Intel Classic C++ 20.21.6 / Intel(R) C++ g++ 10.2 mode
SIMD compiler directives: Enabled
Precision: mixed

Waiting to see how it runs overnight: this is a long test run with 20,000 steps.

balston commented 12 months ago

Job ran to completion and the speed-up is quite good: 3 hours 15 minutes for the INTEL package version compared with about 8 hours for the basic Intel build.

balston commented 12 months ago

Now to build the INTEL package variant on Young.

balston commented 12 months ago

Build on Young finished without errors. Test job submitted.

balston commented 12 months ago

Test job is still queuing so I will check results tomorrow.

balston commented 11 months ago

The job failed because I made a mistake in my job script. I've corrected it and re-submitted the job.

balston commented 11 months ago

I'm getting the build script for the Intel GPU variant ready to submit as a job on Young from ccspapp.

balston commented 11 months ago

Build job for the Intel GPU variant submitted. Job script is:

/home/ccspapp/Software/LAMMPS/build-intel-gpu-2Aug2023.sh

balston commented 11 months ago

Test job of the INTEL package variant worked this time. Took about 20 minutes to run as opposed to 48 minutes for the basic Intel variant.
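
For reference, the two timing comparisons reported in this thread work out to a speed-up of roughly 2.4 in both cases. A small sketch of the arithmetic, assuming awk is available; speedup is a hypothetical helper name, and the Kathleen baseline is taken as exactly 480 minutes although the thread only says "about 8 hours":

```shell
#!/bin/sh
# speedup SLOW_MINUTES FAST_MINUTES: print slow/fast to two decimal places.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

speedup 48 20     # Young: basic Intel vs INTEL package, prints 2.40
speedup 480 195   # Kathleen: about 8 hours vs 3 hours 15 minutes, prints 2.46
```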

balston commented 11 months ago

The module file for the 2nd August 2023 version INTEL package variant has been uploaded to Kathleen and Young. To use the INTEL package variant the following module commands are needed:

module -f unload compilers mpi gcc-libs
module load beta-modules
module load gcc-libs/10.2.0
module load compilers/intel/2022.2
module load mpi/intel/2019/update6/intel
module load python/3.9.10
module load lammps/2aug23/userintel/intel-2022.2

balston commented 11 months ago

The Intel GPU build job ran overnight but failed with:

      Options:       -xHost;-fp-model;fast=2;-no-prec-div;-qoverride-limits;-diag-disable=10441;-diag-disable=2196
In file included from /shared/ucl/apps/cuda/11.3.1/gnu-10.2.0/include/cuda_runtime.h(83),
                 from /home/ccspapp/Scratch/lammps/2Aug2023/gpumixed/tmp.ZQuVwdmCED/lammps-stable_2Aug2023/lib/gpu/lal_zbl.cu(0):
/shared/ucl/apps/cuda/11.3.1/gnu-10.2.0/include/crt/host_config.h(110): error: #error directive: -- unsupported ICC configuration! Only ICC 15.0, ICC 16.0, ICC 17.0, ICC 18.0 and ICC 19.x on Linux x86_64 are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
  #error -- unsupported ICC configuration! Only ICC 15.0, ICC 16.0, ICC 17.0, ICC 18.0 and ICC 19.x on Linux x86_64 are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
   ^

CMake Error at cuda_compile_fatbin_1_generated_lal_zbl.cu.fatbin.RelWithDebInfo.cmake:212 (message):
  Error generating
  /home/ccspapp/Scratch/lammps/2Aug2023/gpumixed/tmp.ZQuVwdmCED/lammps-stable_2Aug2023/build/cuda_compile_fatbin_1_generated_lal_zbl.cu.fatbin

make[2]: *** [cuda_compile_fatbin_1_generated_lal_zbl.cu.fatbin] Error 1
make[1]: *** [CMakeFiles/gpu.dir/all] Error 2
make: *** [all] Error 2

Will need to investigate tomorrow.

balston commented 11 months ago

Switched to using CUDA 11.8.0 instead of 11.3.1. I had to install this version first as it wasn't on Young. The build has finished without errors so I'm running a test job next.
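
The CUDA 11.3.1 failure came from the host compiler version check quoted in the error message. A minimal sketch of the same check as a pre-build test, assuming a POSIX shell; the function name is hypothetical and the accepted versions are copied from that message only, so it says nothing about other CUDA releases:

```shell
#!/bin/sh
# Hypothetical helper mirroring the host_config.h message from CUDA 11.3.1:
# only ICC 15.0, 16.0, 17.0, 18.0 and 19.x are accepted as host compiler.
icc_ok_for_cuda_11_3() {
    case $1 in
        15.0*|16.0*|17.0*|18.0*|19.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use before starting a GPU build (version string supplied by hand
# or from the compiler's own version output):
# icc_ok_for_cuda_11_3 "$ICC_VERSION" || echo "ICC not accepted by CUDA 11.3.1"
```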

balston commented 11 months ago

Intel GPU variant test job submitted.

balston commented 11 months ago

My test job failed because I hadn't got the module loads correct. I've now re-submitted it.

balston commented 11 months ago

The Intel GPU variant test job has failed with MPI errors:

GERun: GErun command being run:
GERun:  mpirun --rsh=ssh -machinefile /tmpdir/job/1211349.undefined/machines.unique -np 16 -rr lmp_gpu -sf gpu -pk gpu 1 -in in.lj
Assertion failed in file ../../src/util/intel/shm_heap/impi_shm_heap.c at line 917: group_id < group_num
Assertion failed in file ../../src/util/intel/shm_heap/impi_shm_heap.c at line 917: group_id < group_num
Assertion failed in file ../../src/util/intel/shm_heap/impi_shm_heap.c at line 917: group_id < group_num
Assertion failed in file ../../src/util/intel/shm_heap/impi_shm_heap.c at line 917: group_id < group_num
Assertion failed in file ../../src/util/intel/shm_heap/impi_shm_heap.c at line 917: group_id < group_num
Assertion failed in file ../../src/util/intel/shm_heap/impi_shm_heap.c at line 917: group_id < group_num
Assertion failed in file ../../src/util/intel/shm_heap/impi_shm_heap.c at line 917: group_id < group_num
Assertion failed in file ../../src/util/intel/shm_heap/impi_shm_heap.c at line 917: group_id < group_num
Assertion failed in file ../../src/util/intel/shm_heap/impi_shm_heap.c at line 917: group_id < group_num
Assertion failed in file ../../src/util/intel/shm_heap/impi_shm_heap.c at line 917: group_id < group_num
Assertion failed in file ../../src/util/intel/shm_heap/impi_shm_heap.c at line 917: group_id < group_num
Assertion failed in file ../../src/util/intel/shm_heap/impi_shm_heap.c at line 917: group_id < group_num
Assertion failed in file ../../src/util/intel/shm_heap/impi_shm_heap.c at line 917: group_id < group_num
/shared/ucl/apps/intel/2020/impi/2019.6.166/intel64/lib/release/libmpi.so.12(MPL_backtrace_show+0x34) [0x2b286b6e31d4]
/shared/ucl/apps/intel/2020/impi/2019.6.166/intel64/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x2b286ae6b031]
/shared/ucl/apps/intel/2020/impi/2019.6.166/intel64/lib/release/libmpi.so.12(+0x44c505) [0x2b286b1ac505]
/shared/ucl/apps/intel/2020/impi/2019.6.166/intel64/lib/release/libmpi.so.12(+0x7e9b0c) [0x2b286b549b0c]
/shared/ucl/apps/intel/2020/impi/2019.6.166/intel64/lib/release/libmpi.so.12(+0x64cd70) [0x2b286b3acd70]
/shared/ucl/apps/intel/2020/impi/2019.6.166/intel64/lib/release/libmpi.so.12(+0x1fe5fa) [0x2b286af5e5fa]
/shared/ucl/apps/intel/2020/impi/2019.6.166/intel64/lib/release/libmpi.so.12(+0x4664b4) [0x2b286b1c64b4]
/shared/ucl/apps/intel/2020/impi/2019.6.166/intel64/lib/release/libmpi.so.12(MPI_Init+0x11b) [0x2b286b1c1c7b]
lmp_gpu() [0x402622]

balston commented 11 months ago

I'm beginning to build the non-GPU variants on Michael now.

balston commented 11 months ago

All that's left to do now is add the missing variants on Myriad when the cluster is restored to service.