Open shahzebsiddiqui opened 1 year ago
CDASH: https://my.cdash.org/test/63278708
buildspec: https://github.com/buildtesters/buildtest-nersc/blob/devel/buildspecs/e4s/E4S-Testsuite/perlmutter/22.05/trilinos.yml
I suspect this test is failing because the gpu modulefile, which is loaded by default at startup, sets MPICH_GPU_SUPPORT_ENABLED=1:
    e4s:login34> ml show gpu
    ----------------------------------------------------------------------
       /global/common/software/nersc/pm-2022.08.4/extra_modulefiles/gpu/1.0.lua:
    ----------------------------------------------------------------------
    family("hardware")
    load("cudatoolkit")
    load("craype-accel-nvidia80")
    setenv("MPICH_GPU_SUPPORT_ENABLED","1")
We can unload this module by loading the cpu module instead. Anyhow, I wanted to bring this up.
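The workaround could be sketched as a small pre-run guard in the test script. This is only an illustration: `disable_gpu_aware_mpi` is a hypothetical helper name, not part of buildtest or the E4S testsuite, and unsetting the variable directly is an assumed fallback rather than something confirmed to fix this test.

```shell
#!/usr/bin/env bash
# Sketch: avoid launching a binary with GPU-aware MPI requested
# when it was not linked against the GTL library.
# disable_gpu_aware_mpi is a hypothetical helper, not an existing tool.
disable_gpu_aware_mpi() {
    # Prefer the module swap suggested in this issue when `module` is available.
    if command -v module >/dev/null 2>&1; then
        module load cpu || echo "warning: could not load cpu module" >&2
    fi
    # Assumed fallback: drop the variable if it is still set to 1.
    if [ "${MPICH_GPU_SUPPORT_ENABLED:-0}" = "1" ]; then
        unset MPICH_GPU_SUPPORT_ENABLED
    fi
}

disable_gpu_aware_mpi
echo "MPICH_GPU_SUPPORT_ENABLED=${MPICH_GPU_SUPPORT_ENABLED:-unset}"
# srun -n 8 ./Zoltan   # launch line taken from the log in this issue
```

Whether the test then passes still depends on how the Trilinos build was configured, so this would need to be checked on Perlmutter.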
Error:
    + cd -
    /global/cfs/cdirs/m3503/buildtest/runs/perlmutter_check/2022-09-28/perlmutter.slurm.regular/trilinos/trilinos_e4s_testsuite_22.05/75260858/stage/testsuite/validation_tests/trilinos
    Running /global/cfs/cdirs/m3503/buildtest/runs/perlmutter_check/2022-09-28/perlmutter.slurm.regular/trilinos/trilinos_e4s_testsuite_22.05/75260858/stage/testsuite/validation_tests/trilinos
    Skipping load: Environment already setup
    + cd ./build
    + export CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
    + CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
    + export OMP_NUM_THREADS=4
    + OMP_NUM_THREADS=4
    + srun -n 8 ./Zoltan
    MPICH ERROR [Rank 0] [job id 3289011.0] [Wed Sep 28 19:56:45 2022] [nid003233] - Abort(-1) (rank 0 in comm 0): MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked (Other MPI error)
    aborting job:
    MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked
    srun: error: nid003233: tasks 0-7: Segmentation fault
    srun: launch/slurm: _step_signal: Terminating StepId=3289011.0
    Run failed