E4S-Project / testsuite

E4S test suite with validation tests
MIT License

trilinos test failed #39

Open shahzebsiddiqui opened 1 year ago

shahzebsiddiqui commented 1 year ago

CDASH: https://my.cdash.org/test/63278708

buildspec: https://github.com/buildtesters/buildtest-nersc/blob/devel/buildspecs/e4s/E4S-Testsuite/perlmutter/22.05/trilinos.yml

I suspect this test is failing because our startup modulefile gpu, which is loaded by default, sets MPICH_GPU_SUPPORT_ENABLED=1:

e4s:login34> ml show gpu
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   /global/common/software/nersc/pm-2022.08.4/extra_modulefiles/gpu/1.0.lua:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
family("hardware")
load("cudatoolkit")
load("craype-accel-nvidia80")
setenv("MPICH_GPU_SUPPORT_ENABLED","1")

We can unload this module by loading the cpu module instead. Anyhow, I wanted to bring this up.

Error:

+ cd -
/global/cfs/cdirs/m3503/buildtest/runs/perlmutter_check/2022-09-28/perlmutter.slurm.regular/trilinos/trilinos_e4s_testsuite_22.05/75260858/stage/testsuite/validation_tests/trilinos
Running /global/cfs/cdirs/m3503/buildtest/runs/perlmutter_check/2022-09-28/perlmutter.slurm.regular/trilinos/trilinos_e4s_testsuite_22.05/75260858/stage/testsuite/validation_tests/trilinos
Skipping load: Environment already setup
+ cd ./build
+ export CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
+ CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
+ export OMP_NUM_THREADS=4
+ OMP_NUM_THREADS=4
+ srun -n 8 ./Zoltan
MPICH ERROR [Rank 0] [job id 3289011.0] [Wed Sep 28 19:56:45 2022] [nid003233] - Abort(-1) (rank 0 in comm 0): MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked
 (Other MPI error)

aborting job:
MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked

srun: error: nid003233: tasks 0-7: Segmentation fault
srun: launch/slurm: _step_signal: Terminating StepId=3289011.0
Run failed
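
As a possible stopgap alongside loading the cpu module, a run script could clear the variable itself before calling srun. This is a minimal sketch, assuming the Zoltan binary is CPU-only and was built without the GTL library. The function name disable_mpich_gpu_support is hypothetical and is not part of the E4S testsuite or buildtest, and this has not been verified on Perlmutter.

```shell
# Hypothetical pre-run guard (assumption, not part of the E4S testsuite).
# Clears MPICH_GPU_SUPPORT_ENABLED when it is set to 1 so that a CPU-only
# binary does not make Cray MPICH request the GTL library at startup.
disable_mpich_gpu_support() {
    if [ "${MPICH_GPU_SUPPORT_ENABLED:-0}" = "1" ]; then
        echo "MPICH_GPU_SUPPORT_ENABLED=1 found, unsetting for CPU-only run"
        unset MPICH_GPU_SUPPORT_ENABLED
    fi
}

disable_mpich_gpu_support
# srun -n 8 ./Zoltan   # launch line from the log above, left commented out here
```

Unsetting the variable only affects the current shell and the job steps it launches, so the default gpu module stays loaded for GPU-enabled tests that run elsewhere.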