E4S-Project / testsuite

E4S test suite with validation tests
MIT License

Veloc test fails on perlmutter #56

Open wspear opened 1 year ago

wspear commented 1 year ago

@gonsie @vsoch

The veloc standalone test defined at https://github.com/E4S-Project/testsuite/tree/master/validation_tests/veloc fails when run against the veloc installed as part of the E4S 22.11 deployment on Perlmutter, built with these variants:

-- linux-sles15-zen3 / gcc@11.2.0 -------------------------------
44htwoe veloc@1.5~ipo build_system=cmake build_type=RelWithDebInfo

With this console output:

REDSET 0.1.0 ABORT: rank 0 on nid001901: XOR requires at least 2 ranks per set, but found 1 rank(s) in set @ /tmp/lpeyrala/spack-stage/spack-stage-redset-0.2.0-4rss7cokqukwuvzvzlymxazgem3a6gim/spack-src/src/redset_xor.c:157
MPICH ERROR [Rank 0] [job id 3921177.7] [Tue Dec  6 14:52:36 2022] [nid001901] - Abort(-1) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0

aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
srun: error: nid001901: task 0: Exited with exit code 255
srun: launch/slurm: _step_signal: Terminating StepId=3921177.7
slurmstepd: error: *** STEP 3921177.7 ON nid001901 CANCELLED AT 2022-12-06T22:52:36 ***
srun: error: nid001901: task 1: Terminated
srun: Force Terminated StepId=3921177.7

Updating to the latest heatdis_mem.c included with Veloc 1.5 resulted in this runtime error output:

[FATAL 0] [/tmp/lpeyrala/spack-stage/spack-stage-veloc-1.5-44htwoezm4qmelqjkr52pc3r3e4bqm4j/spack-src/src/lib/client.cpp:57:client_impl_t] MPI threaded mode requested but not available, please use MPI_Init_thread
[FATAL 0] [/tmp/lpeyrala/spack-stage/spack-stage-veloc-1.5-44htwoezm4qmelqjkr52pc3r3e4bqm4j/spack-src/src/lib/client.cpp:57:client_impl_t] MPI threaded mode requested but not available, please use MPI_Init_thread
[FATAL 0] [/tmp/lpeyrala/spack-stage/spack-stage-veloc-1.5-44htwoezm4qmelqjkr52pc3r3e4bqm4j/spack-src/src/lib/client.cpp:57:client_impl_t] MPI threaded mode requested but not available, please use MPI_Init_thread
[FATAL 0] [/tmp/lpeyrala/spack-stage/spack-stage-veloc-1.5-44htwoezm4qmelqjkr52pc3r3e4bqm4j/spack-src/src/lib/client.cpp:57:client_impl_t] MPI threaded mode requested but not available, please use MPI_Init_thread

vsoch commented 1 year ago

Sorry, I don't work on this, so I'm not sure how you want my help. If you have a specific question I can help with, let me know. Maybe ping someone who develops veloc or maintains this repo? I'm actually not sure what it is.

bnicolae commented 1 year ago

Please update to the latest VELOC release, which is 1.6. If you still see any issues, try the master branch.

wspear commented 1 year ago

@bnicolae

Veloc 1.6 fails to build with the Spack package out of the box (1.5 builds fine in the same environment). The build error looks like this:

==> Installing veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36
==> No binary for veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36 found: installing from source
==> Fetching https://github.com/ECP-VeloC/VELOC/archive/1.6.tar.gz
==> No patches needed for veloc
==> veloc: Executing phase: 'cmake'
==> veloc: Executing phase: 'build'
==> Error: ProcessError: Command exited with status 2:
    'make' '-j16'

5 errors found in build log:
     81     [ 51%] Linking C executable heatdis_original
     82     cd /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-build-2u34oen/test && /home/wspear/bin/SPACK/spack/opt/spack/linux-ubuntu22.04-westmere/gcc-11.2.0/cmake-3.24.
            2-sgm62x4xhlgn7rl3h6rve2yez2bsra6s/bin/cmake -E cmake_link_script CMakeFiles/heatdis_original.dir/link.txt --verbose=1
     83     /home/wspear/bin/SPACK/spack/lib/spack/env/gcc/gcc -O2 -g -DNDEBUG CMakeFiles/heatdis_original.dir/heatdis_original.c.o -o heatdis_original  -Wl,-rpath,/home/wspear/bin/SPACK/spack/opt/spack/linux-u
            buntu22.04-westmere/gcc-11.2.0/intel-oneapi-mpi-2021.7.0-tib45i3vuhw4krn7oiihc4hlndmpbtce/mpi/2021.7.0/lib/release -lm /home/wspear/bin/SPACK/spack/opt/spack/linux-ubuntu22.04-westmere/gcc-11.2.0/in
            tel-oneapi-mpi-2021.7.0-tib45i3vuhw4krn7oiihc4hlndmpbtce/mpi/2021.7.0/lib/release/libmpi.so /usr/lib/x86_64-linux-gnu/librt.a /usr/lib/x86_64-linux-gnu/libpthread.a /usr/lib/x86_64-linux-gnu/libdl.a
     84     make[2]: Leaving directory '/tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-build-2u34oen'
     85     [ 51%] Built target heatdis_original
     86     /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-src/src/storage/axl_module.cpp: In constructor 'axl_module_t::axl_module_t(const string&, const string&, const st
            ring&)':
  >> 87     /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-src/src/storage/axl_module.cpp:22:23: error: too few arguments to function 'int AXL_Init(const char*)'
     88        22 |     int ret = AXL_Init();
     89           |               ~~~~~~~~^~
     90     In file included from /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-src/src/storage/axl_module.hpp:5,
     91                      from /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-src/src/storage/axl_module.cpp:1:
     92     /home/wspear/bin/SPACK/spack/opt/spack/linux-ubuntu22.04-westmere/gcc-11.2.0/axl-0.3.0-jalfn42f5zij76i2hr5xr6f24mfji53r/include/axl.h:35:5: note: declared here
     93        35 | int AXL_Init (const char* state_file);
     94           |     ^~~~~~~~
     95     /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-src/src/storage/axl_module.cpp: In member function 'bool axl_module_t::axl_transfer_file(const string&, const str
            ing&)':
  >> 96     /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-src/src/storage/axl_module.cpp:28:24: error: too many arguments to function 'int AXL_Create(axl_xfer_t, const cha
            r*)'
     97        28 |     int id = AXL_Create(axl_type, source.c_str(), NULL), result = id;
     98           |              ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     99     In file included from /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-src/src/storage/axl_module.hpp:5,
     100                     from /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-src/src/storage/axl_module.cpp:1:
     101    /home/wspear/bin/SPACK/spack/opt/spack/linux-ubuntu22.04-westmere/gcc-11.2.0/axl-0.3.0-jalfn42f5zij76i2hr5xr6f24mfji53r/include/axl.h:45:5: note: declared here
     102       45 | int AXL_Create (axl_xfer_t type, const char* name);
     103          |     ^~~~~~~~~~
  >> 104    make[2]: *** [src/modules/CMakeFiles/veloc-modules.dir/build.make:247: src/modules/CMakeFiles/veloc-modules.dir/__/storage/axl_module.cpp.o] Error 1
     105    make[2]: *** Waiting for unfinished jobs....
     106    make[2]: Leaving directory '/tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-build-2u34oen'
  >> 107    make[1]: *** [CMakeFiles/Makefile2:209: src/modules/CMakeFiles/veloc-modules.dir/all] Error 2
     108    make[1]: Leaving directory '/tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-build-2u34oen'
  >> 109    make: *** [Makefile:149: all] Error 2

See build log for details:
  /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-build-out.txt
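
The two compiler errors above are easy to lose among the wrapped link commands. As a rough sketch (not part of Spack or the test suite), a few lines of Python can pull the GCC errors out of a Spack error summary. The function name `compiler_errors` and the shortened sample paths are hypothetical; the sketch assumes the `>>` markers shown in the excerpt above and the usual `file:line:col: error: message` layout of GCC diagnostics, and it expects unwrapped lines such as those in spack-build-out.txt.

```python
import re

# Spack prefixes the build-log lines it flags as errors with '>>' and a line number.
MARKED = re.compile(r"^\s*>>\s*(\d+)\s+(.*)$")
# GCC diagnostics are assumed to look like 'path:line:col: error: message'.
GCC_ERROR = re.compile(
    r"(?P<file>[^\s:]+):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.*)$"
)


def compiler_errors(log_text):
    """Return (file, line, message) for each '>>' line that is a GCC error."""
    found = []
    for raw in log_text.splitlines():
        marked = MARKED.match(raw)
        if not marked:
            continue  # context line, not flagged by Spack
        diag = GCC_ERROR.search(marked.group(2))
        if diag:  # skips flagged 'make: *** ... Error N' lines
            found.append((diag["file"], int(diag["line"]), diag["msg"]))
    return found


# Abbreviated sample; the paths are shortened for illustration only.
sample = """\
  >> 87     src/storage/axl_module.cpp:22:23: error: too few arguments to function 'int AXL_Init(const char*)'
     88        22 |     int ret = AXL_Init();
  >> 104    make[2]: *** [build.make:247: axl_module.cpp.o] Error 1
"""

for path, line, msg in compiler_errors(sample):
    print(f"{path}:{line}: {msg}")
```

Here both flagged errors point at axl_module.cpp calling AXL functions with a different argument count than the installed axl.h (axl-0.3.0) declares.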

wspear commented 1 year ago

Also, the main branch shows the same error. It looks like the master branch was renamed to main, so spack install veloc@master won't work.

wspear commented 1 year ago

We see the same error on Crusher.

bnicolae commented 1 year ago

@wspear: Most likely you are using old versions of the dependencies in the Spack recipe. Here are the versions of the dependencies you should use (you can find them in auto-install.py, our default non-Spack installation script):

install_dep('https://github.com/ECP-VeloC/KVTree.git', 'v1.2.0')
install_dep('https://github.com/ECP-VeloC/AXL.git', 'v0.5.0')
install_dep('https://github.com/ECP-VeloC/rankstr.git', 'v0.1.0')
install_dep('https://github.com/ECP-VeloC/shuffile.git', 'v0.1.0')
install_dep('https://github.com/ECP-VeloC/redset.git', 'v0.1.0')
install_dep('https://github.com/ECP-VeloC/er.git', 'v0.1.0')
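
To make the version mismatch concrete, here is a minimal sketch that compares installed dependency versions against the tags listed above. The helper names (`parse_version`, `outdated`) are hypothetical and not part of Spack or VELOC; the only inputs taken from this thread are the six tags above and the axl-0.3.0 path visible in the failing build log. It also assumes plain dotted numeric versions.

```python
# Hypothetical helper for illustration; not part of Spack or VELOC.

# Dependency tags quoted from auto-install.py in the comment above.
REQUIRED = {
    "KVTree": "v1.2.0",
    "AXL": "v0.5.0",
    "rankstr": "v0.1.0",
    "shuffile": "v0.1.0",
    "redset": "v0.1.0",
    "er": "v0.1.0",
}


def parse_version(text):
    """Turn 'v0.5.0' or '0.5.0' into a comparable tuple such as (0, 5, 0)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def outdated(installed, required=REQUIRED):
    """Return {name: (installed, required)} for packages older than required."""
    report = {}
    for name, wanted in required.items():
        have = installed.get(name)
        if have is not None and parse_version(have) < parse_version(wanted):
            report[name] = (have, wanted)
    return report


# The failing build log links against axl-0.3.0.
print(outdated({"AXL": "0.3.0"}))  # -> {'AXL': ('0.3.0', 'v0.5.0')}
```

Comparing tuples rather than strings avoids ordering mistakes such as '0.10.0' sorting before '0.5.0'.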

wspear commented 1 year ago

@bnicolae I was able to build veloc@1.6 with the changes in this PR: https://github.com/spack/spack/pull/34706, but I'm still investigating the hang on Crusher/Perlmutter. Could you take a look at the PR and confirm it looks sane? I added the dependencies you listed, but I wasn't clear on whether they needed any configuration options. (I'm guessing not, since it builds without any.)