Closed TApplencourt closed 4 hours ago
Fix @abagusetty's bug, where the new mpich caused iprof to hang.
The hang was due to yaksa checking the soname of our tracing library and expecting it to match ze_loader.
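To make the soname mismatch concrete, here is a minimal sketch of how one might inspect a library's soname and compare it with the loader's. It is not yaksa's actual check: the helper names (`parse_soname`, `read_soname`, `looks_like_ze_loader`) and the `libze_loader.so` prefix comparison are assumptions made for illustration, and it relies on the `readelf -d` output format from binutils.

```python
import re
import subprocess
from typing import Optional

# Matches the SONAME entry printed by `readelf -d <library>`, e.g.
#  0x000000000000000e (SONAME)   Library soname: [libze_loader.so.1]
SONAME_RE = re.compile(r"\(SONAME\)\s+Library soname: \[(?P<soname>[^\]]+)\]")


def parse_soname(readelf_dynamic: str) -> Optional[str]:
    """Extract the soname from `readelf -d` output, or None if absent."""
    match = SONAME_RE.search(readelf_dynamic)
    return match.group("soname") if match else None


def read_soname(library_path: str) -> Optional[str]:
    """Run `readelf -d` on a shared library and return its soname."""
    result = subprocess.run(
        ["readelf", "-d", library_path],
        capture_output=True, text=True, check=True,
    )
    return parse_soname(result.stdout)


def looks_like_ze_loader(soname: Optional[str]) -> bool:
    """Hypothetical stand-in for a check that expects the ze_loader soname."""
    return soname is not None and soname.startswith("libze_loader.so")
```

A tracing library interposed in place of the loader would fail such a check unless its soname matches, which is the kind of mismatch described above. Calling `read_soname` on the installed tracing library shows what a consumer would see.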
Working now
applenco@x4516c1s7b0n0:~/mpi_hang> mpirun -n 1 -- ~/THAPI/build/ici/bin/iprof -- ./a.out
THAPI_SYNC_DAEMON_MPI Warning: Did not get MPI_THREAD_SINGLE, got MPI_THREAD_MULTIPLE
Hello world from processor x4516c1s7b0n0, rank 0 out of 1 processors
THAPI: Trace location: /home/applenco/thapi-traces/thapi_aggreg--2024-10-11T16:43:44+00:00

BACKEND_MPI | 1 Hostnames | 1 Processes | 1 Threads |

Name | Time | Time(%) | Calls | Average | Min | Max |
MPI_Init | 505.26ms | 98.31% | 1 | 505.26ms | 505.26ms | 505.26ms |
MPI_Finalize | 8.66ms | 1.68% | 1 | 8.66ms | 8.66ms | 8.66ms |
MPI_Get_processor_name | 4.10us | 0.00% | 1 | 4.10us | 4.10us | 4.10us |
MPI_Comm_size | 3.62us | 0.00% | 1 | 3.62us | 3.62us | 3.62us |
MPI_Comm_rank | 645ns | 0.00% | 1 | 645.00ns | 645ns | 645ns |
Total | 513.93ms | 100.00% | 5 |

BACKEND_ZE | 1 Hostnames | 1 Processes | 1 Threads |

Name | Time | Time(%) | Calls | Average | Min | Max |
zeModuleCreate | 22.75ms | 67.75% | 60 | 379.18us | 91.28us | 1.06ms |
zeEventCreate | 2.33ms | 6.93% | 4096 | 567.78ns | 245ns | 17.64us |
zeModuleDestroy | 2.10ms | 6.27% | 60 | 35.08us | 2.71us | 383.52us |
zeDeviceGet | 1.88ms | 5.58% | 6 | 312.52us | 848ns | 1.87ms |
zeDeviceCanAccessPeer | 1.58ms | 4.72% | 66 | 24.01us | 150ns | 61.77us |
zeKernelCreate | 1.30ms | 3.87% | 864 | 1.51us | 646ns | 341.80us |
zeEventDestroy | 682.80us | 2.03% | 4096 | 166.70ns | 137ns | 3.58us |
zeKernelDestroy | 338.01us | 1.01% | 864 | 391.21ns | 193ns | 2.69us |
zeEventPoolCreate | 234.87us | 0.70% | 7 | 33.55us | 10.20us | 136.24us |
zeDriverGetExtensionFunctionAddress | 224.28us | 0.67% | 7 | 32.04us | 571ns | 215.35us |
zeEventPoolDestroy | 118.06us | 0.35% | 7 | 16.87us | 7.25us | 59.40us |
zeContextCreate | 14.91us | 0.04% | 3 | 4.97us | 4.72us | 5.40us |
zeDeviceGetSubDevices | 12.53us | 0.04% | 24 | 522.17ns | 116ns | 2.76us |
zeInit | 4.13us | 0.01% | 3 | 1.38us | 912ns | 1.77us |
zeDriverGet | 3.93us | 0.01% | 5 | 785.60ns | 174ns | 1.85us |
zeContextDestroy | 3.71us | 0.01% | 1 | 3.71us | 3.71us | 3.71us |
Total | 33.58ms | 100.00% | 10169 |

applenco@x4516c1s7b0n0:~/mpi_hang> module list

Currently Loaded Modules:
  1) gcc-runtime/12.2.0-267awrk            16) elfutils/0.186-yuor73r         31) ruby-ffi/1.15.4-5mo5s2q
  2) gmp/6.2.1-yctcuid                     17) pcre2/10.43-vzzidje            32) ruby-babeltrace2/0.1.4-3k74k53
  3) mpfr/4.2.1-fhgnwe7                    18) berkeley-db/18.1.40-2frw2z6    33) ruby-narray-old/0.6.1.2-iriybfo
  4) mpc/1.3.1-ygprpb4                     19) gdbm/1.23                      34) ruby-narray-ffi/1.4.4-x4lt3r2
  5) gcc/12.2.0                            20) perl/5.38.0                    35) ruby-opencl/1.3.12-pbmvgrc
  6) intel_compute_runtime/release/950.13  21) libmd/1.0.4-nvn3prd            36) thapi/git.ceaabfc-serial
  7) oneapi/eng-compiler/2024.07.30.002    22) libbsd/0.12.1-dsshygz          37) ruby-cast/0.3.1-3kwxnzj
  8) libfabric/1.20.1                      23) expat/2.6.2-s3fkrly            38) ruby-cast-to-yaml/0.1.1-5dhftgq
  9) cray-pals/1.4.0                       24) python/3.10.13                 39) ruby-mini-portile2/2.6.1-zbqteay
 10) cray-libpals/1.4.0                    25) glib/2.78.3-lpcguoz            40) ruby-nokogiri/1.12.5-3x7wfrs
 11) lz4/1.9.4                             26) babeltrace2/2.0.6-w37vov2      41) ruby-metababel/1.1.2-6o367to
 12) libarchive/3.7.1-fvef5p2              27) lttng-tools/2.12.11            42) gmake/4.4.1
 13) libiconv/1.17-kg7cda7                 28) abseil-cpp/20240116.2-cihlltz  43) hwloc/2.9.2-level-zero
 14) libmicrohttpd/0.9.50-jjjslhm          29) protobuf/3.27.1                44) yaksa/0.3-fxpciid
 15) sqlite/3.43.2-2onu5lp                 30) ruby/2.7.2-w7it2ky             45) mpich/opt/git.063ef64