argonne-lcf / THAPI

A tracing infrastructure for heterogeneous computing applications.

fix_ze_soname #296

Closed TApplencourt closed 4 hours ago

TApplencourt commented 5 hours ago

Fixes the bug reported by @abagusetty, where the new MPICH caused iprof to hang.

The hang was caused by yaksa checking the soname of our tracing library and expecting it to match ze_loader.
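For anyone who wants to verify which soname a given shared library advertises, here is a minimal sketch. It assumes binutils `readelf` is on the PATH and parses the `(SONAME)` entry of its `-d` output. The helper names `parse_soname` and `read_soname` are made up for illustration and are not part of THAPI or yaksa.

```python
import re
import subprocess
import sys

# Matches the DT_SONAME line printed by `readelf -d`, e.g.
#  0x000000000000000e (SONAME)  Library soname: [libfoo.so.1]
SONAME_RE = re.compile(r"\(SONAME\)\s+Library soname: \[(?P<soname>[^\]]+)\]")


def parse_soname(readelf_output):
    """Return the soname found in `readelf -d` output, or None if absent."""
    match = SONAME_RE.search(readelf_output)
    return match.group("soname") if match else None


def read_soname(path):
    """Run `readelf -d` on a shared library and return its soname (hypothetical helper)."""
    result = subprocess.run(
        ["readelf", "-d", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_soname(result.stdout)


if __name__ == "__main__":
    # Print the soname of every library path given on the command line.
    for lib_path in sys.argv[1:]:
        print(lib_path, "->", read_soname(lib_path))
```

Running it on both the tracing library and the system ze_loader (paths depend on the install) shows whether the two sonames agree.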

It works now:

applenco@x4516c1s7b0n0:~/mpi_hang> mpirun -n 1 -- ~/THAPI/build/ici/bin/iprof -- ./a.out
THAPI_SYNC_DAEMON_MPI Warning: Did not get MPI_THREAD_SINGLE, got MPI_THREAD_MULTIPLE
Hello world from processor x4516c1s7b0n0, rank 0 out of 1 processors
THAPI: Trace location: /home/applenco/thapi-traces/thapi_aggreg--2024-10-11T16:43:44+00:00
BACKEND_MPI | 1 Hostnames | 1 Processes | 1 Threads |

                  Name |     Time | Time(%) | Calls |  Average |      Min |      Max |
              MPI_Init | 505.26ms |  98.31% |     1 | 505.26ms | 505.26ms | 505.26ms |
          MPI_Finalize |   8.66ms |   1.68% |     1 |   8.66ms |   8.66ms |   8.66ms |
MPI_Get_processor_name |   4.10us |   0.00% |     1 |   4.10us |   4.10us |   4.10us |
         MPI_Comm_size |   3.62us |   0.00% |     1 |   3.62us |   3.62us |   3.62us |
         MPI_Comm_rank |    645ns |   0.00% |     1 | 645.00ns |    645ns |    645ns |
                 Total | 513.93ms | 100.00% |     5 |

BACKEND_ZE | 1 Hostnames | 1 Processes | 1 Threads |

                               Name |     Time | Time(%) | Calls |  Average |     Min |      Max |
                     zeModuleCreate |  22.75ms |  67.75% |    60 | 379.18us | 91.28us |   1.06ms |
                      zeEventCreate |   2.33ms |   6.93% |  4096 | 567.78ns |   245ns |  17.64us |
                    zeModuleDestroy |   2.10ms |   6.27% |    60 |  35.08us |  2.71us | 383.52us |
                        zeDeviceGet |   1.88ms |   5.58% |     6 | 312.52us |   848ns |   1.87ms |
              zeDeviceCanAccessPeer |   1.58ms |   4.72% |    66 |  24.01us |   150ns |  61.77us |
                     zeKernelCreate |   1.30ms |   3.87% |   864 |   1.51us |   646ns | 341.80us |
                     zeEventDestroy | 682.80us |   2.03% |  4096 | 166.70ns |   137ns |   3.58us |
                    zeKernelDestroy | 338.01us |   1.01% |   864 | 391.21ns |   193ns |   2.69us |
                  zeEventPoolCreate | 234.87us |   0.70% |     7 |  33.55us | 10.20us | 136.24us |
zeDriverGetExtensionFunctionAddress | 224.28us |   0.67% |     7 |  32.04us |   571ns | 215.35us |
                 zeEventPoolDestroy | 118.06us |   0.35% |     7 |  16.87us |  7.25us |  59.40us |
                    zeContextCreate |  14.91us |   0.04% |     3 |   4.97us |  4.72us |   5.40us |
              zeDeviceGetSubDevices |  12.53us |   0.04% |    24 | 522.17ns |   116ns |   2.76us |
                             zeInit |   4.13us |   0.01% |     3 |   1.38us |   912ns |   1.77us |
                        zeDriverGet |   3.93us |   0.01% |     5 | 785.60ns |   174ns |   1.85us |
                   zeContextDestroy |   3.71us |   0.01% |     1 |   3.71us |  3.71us |   3.71us |
                              Total |  33.58ms | 100.00% | 10169 |

applenco@x4516c1s7b0n0:~/mpi_hang> module list

Currently Loaded Modules:
  1) gcc-runtime/12.2.0-267awrk            16) elfutils/0.186-yuor73r         31) ruby-ffi/1.15.4-5mo5s2q
  2) gmp/6.2.1-yctcuid                     17) pcre2/10.43-vzzidje            32) ruby-babeltrace2/0.1.4-3k74k53
  3) mpfr/4.2.1-fhgnwe7                    18) berkeley-db/18.1.40-2frw2z6    33) ruby-narray-old/0.6.1.2-iriybfo
  4) mpc/1.3.1-ygprpb4                     19) gdbm/1.23                      34) ruby-narray-ffi/1.4.4-x4lt3r2
  5) gcc/12.2.0                            20) perl/5.38.0                    35) ruby-opencl/1.3.12-pbmvgrc
  6) intel_compute_runtime/release/950.13  21) libmd/1.0.4-nvn3prd            36) thapi/git.ceaabfc-serial
  7) oneapi/eng-compiler/2024.07.30.002    22) libbsd/0.12.1-dsshygz          37) ruby-cast/0.3.1-3kwxnzj
  8) libfabric/1.20.1                      23) expat/2.6.2-s3fkrly            38) ruby-cast-to-yaml/0.1.1-5dhftgq
  9) cray-pals/1.4.0                       24) python/3.10.13                 39) ruby-mini-portile2/2.6.1-zbqteay
 10) cray-libpals/1.4.0                    25) glib/2.78.3-lpcguoz            40) ruby-nokogiri/1.12.5-3x7wfrs
 11) lz4/1.9.4                             26) babeltrace2/2.0.6-w37vov2      41) ruby-metababel/1.1.2-6o367to
 12) libarchive/3.7.1-fvef5p2              27) lttng-tools/2.12.11            42) gmake/4.4.1
 13) libiconv/1.17-kg7cda7                 28) abseil-cpp/20240116.2-cihlltz  43) hwloc/2.9.2-level-zero
 14) libmicrohttpd/0.9.50-jjjslhm          29) protobuf/3.27.1                44) yaksa/0.3-fxpciid
 15) sqlite/3.43.2-2onu5lp                 30) ruby/2.7.2-w7it2ky             45) mpich/opt/git.063ef64