argonne-lcf / THAPI

A tracing infrastructure for heterogeneous computing applications.

Discard metadata #263

Closed TApplencourt closed 2 months ago

TApplencourt commented 2 months ago

@colleeneb has an MPI job (512) where the final replay of the aggregations takes forever. I suspect it is due to too much metadata (she has millions of different sizes of MPI traffic).

By default, let's not trace the metadata, but keep it when using -k.
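
To illustrate why metadata can slow down aggregation, here is a minimal sketch. It is not THAPI's actual implementation: the `aggregate` function, the `keep_metadata` parameter, and the event tuples are all hypothetical. It only shows that when the metadata is part of the aggregation key, a workload with many distinct message sizes produces one row per size, whereas dropping the metadata collapses them into a single row.

```python
from collections import defaultdict


def aggregate(events, keep_metadata=False):
    """Group (name, metadata, duration_ns) events and sum their durations.

    Hypothetical helper, not THAPI code. When keep_metadata is False the
    metadata is dropped from the key, so all events sharing a name
    collapse into one row.
    """
    totals = defaultdict(lambda: {"calls": 0, "total_ns": 0})
    for name, metadata, duration_ns in events:
        key = (name, metadata) if keep_metadata else (name,)
        totals[key]["calls"] += 1
        totals[key]["total_ns"] += duration_ns
    return dict(totals)


# Hypothetical workload: 1000 sends, each with a different message size.
events = [("MPI_Send", f"size: {n}", 100) for n in range(1, 1001)]

default_rows = aggregate(events)                   # metadata discarded
kept_rows = aggregate(events, keep_metadata=True)  # metadata kept

print(len(default_rows))  # 1
print(len(kept_rows))     # 1000
```

The total time is the same in both cases; only the number of rows that must be merged and printed changes, which is the cost this change is meant to avoid by default.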

THAPI: Trace location: /home/applenco/thapi-traces/thapi_aggreg--2024-07-17T17:17:22+00:00
      zeCommandListAppendMemoryCopy | 724.56us |   0.37% |    15 |  48.30us |  10.23us | 489.08us |     0 |
zeCommandListAppendMemoryCopy(S2M) |  8.96us |  12.61% |     4 |   2.24us |  2.08us |  2.40us |
zeCommandListAppendMemoryCopy(M2D) |  5.68us |   8.00% |     7 | 811.43ns |    80ns |  2.88us |
zeCommandListAppendMemoryCopy(D2H) |  2.72us |   3.83% |     1 |   2.72us |  2.72us |  2.72us |
zeCommandListAppendMemoryCopy(M2M) |  2.56us |   3.60% |     1 |   2.56us |  2.56us |  2.56us |
zeCommandListAppendMemoryCopy(M2S) |   160ns |   0.23% |     1 | 160.00ns |   160ns |   160ns |
zeCommandListAppendMemoryCopy(H2D) |    80ns |   0.11% |     1 |  80.00ns |    80ns |    80ns |
zeCommandListAppendMemoryCopy(M2D) | 4.71kB |   0.14% |     7 |  673.14B |  4B | 4.10kB |
zeCommandListAppendMemoryCopy(S2M) |    96B |   0.00% |     4 |   24.00B |  8B |    40B |
zeCommandListAppendMemoryCopy(M2S) |    64B |   0.00% |     1 |   64.00B | 64B |    64B |
zeCommandListAppendMemoryCopy(M2M) |    56B |   0.00% |     1 |   56.00B | 56B |    56B |
zeCommandListAppendMemoryCopy(D2H) |     4B |   0.00% |     1 |    4.00B |  4B |     4B |
zeCommandListAppendMemoryCopy(H2D) |     4B |   0.00% |     1 |    4.00B |  4B |     4B |
applenco@x4315c6s0b0n0:~/slow> /home/applenco/THAPI/build/ici/bin/iprof -r  | grep "zeCommandListAppendMemoryCopy"
THAPI: Trace location: /home/applenco/thapi-traces/thapi_aggreg--2024-07-17T17:17:22+00:00
      zeCommandListAppendMemoryCopy | 724.56us |   0.37% |    15 |  48.30us |  10.23us | 489.08us |     0 |
zeCommandListAppendMemoryCopy(S2M) |  8.96us |  12.61% |     4 |   2.24us |  2.08us |  2.40us |
zeCommandListAppendMemoryCopy(M2D) |  5.68us |   8.00% |     7 | 811.43ns |    80ns |  2.88us |
zeCommandListAppendMemoryCopy(D2H) |  2.72us |   3.83% |     1 |   2.72us |  2.72us |  2.72us |
zeCommandListAppendMemoryCopy(M2M) |  2.56us |   3.60% |     1 |   2.56us |  2.56us |  2.56us |
zeCommandListAppendMemoryCopy(M2S) |   160ns |   0.23% |     1 | 160.00ns |   160ns |   160ns |
zeCommandListAppendMemoryCopy(H2D) |    80ns |   0.11% |     1 |  80.00ns |    80ns |    80ns |
zeCommandListAppendMemoryCopy(M2D) | 4.71kB |   0.14% |     7 |  673.14B |  4B | 4.10kB |
zeCommandListAppendMemoryCopy(S2M) |    96B |   0.00% |     4 |   24.00B |  8B |    40B |
zeCommandListAppendMemoryCopy(M2S) |    64B |   0.00% |     1 |   64.00B | 64B |    64B |
zeCommandListAppendMemoryCopy(M2M) |    56B |   0.00% |     1 |   56.00B | 56B |    56B |
zeCommandListAppendMemoryCopy(D2H) |     4B |   0.00% |     1 |    4.00B |  4B |     4B |
zeCommandListAppendMemoryCopy(H2D) |     4B |   0.00% |     1 |    4.00B |  4B |     4B |
applenco@x4315c6s0b0n0:~/slow> /home/applenco/THAPI/build/ici/bin/iprof -k -r  | grep "zeCommandListAppendMemoryCopy"
THAPI: Trace location: /home/applenco/thapi-traces/thapi_aggreg--2024-07-17T17:17:22+00:00
      zeCommandListAppendMemoryCopy | 724.56us |   0.37% |    15 |  48.30us |  10.23us | 489.08us |     0 |
zeCommandListAppendMemoryCopy(S2M)[discarded_metadata] |  8.96us |  12.61% |     4 |   2.24us |  2.08us |  2.40us |
zeCommandListAppendMemoryCopy(M2D)[discarded_metadata] |  5.68us |   8.00% |     7 | 811.43ns |    80ns |  2.88us |
zeCommandListAppendMemoryCopy(D2H)[discarded_metadata] |  2.72us |   3.83% |     1 |   2.72us |  2.72us |  2.72us |
zeCommandListAppendMemoryCopy(M2M)[discarded_metadata] |  2.56us |   3.60% |     1 |   2.56us |  2.56us |  2.56us |
zeCommandListAppendMemoryCopy(M2S)[discarded_metadata] |   160ns |   0.23% |     1 | 160.00ns |   160ns |   160ns |
zeCommandListAppendMemoryCopy(H2D)[discarded_metadata] |    80ns |   0.11% |     1 |  80.00ns |    80ns |    80ns |
zeCommandListAppendMemoryCopy(M2D)[discarded_metadata] | 4.71kB |   0.14% |     7 |  673.14B |  4B | 4.10kB |
zeCommandListAppendMemoryCopy(S2M)[discarded_metadata] |    96B |   0.00% |     4 |   24.00B |  8B |    40B |
zeCommandListAppendMemoryCopy(M2S)[discarded_metadata] |    64B |   0.00% |     1 |   64.00B | 64B |    64B |
zeCommandListAppendMemoryCopy(M2M)[discarded_metadata] |    56B |   0.00% |     1 |   56.00B | 56B |    56B |
zeCommandListAppendMemoryCopy(D2H)[discarded_metadata] |     4B |   0.00% |     1 |    4.00B |  4B |     4B |
zeCommandListAppendMemoryCopy(H2D)[discarded_metadata] |     4B |   0.00% |     1 |    4.00B |  4B |     4B |
applenco@x4315c6s0b0n0:~/slow> /home/applenco/THAPI/build/ici/bin/iprof -k ./a.out  | grep "zeCommandListAppendMemoryCopy"
THAPI: Trace location: /home/applenco/thapi-traces/thapi_aggreg--2024-07-17T17:17:43+00:00
      zeCommandListAppendMemoryCopy | 662.72us |   0.35% |    15 |  44.18us |   9.88us | 443.07us |     0 |
        zeCommandListAppendMemoryCopy(S2M)[{ordinal: 1, index: 0}] |  8.64us |  12.04% |     4 |   2.16us |  2.08us |  2.24us |
        zeCommandListAppendMemoryCopy(M2D)[{ordinal: 1, index: 0}] |  5.52us |   7.69% |     7 | 788.57ns |    80ns |  2.72us |
        zeCommandListAppendMemoryCopy(D2H)[{ordinal: 1, index: 0}] |  3.04us |   4.24% |     1 |   3.04us |  3.04us |  3.04us |
        zeCommandListAppendMemoryCopy(M2M)[{ordinal: 1, index: 0}] |  2.80us |   3.90% |     1 |   2.80us |  2.80us |  2.80us |
        zeCommandListAppendMemoryCopy(M2S)[{ordinal: 1, index: 0}] |   160ns |   0.22% |     1 | 160.00ns |   160ns |   160ns |
        zeCommandListAppendMemoryCopy(H2D)[{ordinal: 1, index: 0}] |    80ns |   0.11% |     1 |  80.00ns |    80ns |    80ns |
zeCommandListAppendMemoryCopy(M2D)[{ordinal: 1, index: 0}] | 4.71kB |   0.14% |     7 |  673.14B |  4B | 4.10kB |
zeCommandListAppendMemoryCopy(S2M)[{ordinal: 1, index: 0}] |    96B |   0.00% |     4 |   24.00B |  8B |    40B |
zeCommandListAppendMemoryCopy(M2S)[{ordinal: 1, index: 0}] |    64B |   0.00% |     1 |   64.00B | 64B |    64B |
zeCommandListAppendMemoryCopy(M2M)[{ordinal: 1, index: 0}] |    56B |   0.00% |     1 |   56.00B | 56B |    56B |
zeCommandListAppendMemoryCopy(D2H)[{ordinal: 1, index: 0}] |     4B |   0.00% |     1 |    4.00B |  4B |     4B |
zeCommandListAppendMemoryCopy(H2D)[{ordinal: 1, index: 0}] |     4B |   0.00% |     1 |    4.00B |  4B |     4B |
applenco@x4315c6s0b0n0:~/slow> /home/applenco/THAPI/build/ici/bin/iprof -r  | grep "zeCommandListAppendMemoryCopy"
THAPI: Trace location: /home/applenco/thapi-traces/thapi_aggreg--2024-07-17T17:17:43+00:00
      zeCommandListAppendMemoryCopy | 662.72us |   0.35% |    15 |  44.18us |   9.88us | 443.07us |     0 |
zeCommandListAppendMemoryCopy(S2M) |  8.64us |  12.04% |     4 |   2.16us |  2.08us |  2.24us |
zeCommandListAppendMemoryCopy(M2D) |  5.52us |   7.69% |     7 | 788.57ns |    80ns |  2.72us |
zeCommandListAppendMemoryCopy(D2H) |  3.04us |   4.24% |     1 |   3.04us |  3.04us |  3.04us |
zeCommandListAppendMemoryCopy(M2M) |  2.80us |   3.90% |     1 |   2.80us |  2.80us |  2.80us |
zeCommandListAppendMemoryCopy(M2S) |   160ns |   0.22% |     1 | 160.00ns |   160ns |   160ns |
zeCommandListAppendMemoryCopy(H2D) |    80ns |   0.11% |     1 |  80.00ns |    80ns |    80ns |
zeCommandListAppendMemoryCopy(M2D) | 4.71kB |   0.14% |     7 |  673.14B |  4B | 4.10kB |
zeCommandListAppendMemoryCopy(S2M) |    96B |   0.00% |     4 |   24.00B |  8B |    40B |
zeCommandListAppendMemoryCopy(M2S) |    64B |   0.00% |     1 |   64.00B | 64B |    64B |
zeCommandListAppendMemoryCopy(M2M) |    56B |   0.00% |     1 |   56.00B | 56B |    56B |
zeCommandListAppendMemoryCopy(D2H) |     4B |   0.00% |     1 |    4.00B |  4B |     4B |
zeCommandListAppendMemoryCopy(H2D) |     4B |   0.00% |     1 |    4.00B |  4B |     4B |
applenco@x4315c6s0b0n0:~/slow> /home/applenco/THAPI/build/ici/bin/iprof -r -k  | grep "zeCommandListAppendMemoryCopy"
THAPI: Trace location: /home/applenco/thapi-traces/thapi_aggreg--2024-07-17T17:17:43+00:00
      zeCommandListAppendMemoryCopy | 662.72us |   0.35% |    15 |  44.18us |   9.88us | 443.07us |     0 |
        zeCommandListAppendMemoryCopy(S2M)[{ordinal: 1, index: 0}] |  8.64us |  12.04% |     4 |   2.16us |  2.08us |  2.24us |
        zeCommandListAppendMemoryCopy(M2D)[{ordinal: 1, index: 0}] |  5.52us |   7.69% |     7 | 788.57ns |    80ns |  2.72us |
        zeCommandListAppendMemoryCopy(D2H)[{ordinal: 1, index: 0}] |  3.04us |   4.24% |     1 |   3.04us |  3.04us |  3.04us |
        zeCommandListAppendMemoryCopy(M2M)[{ordinal: 1, index: 0}] |  2.80us |   3.90% |     1 |   2.80us |  2.80us |  2.80us |
        zeCommandListAppendMemoryCopy(M2S)[{ordinal: 1, index: 0}] |   160ns |   0.22% |     1 | 160.00ns |   160ns |   160ns |
        zeCommandListAppendMemoryCopy(H2D)[{ordinal: 1, index: 0}] |    80ns |   0.11% |     1 |  80.00ns |    80ns |    80ns |
zeCommandListAppendMemoryCopy(M2D)[{ordinal: 1, index: 0}] | 4.71kB |   0.14% |     7 |  673.14B |  4B | 4.10kB |
zeCommandListAppendMemoryCopy(S2M)[{ordinal: 1, index: 0}] |    96B |   0.00% |     4 |   24.00B |  8B |    40B |
zeCommandListAppendMemoryCopy(M2S)[{ordinal: 1, index: 0}] |    64B |   0.00% |     1 |   64.00B | 64B |    64B |
zeCommandListAppendMemoryCopy(M2M)[{ordinal: 1, index: 0}] |    56B |   0.00% |     1 |   56.00B | 56B |    56B |
zeCommandListAppendMemoryCopy(D2H)[{ordinal: 1, index: 0}] |     4B |   0.00% |     1 |    4.00B |  4B |     4B |
zeCommandListAppendMemoryCopy(H2D)[{ordinal: 1, index: 0}] |     4B |   0.00% |     1 |    4.00B |  4B |     4B |
TApplencourt commented 2 months ago

This was not forked from master >< Will do that after lunch...