Closed TApplencourt closed 2 months ago
@colleeneb has an MPI job (512) where the final reply of the aggregation takes forever. I suspect it is due to too much metadata (she has millions of different sizes of MPI traffic).
By default, let's not trace the metadata, but keep it when using -k.
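To illustrate why per-event metadata can slow down the final aggregation, here is a minimal sketch. It is not THAPI's actual implementation: the `aggregate` function, the `keep_metadata` parameter, and the `MPI_Send` / `size=...` event data are all hypothetical, chosen only to mimic a job with many distinct message sizes. When metadata is part of the aggregation key, the number of rows grows with the number of distinct metadata values instead of the number of distinct call names.

```python
from collections import defaultdict


def aggregate(events, keep_metadata=False):
    """Sum durations per key. Metadata joins the key only when kept.

    Hypothetical model of an aggregation step, not THAPI's real code.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [total_ns, calls]
    for name, metadata, duration_ns in events:
        key = (name, metadata if keep_metadata else "")
        totals[key][0] += duration_ns
        totals[key][1] += 1
    return dict(totals)


# Hypothetical events: one call name, 1000 distinct metadata strings.
events = [("MPI_Send", f"size={n}", 100) for n in range(1000)]

rows_default = aggregate(events)
rows_with_metadata = aggregate(events, keep_metadata=True)

print(len(rows_default))        # one row per call name
print(len(rows_with_metadata))  # one row per (call name, metadata) pair
```

With metadata dropped, the 1000 events collapse into a single row. With metadata kept, every distinct size gets its own row, which is the kind of growth suspected in the description above.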
    THAPI: Trace location: /home/applenco/thapi-traces/thapi_aggreg--2024-07-17T17:17:22+00:00
    zeCommandListAppendMemoryCopy | 724.56us | 0.37% | 15 | 48.30us | 10.23us | 489.08us | 0 |
    zeCommandListAppendMemoryCopy(S2M) | 8.96us | 12.61% | 4 | 2.24us | 2.08us | 2.40us |
    zeCommandListAppendMemoryCopy(M2D) | 5.68us | 8.00% | 7 | 811.43ns | 80ns | 2.88us |
    zeCommandListAppendMemoryCopy(D2H) | 2.72us | 3.83% | 1 | 2.72us | 2.72us | 2.72us |
    zeCommandListAppendMemoryCopy(M2M) | 2.56us | 3.60% | 1 | 2.56us | 2.56us | 2.56us |
    zeCommandListAppendMemoryCopy(M2S) | 160ns | 0.23% | 1 | 160.00ns | 160ns | 160ns |
    zeCommandListAppendMemoryCopy(H2D) | 80ns | 0.11% | 1 | 80.00ns | 80ns | 80ns |
    zeCommandListAppendMemoryCopy(M2D) | 4.71kB | 0.14% | 7 | 673.14B | 4B | 4.10kB |
    zeCommandListAppendMemoryCopy(S2M) | 96B | 0.00% | 4 | 24.00B | 8B | 40B |
    zeCommandListAppendMemoryCopy(M2S) | 64B | 0.00% | 1 | 64.00B | 64B | 64B |
    zeCommandListAppendMemoryCopy(M2M) | 56B | 0.00% | 1 | 56.00B | 56B | 56B |
    zeCommandListAppendMemoryCopy(D2H) | 4B | 0.00% | 1 | 4.00B | 4B | 4B |
    zeCommandListAppendMemoryCopy(H2D) | 4B | 0.00% | 1 | 4.00B | 4B | 4B |

    applenco@x4315c6s0b0n0:~/slow> /home/applenco/THAPI/build/ici/bin/iprof -r | grep "zeCommandListAppendMemoryCopy"
    THAPI: Trace location: /home/applenco/thapi-traces/thapi_aggreg--2024-07-17T17:17:22+00:00
    zeCommandListAppendMemoryCopy | 724.56us | 0.37% | 15 | 48.30us | 10.23us | 489.08us | 0 |
    zeCommandListAppendMemoryCopy(S2M) | 8.96us | 12.61% | 4 | 2.24us | 2.08us | 2.40us |
    zeCommandListAppendMemoryCopy(M2D) | 5.68us | 8.00% | 7 | 811.43ns | 80ns | 2.88us |
    zeCommandListAppendMemoryCopy(D2H) | 2.72us | 3.83% | 1 | 2.72us | 2.72us | 2.72us |
    zeCommandListAppendMemoryCopy(M2M) | 2.56us | 3.60% | 1 | 2.56us | 2.56us | 2.56us |
    zeCommandListAppendMemoryCopy(M2S) | 160ns | 0.23% | 1 | 160.00ns | 160ns | 160ns |
    zeCommandListAppendMemoryCopy(H2D) | 80ns | 0.11% | 1 | 80.00ns | 80ns | 80ns |
    zeCommandListAppendMemoryCopy(M2D) | 4.71kB | 0.14% | 7 | 673.14B | 4B | 4.10kB |
    zeCommandListAppendMemoryCopy(S2M) | 96B | 0.00% | 4 | 24.00B | 8B | 40B |
    zeCommandListAppendMemoryCopy(M2S) | 64B | 0.00% | 1 | 64.00B | 64B | 64B |
    zeCommandListAppendMemoryCopy(M2M) | 56B | 0.00% | 1 | 56.00B | 56B | 56B |
    zeCommandListAppendMemoryCopy(D2H) | 4B | 0.00% | 1 | 4.00B | 4B | 4B |
    zeCommandListAppendMemoryCopy(H2D) | 4B | 0.00% | 1 | 4.00B | 4B | 4B |

    applenco@x4315c6s0b0n0:~/slow> /home/applenco/THAPI/build/ici/bin/iprof -k -r | grep "zeCommandListAppendMemoryCopy"
    THAPI: Trace location: /home/applenco/thapi-traces/thapi_aggreg--2024-07-17T17:17:22+00:00
    zeCommandListAppendMemoryCopy | 724.56us | 0.37% | 15 | 48.30us | 10.23us | 489.08us | 0 |
    zeCommandListAppendMemoryCopy(S2M)[discarded_metadata] | 8.96us | 12.61% | 4 | 2.24us | 2.08us | 2.40us |
    zeCommandListAppendMemoryCopy(M2D)[discarded_metadata] | 5.68us | 8.00% | 7 | 811.43ns | 80ns | 2.88us |
    zeCommandListAppendMemoryCopy(D2H)[discarded_metadata] | 2.72us | 3.83% | 1 | 2.72us | 2.72us | 2.72us |
    zeCommandListAppendMemoryCopy(M2M)[discarded_metadata] | 2.56us | 3.60% | 1 | 2.56us | 2.56us | 2.56us |
    zeCommandListAppendMemoryCopy(M2S)[discarded_metadata] | 160ns | 0.23% | 1 | 160.00ns | 160ns | 160ns |
    zeCommandListAppendMemoryCopy(H2D)[discarded_metadata] | 80ns | 0.11% | 1 | 80.00ns | 80ns | 80ns |
    zeCommandListAppendMemoryCopy(M2D)[discarded_metadata] | 4.71kB | 0.14% | 7 | 673.14B | 4B | 4.10kB |
    zeCommandListAppendMemoryCopy(S2M)[discarded_metadata] | 96B | 0.00% | 4 | 24.00B | 8B | 40B |
    zeCommandListAppendMemoryCopy(M2S)[discarded_metadata] | 64B | 0.00% | 1 | 64.00B | 64B | 64B |
    zeCommandListAppendMemoryCopy(M2M)[discarded_metadata] | 56B | 0.00% | 1 | 56.00B | 56B | 56B |
    zeCommandListAppendMemoryCopy(D2H)[discarded_metadata] | 4B | 0.00% | 1 | 4.00B | 4B | 4B |
    zeCommandListAppendMemoryCopy(H2D)[discarded_metadata] | 4B | 0.00% | 1 | 4.00B | 4B | 4B |

    applenco@x4315c6s0b0n0:~/slow> /home/applenco/THAPI/build/ici/bin/iprof -k ./a.out | grep "zeCommandListAppendMemoryCopy"
    THAPI: Trace location: /home/applenco/thapi-traces/thapi_aggreg--2024-07-17T17:17:43+00:00
    zeCommandListAppendMemoryCopy | 662.72us | 0.35% | 15 | 44.18us | 9.88us | 443.07us | 0 |
    zeCommandListAppendMemoryCopy(S2M)[{ordinal: 1, index: 0}] | 8.64us | 12.04% | 4 | 2.16us | 2.08us | 2.24us |
    zeCommandListAppendMemoryCopy(M2D)[{ordinal: 1, index: 0}] | 5.52us | 7.69% | 7 | 788.57ns | 80ns | 2.72us |
    zeCommandListAppendMemoryCopy(D2H)[{ordinal: 1, index: 0}] | 3.04us | 4.24% | 1 | 3.04us | 3.04us | 3.04us |
    zeCommandListAppendMemoryCopy(M2M)[{ordinal: 1, index: 0}] | 2.80us | 3.90% | 1 | 2.80us | 2.80us | 2.80us |
    zeCommandListAppendMemoryCopy(M2S)[{ordinal: 1, index: 0}] | 160ns | 0.22% | 1 | 160.00ns | 160ns | 160ns |
    zeCommandListAppendMemoryCopy(H2D)[{ordinal: 1, index: 0}] | 80ns | 0.11% | 1 | 80.00ns | 80ns | 80ns |
    zeCommandListAppendMemoryCopy(M2D)[{ordinal: 1, index: 0}] | 4.71kB | 0.14% | 7 | 673.14B | 4B | 4.10kB |
    zeCommandListAppendMemoryCopy(S2M)[{ordinal: 1, index: 0}] | 96B | 0.00% | 4 | 24.00B | 8B | 40B |
    zeCommandListAppendMemoryCopy(M2S)[{ordinal: 1, index: 0}] | 64B | 0.00% | 1 | 64.00B | 64B | 64B |
    zeCommandListAppendMemoryCopy(M2M)[{ordinal: 1, index: 0}] | 56B | 0.00% | 1 | 56.00B | 56B | 56B |
    zeCommandListAppendMemoryCopy(D2H)[{ordinal: 1, index: 0}] | 4B | 0.00% | 1 | 4.00B | 4B | 4B |
    zeCommandListAppendMemoryCopy(H2D)[{ordinal: 1, index: 0}] | 4B | 0.00% | 1 | 4.00B | 4B | 4B |

    applenco@x4315c6s0b0n0:~/slow> /home/applenco/THAPI/build/ici/bin/iprof -r | grep "zeCommandListAppendMemoryCopy"
    THAPI: Trace location: /home/applenco/thapi-traces/thapi_aggreg--2024-07-17T17:17:43+00:00
    zeCommandListAppendMemoryCopy | 662.72us | 0.35% | 15 | 44.18us | 9.88us | 443.07us | 0 |
    zeCommandListAppendMemoryCopy(S2M) | 8.64us | 12.04% | 4 | 2.16us | 2.08us | 2.24us |
    zeCommandListAppendMemoryCopy(M2D) | 5.52us | 7.69% | 7 | 788.57ns | 80ns | 2.72us |
    zeCommandListAppendMemoryCopy(D2H) | 3.04us | 4.24% | 1 | 3.04us | 3.04us | 3.04us |
    zeCommandListAppendMemoryCopy(M2M) | 2.80us | 3.90% | 1 | 2.80us | 2.80us | 2.80us |
    zeCommandListAppendMemoryCopy(M2S) | 160ns | 0.22% | 1 | 160.00ns | 160ns | 160ns |
    zeCommandListAppendMemoryCopy(H2D) | 80ns | 0.11% | 1 | 80.00ns | 80ns | 80ns |
    zeCommandListAppendMemoryCopy(M2D) | 4.71kB | 0.14% | 7 | 673.14B | 4B | 4.10kB |
    zeCommandListAppendMemoryCopy(S2M) | 96B | 0.00% | 4 | 24.00B | 8B | 40B |
    zeCommandListAppendMemoryCopy(M2S) | 64B | 0.00% | 1 | 64.00B | 64B | 64B |
    zeCommandListAppendMemoryCopy(M2M) | 56B | 0.00% | 1 | 56.00B | 56B | 56B |
    zeCommandListAppendMemoryCopy(D2H) | 4B | 0.00% | 1 | 4.00B | 4B | 4B |
    zeCommandListAppendMemoryCopy(H2D) | 4B | 0.00% | 1 | 4.00B | 4B | 4B |

    applenco@x4315c6s0b0n0:~/slow> /home/applenco/THAPI/build/ici/bin/iprof -r -k | grep "zeCommandListAppendMemoryCopy"
    THAPI: Trace location: /home/applenco/thapi-traces/thapi_aggreg--2024-07-17T17:17:43+00:00
    zeCommandListAppendMemoryCopy | 662.72us | 0.35% | 15 | 44.18us | 9.88us | 443.07us | 0 |
    zeCommandListAppendMemoryCopy(S2M)[{ordinal: 1, index: 0}] | 8.64us | 12.04% | 4 | 2.16us | 2.08us | 2.24us |
    zeCommandListAppendMemoryCopy(M2D)[{ordinal: 1, index: 0}] | 5.52us | 7.69% | 7 | 788.57ns | 80ns | 2.72us |
    zeCommandListAppendMemoryCopy(D2H)[{ordinal: 1, index: 0}] | 3.04us | 4.24% | 1 | 3.04us | 3.04us | 3.04us |
    zeCommandListAppendMemoryCopy(M2M)[{ordinal: 1, index: 0}] | 2.80us | 3.90% | 1 | 2.80us | 2.80us | 2.80us |
    zeCommandListAppendMemoryCopy(M2S)[{ordinal: 1, index: 0}] | 160ns | 0.22% | 1 | 160.00ns | 160ns | 160ns |
    zeCommandListAppendMemoryCopy(H2D)[{ordinal: 1, index: 0}] | 80ns | 0.11% | 1 | 80.00ns | 80ns | 80ns |
    zeCommandListAppendMemoryCopy(M2D)[{ordinal: 1, index: 0}] | 4.71kB | 0.14% | 7 | 673.14B | 4B | 4.10kB |
    zeCommandListAppendMemoryCopy(S2M)[{ordinal: 1, index: 0}] | 96B | 0.00% | 4 | 24.00B | 8B | 40B |
    zeCommandListAppendMemoryCopy(M2S)[{ordinal: 1, index: 0}] | 64B | 0.00% | 1 | 64.00B | 64B | 64B |
    zeCommandListAppendMemoryCopy(M2M)[{ordinal: 1, index: 0}] | 56B | 0.00% | 1 | 56.00B | 56B | 56B |
    zeCommandListAppendMemoryCopy(D2H)[{ordinal: 1, index: 0}] | 4B | 0.00% | 1 | 4.00B | 4B | 4B |
    zeCommandListAppendMemoryCopy(H2D)[{ordinal: 1, index: 0}] | 4B | 0.00% | 1 | 4.00B | 4B | 4B |
This branch was not forked from master >< Will fix that after lunch...