🚀 Motivation and context
In PyTorch traces, an event's "args" field can store various attributes of the underlying operator. These attributes are useful for detailed trace analysis, such as relating an operator's input tensors to its latency. However, to minimize the memory footprint, the current HTA implementation exposes only the attributes used by the existing analyzers. To support more analysis features while keeping memory usage efficient, we propose letting users customize the list of event attributes stored in the trace DataFrame.
Description
The initial trace representation exposes only a small set of the argument attributes, as shown in the following table.
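| index | name | pid | tid | ts | cat | dur | stream | correlation | Trace iteration | memory_bw_gbps | index_correlation | iteration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 23850 | 23850 | 64 | 715861 | 1658948133801383 | 110 | 1068 | -1 | -1 | -1 | 0 | -1 | -1 |
| 23851 | 23851 | 199 | 715861 | 1658948133802396 | 110 | 40 | -1 | -1 | -1 | 0 | -1 | -1 |
| 23852 | 23852 | 243 | 715861 | 1658948133802450 | 110 | 0 | -1 | -1 | -1 | 0 | -1 | -1 |
| 23853 | 23853 | 35 | 715861 | 1658948133802479 | 110 | 541820 | -1 | -1 | -1 | 0 | -1 | 15 |
| 23854 | 23854 | 199 | 715861 | 1658948133802483 | 110 | 11 | -1 | -1 | -1 | 0 | -1 | 15 |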
With the new implementation, the user can specify which additional argument attributes are saved into the trace DataFrame. An example is shown in the following table.
| index | name | pid | tid | ts | cat | dur | block | blocks_per_sm | bytes | cbid | context | correlation | est_occupancy | fwd_thread_id | grid | input_dims | input_type | memory_bw_gbps | registers_per_thread | shared_memory | stream | warps_per_sm | index_correlation | iteration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 23850 | 23850 | 64 | 715861 | 1658948133801383 | 110 | 1068 | -1 | -1.0 | -1 | -1 | -1 | -1 | -1.0 | -1 | -1 | [[], [], [], [], []] | [GenericList, Int, , , Bool] | -1.0 | -1 | -1 | -1 | -1.0 | -1 | -1 |
| 23851 | 23851 | 199 | 715861 | 1658948133802396 | 110 | 40 | -1 | -1.0 | -1 | -1 | -1 | -1 | -1.0 | -1 | -1 | [[], [], [], [], [], []] | [GenericList, Int, , , Bool, ] | -1.0 | -1 | -1 | -1 | -1.0 | -1 | -1 |
| 23852 | 23852 | 243 | 715861 | 1658948133802450 | 110 | 0 | -1 | -1.0 | -1 | -1 | -1 | -1 | -1.0 | -1 | -1 | [[1]] | [float] | -1.0 | -1 | -1 | -1 | -1.0 | -1 | -1 |
| 23853 | 23853 | 35 | 715861 | 1658948133802479 | 110 | 541820 | -1 | -1.0 | -1 | -1 | -1 | -1 | -1.0 | -1 | -1 | -1 | -1 | -1.0 | -1 | -1 | -1 | -1.0 | -1 | 15 |
| 23854 | 23854 | 199 | 715861 | 1658948133802483 | 110 | 11 | -1 | -1.0 | -1 | -1 | -1 | -1 | -1.0 | -1 | -1 | [[], [], [], [], [], []] | [GenericList, Int, , , , ] | -1.0 | -1 | -1 | -1 | -1.0 | -1 | 15 |
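The selection step can be sketched in plain Python (rather than pandas, to keep it self-contained). This is a minimal illustration under stated assumptions, not HTA's parser: `events_to_columns`, `BASE_FIELDS`, and the `-1` fill value are hypothetical (the `-1` mirrors the placeholders in the tables above), and the "args" key names in the sample events are only examples.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical list of base fields kept for every event, modeled on the
# first table above.
BASE_FIELDS: List[str] = ["name", "pid", "tid", "ts", "cat", "dur"]


def events_to_columns(
    events: Iterable[Dict[str, Any]],
    extra_args: List[str],
    default: Any = -1,
) -> Dict[str, List[Any]]:
    """Build a column-oriented table from Chrome-trace-style events.

    Keeps BASE_FIELDS plus the user-selected keys from each event's "args"
    dict; any field or arg missing from an event is filled with `default`.
    """
    # Drop duplicate keys and keys that are already base fields, keeping order.
    extra = [k for k in dict.fromkeys(extra_args) if k not in BASE_FIELDS]
    columns: Dict[str, List[Any]] = {f: [] for f in BASE_FIELDS + extra}
    for ev in events:
        args = ev.get("args", {})
        for f in BASE_FIELDS:
            columns[f].append(ev.get(f, default))
        for key in extra:
            columns[key].append(args.get(key, default))
    return columns


# Usage: keep the input-shape attributes in addition to the base fields.
events = [
    {"name": "aten::add", "pid": 1, "tid": 1, "ts": 100, "cat": "cpu_op",
     "dur": 40, "args": {"Input Dims": [[2, 3], [2, 3]],
                         "Input type": ["float", "float"]}},
    {"name": "ProfilerStep#15", "pid": 1, "tid": 1, "ts": 90,
     "cat": "user_annotation", "dur": 500},
]
cols = events_to_columns(events, ["Input Dims", "Input type"])
print(cols["Input Dims"])  # [[[2, 3], [2, 3]], -1]
```

Building the table column by column keeps only the requested attributes in memory; with pandas installed, `pd.DataFrame(cols)` would turn the result into a DataFrame.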
Alternatives
As an alternative, we could expose all event attributes. The downside of this approach is a much larger memory footprint, while many of the attributes recorded in the trace may never be used by the analyzers.
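A rough sketch of the trade-off, assuming a dense table in which every event gets a cell for every exposed attribute (as in the tables above, where absent values become -1): exposing everything means one column per distinct "args" key seen anywhere in the trace. The helper names and sample events are hypothetical, and cell counts are only a proxy for bytes.

```python
from typing import Any, Dict, Iterable, List, Set


def all_arg_keys(events: Iterable[Dict[str, Any]]) -> Set[str]:
    """Union of "args" keys across events: the columns needed to expose everything."""
    keys: Set[str] = set()
    for ev in events:
        keys.update(ev.get("args", {}).keys())
    return keys


def arg_cell_counts(events: List[Dict[str, Any]], selected: List[str]) -> Dict[str, int]:
    """Arg cells (rows x arg columns) for 'expose all' vs. a selected list."""
    n_rows = len(events)
    return {
        "all_args": n_rows * len(all_arg_keys(events)),
        "selected_args": n_rows * len(set(selected)),
    }


# Usage: three events with mostly disjoint arg keys (illustrative values).
events = [
    {"name": "kernel_a", "args": {"grid": [1, 1, 1], "block": [128, 1, 1],
                                  "stream": 7, "correlation": 5}},
    {"name": "aten::mm", "args": {"Input Dims": [[4, 4], [4, 4]],
                                  "Input type": ["float", "float"]}},
    {"name": "cudaLaunchKernel", "args": {"cbid": 211, "correlation": 5}},
]
print(arg_cell_counts(events, ["Input Dims"]))  # {'all_args': 21, 'selected_args': 3}
```

Because different event types carry different "args" keys, most cells in an "expose all" table would hold only the fill value.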
Additional context
The proposed changes will not affect existing functionality. All existing analyzers should work without modification.
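For new analyzers that use more attributes than those provided by the default setting, we recommend inserting the following code before parsing any trace file.
```python
from hta.configs.parser_config import ParserConfig

# Start from the default parser configuration.
cfg = ParserConfig.get_default_cfg()
# Add the extra "args" attributes the new analyzer needs.
cfg.add_args(...)
# Make the updated configuration the default for subsequent trace parsing.
ParserConfig.set_default_cfg(cfg)

# The typical trace analyzer code.
```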
Note that including additional trace attributes requires more memory. This can be an issue when an analyzer needs to process a large number of trace files from a distributed job. An analyzer should therefore select only the attributes it needs and monitor its runtime memory demand.
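A back-of-the-envelope sketch of why the attribute list matters at job scale. The column counts follow the two tables above (13 columns; 25 columns, two of them list-valued), but the per-cell byte costs, the 64-rank job, and the one million events per trace are hypothetical assumptions. For a trace actually loaded into pandas, `DataFrame.memory_usage(deep=True)` reports the real figure.

```python
# Hypothetical per-cell costs, for illustration only: numeric columns stored as
# int64/float64 take 8 bytes per cell; list-valued args (e.g. input dims) are
# Python objects whose real size varies, so OBJECT_BYTES is just a placeholder.
NUMERIC_BYTES = 8
OBJECT_BYTES = 200


def estimate_job_bytes(rows_per_trace: int, n_traces: int,
                       n_numeric_cols: int, n_object_cols: int = 0) -> int:
    """Rough memory estimate for holding every trace of a job in memory at once."""
    per_row = n_numeric_cols * NUMERIC_BYTES + n_object_cols * OBJECT_BYTES
    return rows_per_trace * per_row * n_traces


# Usage: 64 ranks with 1M events each (assumed numbers).
default_cols = estimate_job_bytes(1_000_000, 64, n_numeric_cols=13)
extended_cols = estimate_job_bytes(1_000_000, 64, n_numeric_cols=23, n_object_cols=2)
print(default_cols)   # 6656000000
print(extended_cols)  # 37376000000
```

Under these assumptions the extended attribute list costs several times more memory per job, mostly because of the list-valued columns, which is why an analyzer should request only the attributes it uses.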