Expose more event features in trace representation

🚀 Motivation and context

In PyTorch traces, an event's "args" field can store various attributes of the underlying operators. These attributes are useful for detailed trace analysis, such as relating tensor inputs to operator latency. However, to minimize memory footprint, the current HTA implementation exposes only the features used by the existing analyzers. To support more analysis features while keeping memory usage efficient, we propose letting users customize the list of event attributes stored in the trace DataFrame.

Description

The initial trace representation exposes only a small set of the argument attributes, as shown in the following table.

index	name	pid	tid	ts	cat	dur	stream	correlation	Trace iteration	index_correlation	iteration
23850	23850	64	715861	1658948133801383	110	1068	-1	-1	-1	-1	-1
23851	23851	199	715861	1658948133802396	110	40	-1	-1	-1	-1	-1
23852	23852	243	715861	1658948133802450	110	0	-1	-1	-1	-1	-1
23853	23853	35	715861	1658948133802479	110	541820	-1	-1	-1	-1	15
23854	23854	199	715861	1658948133802483	110	11	-1	-1	-1	-1	15

In the new implementation, the user can specify which additional argument attributes are saved into the trace DataFrame. An example is shown in the following table.

index	name	pid	tid	ts	cat	dur	block	blocks_per_sm	bytes	cbid	context	correlation	est_occupancy	fwd_thread_id	grid	input_dims	input_type	memory_bw_gbps	registers_per_thread	shared_memory	stream	warps_per_sm	index_correlation	iteration
23850	23850	64	715861	1658948133801383	110	1068	-1	-1.0	-1	-1	-1	-1	-1.0	-1	-1	[[], [], [], [], []]	[GenericList, Int, , , Bool]	-1.0	-1	-1	-1	-1.0	-1	-1
23851	23851	199	715861	1658948133802396	110	40	-1	-1.0	-1	-1	-1	-1	-1.0	-1	-1	[[], [], [], [], [], []]	[GenericList, Int, , , Bool, ]	-1.0	-1	-1	-1	-1.0	-1	-1
23852	23852	243	715861	1658948133802450	110	0	-1	-1.0	-1	-1	-1	-1	-1.0	-1	-1	[[1]]	[float]	-1.0	-1	-1	-1	-1.0	-1	-1
23853	23853	35	715861	1658948133802479	110	541820	-1	-1.0	-1	-1	-1	-1	-1.0	-1	-1	-1	-1	-1.0	-1	-1	-1	-1.0	-1	15
23854	23854	199	715861	1658948133802483	110	11	-1	-1.0	-1	-1	-1	-1	-1.0	-1	-1	[[], [], [], [], [], []]	[GenericList, Int, , , , ]	-1.0	-1	-1	-1	-1.0	-1	15
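
The selection step can be pictured with a minimal sketch in plain Python. This is not HTA's parser: the `flatten_events` helper, the `DEFAULT_VALUE` constant, and the sample events are hypothetical and exist only for illustration. The one behavior borrowed from the tables above is that an attribute missing from an event is filled with -1.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical placeholder, mirroring the -1 shown for missing values in the tables.
DEFAULT_VALUE = -1

# Fields that are always kept, regardless of the user's selection.
BASE_FIELDS = ("name", "pid", "tid", "ts", "cat", "dur")


def flatten_events(
    events: Iterable[Dict[str, Any]], extra_args: Iterable[str]
) -> List[Dict[str, Any]]:
    """Turn raw events into flat rows, keeping only the requested "args" attributes."""
    extra_args = list(extra_args)
    rows = []
    for index, event in enumerate(events):
        args = event.get("args", {})
        row = {"index": index}
        for field in BASE_FIELDS:
            row[field] = event.get(field, DEFAULT_VALUE)
        # Only the attributes the user asked for become columns.
        for attr in extra_args:
            row[attr] = args.get(attr, DEFAULT_VALUE)
        rows.append(row)
    return rows


# Made-up events; the attribute names follow the column names in the table above.
events = [
    {"name": "op_a", "pid": 1, "tid": 2, "ts": 100, "cat": "cpu_op", "dur": 5,
     "args": {"input_dims": [[1]], "input_type": ["float"]}},
    {"name": "kernel_b", "pid": 1, "tid": 3, "ts": 110, "cat": "kernel", "dur": 9,
     "args": {"stream": 7, "correlation": 42}},
]

rows = flatten_events(events, extra_args=["input_dims", "stream"])
for row in rows:
    print(row)
```

Passing `rows` to `pandas.DataFrame` would yield one column per selected attribute. Attributes that were not requested, such as `correlation` here, never reach the table and so cost no memory.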

Alternatives

As an alternative, we could expose all the event attributes. However, this approach would incur a much larger memory footprint, even though many attributes recorded in the trace may never be used by the analyzers.

Additional context

The proposed changes will not impact existing functionality. All existing analyzers should work without change.

For new analyzers that use more attributes than the default setting provides, it is recommended to insert the following code before parsing any trace file.

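from hta.configs.parser_config import ParserConfig

# Start from the default parser configuration.
cfg = ParserConfig.get_default_cfg()
# Register the additional event attributes to keep (arguments elided here).
cfg.add_args(...)
# Make the updated configuration the default for subsequent parsing.
ParserConfig.set_default_cfg(cfg)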

# The typical trace analyzer code.

Note that including additional trace attributes requires more memory. This can be an issue when an analyzer needs to process a large number of trace files associated with a distributed job. An analyzer should therefore set an appropriate attribute list and pay attention to its runtime memory demand.
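
One rough way to see how the attribute list affects memory is to measure allocations while building rows with and without extra attributes. The sketch below uses the standard library's `tracemalloc` on synthetic data. The `build_rows` and `allocated_bytes` helpers are hypothetical, and real numbers will depend on the trace contents and attribute types.

```python
import tracemalloc
from typing import Any, Dict, List


def build_rows(num_events: int, num_extra_attrs: int) -> List[Dict[str, Any]]:
    """Build synthetic rows: six base fields plus `num_extra_attrs` list-valued attributes."""
    rows = []
    for i in range(num_events):
        row = {"name": i, "pid": 1, "tid": 2, "ts": 1000 + i, "cat": 0, "dur": i % 50}
        for j in range(num_extra_attrs):
            row[f"attr_{j}"] = [i, j]
        rows.append(row)
    return rows


def allocated_bytes(num_events: int, num_extra_attrs: int) -> int:
    """Return the bytes still allocated right after the rows are built."""
    tracemalloc.start()
    rows = build_rows(num_events, num_extra_attrs)
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del rows
    return current


narrow = allocated_bytes(10_000, 0)
wide = allocated_bytes(10_000, 10)
print(f"base fields only: {narrow} bytes, with 10 extra attributes: {wide} bytes")
```

For an actual pandas DataFrame, `DataFrame.memory_usage(deep=True)` reports per-column usage and is a more direct check before scaling an analyzer to many trace files.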
