facebookresearch / HolisticTraceAnalysis

A library to analyze PyTorch traces.
http://hta.readthedocs.io
MIT License

Expose more event features in trace representation #30

Closed fengxizhou closed 1 year ago

fengxizhou commented 1 year ago

🚀 Motivation and context

In PyTorch traces, an event's "args" field can store various attributes of the underlying operators. These attributes are useful for detailed trace analysis, such as relating tensor inputs to operator latency. However, to minimize memory footprint, the current HTA implementation exposes only the features used by the existing analyzers. To support more analyses while keeping memory usage efficient, we propose letting users customize the list of event attributes stored in the trace DataFrame.
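As an illustration of the kind of analysis this would enable, here is a minimal sketch that groups operator durations by input shape. The rows are invented toy data and `mean_dur_by_input_dims` is a hypothetical helper, not part of HTA; it only assumes that each record carries `dur` and `input_dims` fields like the columns discussed in this issue.

```python
from collections import defaultdict
from statistics import mean

# Toy rows standing in for trace DataFrame records (all values are invented).
rows = [
    {"name": "aten::mm", "dur": 120, "input_dims": [[64, 128], [128, 256]]},
    {"name": "aten::mm", "dur": 130, "input_dims": [[64, 128], [128, 256]]},
    {"name": "aten::mm", "dur": 900, "input_dims": [[512, 1024], [1024, 2048]]},
]


def mean_dur_by_input_dims(rows):
    """Return {(name, input_dims as str): mean duration} (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        # Lists are unhashable, so their string form serves as the group key.
        key = (row["name"], str(row["input_dims"]))
        groups[key].append(row["dur"])
    return {key: mean(durs) for key, durs in groups.items()}


latency = mean_dur_by_input_dims(rows)
```

Without `input_dims` in the trace representation, the three calls above would be indistinguishable apart from their durations.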

Description

The initial trace representation exposes only a small set of the argument attributes, as shown in the following table.

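| index | name | pid | tid | ts | cat | dur | stream | correlation | Trace iteration | memory_bw_gbps | index_correlation | iteration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 23850 | 23850 | 64 | 715861 | 1658948133801383 | 110 | 1068 | -1 | -1 | -1 | 0 | -1 | -1 |
| 23851 | 23851 | 199 | 715861 | 1658948133802396 | 110 | 40 | -1 | -1 | -1 | 0 | -1 | -1 |
| 23852 | 23852 | 243 | 715861 | 1658948133802450 | 110 | 0 | -1 | -1 | -1 | 0 | -1 | -1 |
| 23853 | 23853 | 35 | 715861 | 1658948133802479 | 110 | 541820 | -1 | -1 | -1 | 0 | -1 | 15 |
| 23854 | 23854 | 199 | 715861 | 1658948133802483 | 110 | 11 | -1 | -1 | -1 | 0 | -1 | 15 |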

In the new implementation, the user can specify which additional argument attributes should be saved into the trace DataFrame. An example is shown in the following table.

| index | name | pid | tid | ts | cat | dur | block | blocks_per_sm | bytes | cbid | context | correlation | est_occupancy | fwd_thread_id | grid | input_dims | input_type | memory_bw_gbps | registers_per_thread | shared_memory | stream | warps_per_sm | index_correlation | iteration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 23850 | 23850 | 64 | 715861 | 1658948133801383 | 110 | 1068 | -1 | -1.0 | -1 | -1 | -1 | -1 | -1.0 | -1 | -1 | [[], [], [], [], []] | [GenericList, Int, , , Bool] | -1.0 | -1 | -1 | -1 | -1.0 | -1 | -1 |
| 23851 | 23851 | 199 | 715861 | 1658948133802396 | 110 | 40 | -1 | -1.0 | -1 | -1 | -1 | -1 | -1.0 | -1 | -1 | [[], [], [], [], [], []] | [GenericList, Int, , , Bool, ] | -1.0 | -1 | -1 | -1 | -1.0 | -1 | -1 |
| 23852 | 23852 | 243 | 715861 | 1658948133802450 | 110 | 0 | -1 | -1.0 | -1 | -1 | -1 | -1 | -1.0 | -1 | -1 | [[1]] | [float] | -1.0 | -1 | -1 | -1 | -1.0 | -1 | -1 |
| 23853 | 23853 | 35 | 715861 | 1658948133802479 | 110 | 541820 | -1 | -1.0 | -1 | -1 | -1 | -1 | -1.0 | -1 | -1 | -1 | -1 | -1.0 | -1 | -1 | -1 | -1.0 | -1 | 15 |
| 23854 | 23854 | 199 | 715861 | 1658948133802483 | 110 | 11 | -1 | -1.0 | -1 | -1 | -1 | -1 | -1.0 | -1 | -1 | [[], [], [], [], [], []] | [GenericList, Int, , , , ] | -1.0 | -1 | -1 | -1 | -1.0 | -1 | 15 |
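The idea of keeping only user-selected "args" attributes, with a default such as -1 where an event lacks the attribute, can be sketched in plain Python. This is not HTA's parser: `flatten_event` and the `selected_args` mapping are hypothetical, the events are invented, and the raw key names inside "args" are assumptions about what a trace might contain.

```python
# Toy events in Chrome-trace style; the values are invented for illustration.
events = [
    {"name": "aten::add", "ts": 100, "dur": 40,
     "args": {"Input Dims": [[1]], "Input type": ["float"]}},
    {"name": "kernel", "ts": 150, "dur": 11,
     "args": {"stream": 7, "correlation": 42, "grid": [1, 1, 1]}},
]

# Hypothetical mapping: column name -> (key inside "args", default value).
selected_args = {
    "stream": ("stream", -1),
    "correlation": ("correlation", -1),
    "input_dims": ("Input Dims", -1),
}


def flatten_event(event, selected_args):
    """Build one flat row that keeps only the selected "args" attributes."""
    row = {k: v for k, v in event.items() if k != "args"}
    args = event.get("args", {})
    for column, (raw_key, default) in selected_args.items():
        # Events that lack the attribute get the default, like the -1 cells above.
        row[column] = args.get(raw_key, default)
    return row


rows = [flatten_event(e, selected_args) for e in events]
```

Attributes that are not selected (here `grid` and `Input type`) are dropped, so they never occupy a column.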

Alternatives

As an alternative, we could expose all event attributes. However, this approach would incur a much larger memory footprint, even though many attributes recorded in the trace may never be used by the analyzers.
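To give a rough feel for the difference, the sketch below counts how many table cells are needed when every event gets one cell per column. The cell count is only a crude proxy for memory, `stored_cells` is a hypothetical helper, and the events are invented.

```python
def stored_cells(events, selected=None):
    """Count cells when every event stores one value per column.

    selected=None means "expose every attribute seen in any event".
    """
    if selected is None:
        columns = set()
        for event in events:
            columns.update(event.get("args", {}))
    else:
        columns = set(selected)
    return len(events) * len(columns)


# Toy events; different event types carry different attributes.
events = [
    {"args": {"Input Dims": [[1]], "Input type": ["float"]}},
    {"args": {"stream": 7, "correlation": 42, "grid": [1, 1, 1], "block": [128, 1, 1]}},
    {"args": {"stream": 7, "correlation": 43, "bytes": 4096}},
]

all_cells = stored_cells(events)                             # 3 events x 7 distinct keys
few_cells = stored_cells(events, ["stream", "correlation"])  # 3 events x 2 keys
```

Because different event types carry different attributes, exposing everything yields a wide table in which most cells hold only a default value.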

Additional context

The proposed changes will not impact existing functionalities. All existing analyzers should work without change.

New analyzers that need more attributes than the default setting provides should insert the following code before parsing any trace file.

from hta.configs.parser_config import ParserConfig

cfg = ParserConfig.get_default_cfg()
cfg.add_args(...)
ParserConfig.set_default_cfg(cfg)

# The typical trace analyzer code.

Note that including additional trace attributes requires more memory. This can be an issue when an analyzer needs to process a large number of trace files associated with a distributed job. Thus, an analyzer should set an appropriate attribute list and pay attention to its runtime memory demand.
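One way to keep an eye on memory demand is to measure peak allocation around the loading step with the standard-library `tracemalloc` module. The sketch below is an assumption-laden illustration: `peak_python_memory` is a hypothetical helper and `load_rows` is a stand-in for trace parsing, not an HTA function. `tracemalloc` only sees allocations made through Python's memory allocators, so memory used by native code may be undercounted.

```python
import tracemalloc


def peak_python_memory(fn, *args, **kwargs):
    """Run fn and return (result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def load_rows(n_events, n_attrs):
    """Hypothetical stand-in for trace parsing: n_attrs values per event."""
    return [[-1.0 * (i + j) for j in range(n_attrs)] for i in range(n_events)]


# Compare a narrow attribute list with a wide one for the same number of events.
_, peak_small = peak_python_memory(load_rows, 10_000, 5)
_, peak_large = peak_python_memory(load_rows, 10_000, 25)
```

Running the same comparison on a real trace with different attribute lists would show whether a given configuration fits the available memory before scaling to many ranks.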

anupambhatnagar commented 1 year ago

Sounds good. This should be useful to users.