facebookincubator / dynolog

Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from system components such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.

compatibility of torch-tb-profiler with dynolog traces #155

Closed: asaiacai closed this issue 7 months ago

asaiacai commented 1 year ago

tl;dr

On-demand traces are currently incompatible with the latest version of torch-tb-profiler.

Summary

It would be nice to interact with trace files through torch-tb-profiler, so that on-demand profiling gets the same interface as offline profiling. Right now the trace files can only be opened in the Chrome tracer, which is limited to one trace at a time. The aggregations provided by torch-tb-profiler and HolisticTraceAnalyzer are great for getting a bird's-eye view of performance across many GPUs during distributed training, but they do not work with the current on-demand trace outputs. The following info is missing from the on-demand traces:

  "distributedInfo": {"backend": "nccl", "rank": 4, "world_size": 12},

Also, if it were possible to toggle memory, shape, and stack tracing through dynolog, that would enable basically all the same analyses in torch-tb-profiler for on-demand traces as for offline tracing.
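For reference, the offline PyTorch profiler already exposes exactly these toggles; a minimal sketch of the offline equivalent (the tiny model and output path are illustrative only):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# The offline profiler already exposes the toggles this issue asks dynolog
# to control on demand: memory, input shapes, stacks, and module hierarchy.
model = torch.nn.Linear(128, 128)
inputs = torch.randn(32, 128)

with profile(
    activities=[ProfilerActivity.CPU],  # add ProfilerActivity.CUDA on a GPU host
    profile_memory=True,   # memory tracing
    record_shapes=True,    # operator input shapes
    with_stack=True,       # Python stack traces
    with_modules=True,     # module hierarchy
) as prof:
    model(inputs)

prof.export_chrome_trace("/tmp/offline_trace.json")
```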

Environment

briancoutinho commented 1 year ago

@asaiacai thanks for posting this 👍 Let me first cover the part about toggling memory, shapes, and stack tracing.

Additional CLI options for on-demand profiling

Also, if it were possible to toggle memory, shape, and stack tracing through dynolog, that would enable basically all the same analyses in torch-tb-profiler for on-demand traces as for offline tracing.

Yes, this is a feature we should add. To clarify: the dyno tool sends a Kineto configuration over the network to the dynolog daemon, and the config is then passed on to Kineto when it polls dynolog.

The config is generated here - https://github.com/facebookincubator/dynolog/blob/main/cli/src/commands/gputrace.rs#L40

There are two features we can add:

  1. The user passes a generic config file: dyno gputrace --config-file ./kineto.conf. The lines in the provided file would be appended to the config and sent down the wire, enabling any custom Kineto configuration. It might also be useful to accept key-value pairs directly on the command line: dyno gputrace --config-opts 'PROFILE_MEMORY=YES' (see the sketch after this list).
  2. We should add direct options on the CLI to enable stack, memory, and shape profiling. It would be something like --enable-stacks or --enable-memory-profiler.
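A rough sketch of how option 1 could splice user-provided lines into the config sent down the wire. This is illustrative Python, not the actual Rust CLI code linked above; the key names are assumptions based on Kineto's config parser (PROFILE_PROFILE_MEMORY is the spelling that shows up later in this thread):

```python
from typing import List, Optional

# Base config the CLI would generate anyway; key names are assumptions
# taken from Kineto's config parser, not guaranteed to be exhaustive.
BASE_CONFIG = "ACTIVITIES_LOG_FILE=/tmp/pytorch_trace.json\nACTIVITIES_DURATION_MSECS=500"

def build_config(config_file: Optional[str], config_opts: List[str]) -> str:
    """Append --config-file lines and --config-opts pairs to the base config."""
    lines = [BASE_CONFIG]
    if config_file:
        with open(config_file) as f:
            lines += [ln.strip() for ln in f if ln.strip()]
    lines += config_opts
    return "\n".join(lines)

print(build_config(None, ["PROFILE_PROFILE_MEMORY=true", "PROFILE_WITH_STACK=true"]))
```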

In our Meta-internal CLI we already have 1 and 2, but it's different code (also not my favorite, as it's in C++). It would be great to add these to the open-source one. I'll try to post a PR for 1. at least this week. Will keep you posted.

Format parity between on-demand and PyTorch-profiler-initiated traces

Yes, the distributed info is missing :( I also tend to just add a string like that to the JSON so I can load it in HTA. This needs to be resolved in Kineto. @aaronenyeshi do you know how much effort it would be to add the following to on-demand traces? "distributedInfo": {"backend": "nccl", "rank": 4, "world_size": 12},
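As a stopgap until Kineto writes the field itself, it can be patched into the trace by hand. A minimal sketch; the rank and world size must be known out of band, since the on-demand trace does not record them:

```python
import json

# Stopgap: inject the distributedInfo block into an on-demand trace so HTA /
# torch-tb-profiler can pick it up. Rank and world size are supplied by the
# caller because the on-demand trace does not contain them.
def add_distributed_info(trace_path: str, rank: int, world_size: int,
                         backend: str = "nccl") -> None:
    with open(trace_path) as f:
        trace = json.load(f)
    trace["distributedInfo"] = {"backend": backend, "rank": rank,
                                "world_size": world_size}
    with open(trace_path, "w") as f:
        json.dump(trace, f)

add_distributed_info("/tmp/pytorch_trace.json", rank=4, world_size=12)
```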

briancoutinho commented 1 year ago

@asaiacai PR https://github.com/facebookincubator/dynolog/pull/159 is adding the options you need directly. Will land in a few days.

briancoutinho commented 1 year ago

@asaiacai the CLI changes have landed, so you can now collect memory profiles, Python stacks, etc. Please try out the latest version by building from source; to do so, follow https://github.com/facebookincubator/dynolog/tree/main#building-from-source

Or, if you are using Docker, build with the Dockerfile in the repo by following the instructions in RELEASE.md: https://github.com/facebookincubator/dynolog/blob/main/RELEASE.md#building-release-packages-using-docker

Will put out a new release with this in a few days. Cheers!

briancoutinho commented 1 year ago

PS: I think we need to move the distributed info issue to PyTorch/kineto

JingshuXia commented 11 months ago

Hi @briancoutinho, I built dynolog v0.3.1 from source and ran dyno gputrace --record-shapes --profile-memory --with-stacks --with-modules --log-file /tmp/pytorch_trace.json. The Kineto config output shows PROFILE_PROFILE_MEMORY=true, but the trace output doesn't contain memory info, and loading the output in TensorBoard doesn't show the Memory view either. I am on PyTorch 2.0.0+cu117 and TensorBoard 2.13.0. Does any flag need to be added under /etc/dynolog.gflags to enable it?
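One quick way to check whether any memory events made it into the file, independent of TensorBoard. A small sketch, assuming the PyTorch profiler's convention of naming memory events "[memory]" in the exported Chrome trace (the convention may vary by version):

```python
import json

# Sanity check: count memory events in the trace before debugging the
# TensorBoard side. PyTorch's profiler typically names them "[memory]"
# in the exported Chrome trace (an assumption; may vary by version).
with open("/tmp/pytorch_trace.json") as f:
    trace = json.load(f)

mem_events = [e for e in trace.get("traceEvents", []) if e.get("name") == "[memory]"]
print(f"{len(mem_events)} memory events found")
```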

briancoutinho commented 11 months ago

The config looks good to me. cc @aaronenyeshi, are there any versioning requirements on PyTorch for the memory profiler?

JingshuXia commented 10 months ago

Upgrading to PyTorch 2.1.0+cu121 and TensorBoard 2.15.1 doesn't help either. Any suggestions here?

susiexia-nflx commented 10 months ago

This open issue seems related to the missing Memory view: https://github.com/pytorch/kineto/issues/701

briancoutinho commented 7 months ago

I'm closing this issue and ask that we move the discussion to kineto now: https://github.com/pytorch/kineto/issues/889