asaiacai closed this issue 7 months ago.
@asaiacai thanks for posting this 👍 Let me first cover the part of toggling memory, shapes and stack tracing.
> Also, if it's possible to toggle memory, shape size, and stack tracing through dynolog, that would basically enable all the same analysis in torch-tb-profiler for on-demand as there are for offline tracing.
Yes this is a feature we should add. To clarify the dyno tool basically sends a kineto configuration on the wire (over the network) to dynolog daemon. From here the config is passed to Kineto when it polls dynolog.
The config is generated here - https://github.com/facebookincubator/dynolog/blob/main/cli/src/commands/gputrace.rs#L40
There are two features we can add:
1. `dyno gputrace --config-file ./kineto.conf` — the lines in the provided file would be appended to the generated config and also sent down the wire. This enables setting any custom Kineto configuration. It might also be great to accept key-value pairs directly on the command line, e.g. `dyno gputrace --config-opts 'PROFILE_MEMORY=YES'`.
2. Dedicated flags such as `--enable-stacks` or `--enable-memory-profiler`.

In our Meta-internal CLI we already have 1 and 2; it's different code (also not my favorite, as it's in C++). It would be great to add these to the open-source one. I'll try to post a PR for 1. at least this week. Will keep you posted.
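For feature 1, the contents of a `./kineto.conf` passed via `--config-file` could look something like the following. The key names are taken from Kineto's on-demand config parsing; the exact set shown here is illustrative:

```
ACTIVITIES_DURATION_MSECS=1000
ACTIVITIES_LOG_FILE=/tmp/pytorch_trace.json
PROFILE_REPORT_INPUT_SHAPES=true
PROFILE_PROFILE_MEMORY=true
PROFILE_WITH_STACK=true
```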
Yes, the distributed info is indeed missing :( I also tend to just add a string like that to the json so I can load it in HTA. This needs to be resolved in Kineto. @aaronenyeshi do you know how much effort it would be to add the following to on-demand traces?
```
"distributedInfo": {"backend": "nccl", "rank": 4, "world_size": 12},
```
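Until Kineto emits this natively, one workaround is to patch the field into the trace file after collection so HTA can pick it up. A minimal sketch in Python (the helper name is mine, not part of dynolog or Kineto):

```python
import json

def add_distributed_info(trace_path, backend, rank, world_size):
    # Load the Kineto trace, add the top-level "distributedInfo" field
    # that HTA / torch-tb-profiler expect, and write it back in place.
    with open(trace_path) as f:
        trace = json.load(f)
    trace["distributedInfo"] = {
        "backend": backend,
        "rank": rank,
        "world_size": world_size,
    }
    with open(trace_path, "w") as f:
        json.dump(trace, f)
```

For example, `add_distributed_info("/tmp/pytorch_trace.json", "nccl", 4, 12)` after collecting the trace on rank 4 of a 12-GPU job.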
@asaiacai PR https://github.com/facebookincubator/dynolog/pull/159 is adding the options you need directly. Will land in a few days.
@asaiacai the CLI changes have landed, so now you can collect memory profiles, Python stacks, etc. Please try out the latest version by building from source. To build from source you can follow - https://github.com/facebookincubator/dynolog/tree/main#building-from-source
Or if you are using docker then use the dockerfile in the repo by following instructions in Release.md https://github.com/facebookincubator/dynolog/blob/main/RELEASE.md#building-release-packages-using-docker
Will put a new release with this in a few days. cheers!
PS: I think we need to move the distributed info issue to PyTorch/kineto
Hi @briancoutinho, I built dynolog v0.3.1 from source and ran `dyno gputrace --record-shapes --profile-memory --with-stacks --with-modules --log-file /tmp/pytorch_trace.json`. The Kineto config outputs `PROFILE_PROFILE_MEMORY=true`, but the trace output doesn't contain memory info, and loading the output with TensorBoard doesn't show the Memory view. I am on PyTorch 2.0.0+cu117 and tensorboard 2.13.0. Does any flag need to be added under `/etc/dynolog.gflags` to enable it?
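One quick way to tell whether the memory profiler actually fired, independent of TensorBoard, is to look for memory events in the trace JSON. A small sketch, assuming allocator events show up under the name `[memory]` in `traceEvents` (treat that event name as an assumption about the trace format):

```python
import json

def has_memory_events(trace_path):
    # Returns True if the Kineto trace contains memory allocator events.
    # Assumes such events are named "[memory]" in the traceEvents list.
    with open(trace_path) as f:
        trace = json.load(f)
    return any(ev.get("name") == "[memory]"
               for ev in trace.get("traceEvents", []))
```

If this returns False, the memory profiler never emitted events, so the problem is upstream of TensorBoard's Memory view.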
The config looks good to me. cc @aaronenyeshi are there any versioning requirements on PyTorch for the memory profiler?
Upgrading to PyTorch 2.1.0+cu121 and tensorboard 2.15.1 doesn't help either. Any suggestions here?
This open issue seems related to the missing Memory view: https://github.com/pytorch/kineto/issues/701
I'm closing this issue and asking that we move the discussion over to Kineto now: https://github.com/pytorch/kineto/issues/889
tl;dr
Traces are currently incompatible with the latest version of the torch tensorboard profiler
Summary
It'd be nice to interact with trace files through torch-tb-profiler, the same interface for offline profiling as for on-demand profiling. Right now the trace files can only be opened in the Chrome tracer, but that is limited to one trace at a time. The aggregations provided in torch-tb-profiler and HolisticTraceAnalyzer are good for getting a bird's-eye view of performance over many GPUs when doing distributed training, but they do not work with the current on-demand trace outputs. The following info is missing from the on-demand traces.

Also, if it's possible to toggle memory, shape size, and stack tracing through dynolog, that would basically enable all the same analysis in torch-tb-profiler for on-demand as there is for offline tracing.

Environment