facebookincubator / dynolog

Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the linux kernel, CPU, disks, Intel PT, GPUs etc. Dynolog also integrates with pytorch and can trigger traces for distributed training applications.
MIT License

Pytorch example does not work without GPU #208

Closed cameron-martin closed 4 months ago

cameron-martin commented 5 months ago

After starting dynolog with --enable_ipc_monitor, I tried running the example (with the device changed to CPU) like so:

KINETO_USE_DAEMON=1 python scripts/pytorch/linear_model_example.py

Then if I run dyno gputrace, I get the following:

No processes were matched, please check --job-id or --pids flags
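
For readers following along, the two steps above can be sketched as a small Python helper. This is only an illustration: the environment variable and the `--pids` flag come from this thread, while the `--log-file` flag, the output path, the comma-separated PID format, and the helper names are assumptions that should be checked against `dyno gputrace --help`.

```python
import os
import shlex


def daemon_env(base=None):
    """Return a copy of the environment with the Kineto daemon flow enabled."""
    env = dict(os.environ if base is None else base)
    env["KINETO_USE_DAEMON"] = "1"  # variable used in the command above
    return env


def gputrace_command(pids, log_file="/tmp/trace.json"):
    """Build (but do not run) a `dyno gputrace` command line.

    `--pids` is named in the error message above; `--log-file`, the default
    path, and the comma-separated PID list are assumptions.
    """
    pid_list = ",".join(str(p) for p in pids)
    return ["dyno", "gputrace", "--pids", pid_list, "--log-file", log_file]


# Print the command for a hypothetical PID instead of executing it.
print(shlex.join(gputrace_command([12345])))
```

Passing the PID of the training process explicitly only helps if that process has registered with the daemon, which is the part that fails in this report.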

If I wrap the example in a profiler, then I do get matched processes. However, I thought the point was that it required no code modifications?

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:

This does create a trace, but it doesn't contain much useful info. See attached file.

dynolog.json
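
A fuller version of the wrapper described above might look like the following. It is a minimal sketch, not the repository's `linear_model_example.py`: the tiny linear model, the tensor sizes, and the function name are made up for illustration, and the import guard only exists so the snippet does nothing on machines without PyTorch.

```python
import importlib.util


def run_cpu_profile():
    """Profile one forward pass of a toy model on the CPU and return a summary table."""
    import torch
    from torch.profiler import ProfilerActivity, profile

    model = torch.nn.Linear(8, 4)  # placeholder model, not the one from the example script
    inputs = torch.randn(16, 8)

    # Same profiler arguments as in the snippet above.
    with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
        model(inputs)

    return prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)


# Only run when PyTorch is actually installed.
if importlib.util.find_spec("torch") is not None:
    print(run_cpu_profile())
```

Using the profiler explicitly like this is a code modification, which is exactly what the on-demand flow is meant to avoid.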

What am I doing wrong?

jj10306 commented 5 months ago

@cameron-martin

but changing the device to CPU

dyno gputrace only supports tracing Pytorch executing on GPU

cc: @briancoutinho to confirm

cameron-martin commented 5 months ago

Why does it require a GPU if both dynolog and kineto claim to support CPU profiling?

cameron-martin commented 5 months ago

It looks like libkineto_init handles cupti not being available gracefully. I'm failing to see what is causing this to fail, but I'll keep digging.

briancoutinho commented 5 months ago

@cameron-martin Yes, dynolog and kineto do support CPU-only profiling too. Tbh we didn't test the on-demand tracing flow on a CPU-only build of PyTorch. I'll give it a try with the latest release of CPU torch, but please do share the versions you used as well.

You are right that libkineto_init() is where the actual registration with dynolog happens. libkineto_init() is invoked in two places:

  1. During startup in profiler registry here.
  2. Lazily called during first usage of PyTorch profiler here.

We wanted (1) to always be invoked, but I think some define is probably compiling it out, like this ENABLE_GLOBAL_OBSERVER one. Any chance you are using an Apple system or PyTorch edge?

#if defined(__APPLE__) || defined(EDGE_PROFILER_USE_KINETO)
#define ENABLE_GLOBAL_OBSERVER (0)
#else
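
One way to picture the hypothesis above is a toy Python model of the two initialization paths. None of these class or method names are real PyTorch, Kineto, or dynolog APIs; the `global_observer` argument merely stands in for the compile-time ENABLE_GLOBAL_OBSERVER define.

```python
class FakeDaemon:
    """Stands in for dynolog: remembers which PIDs have registered."""

    def __init__(self):
        self.registered_pids = set()

    def register(self, pid):
        self.registered_pids.add(pid)


class FakeProcess:
    """Stands in for a PyTorch process with an eager and a lazy init path."""

    def __init__(self, pid, daemon, global_observer):
        self.pid = pid
        self.daemon = daemon
        self.initialized = False
        if global_observer:
            # Path (1): initialization at startup.
            self.kineto_init()

    def kineto_init(self):
        # Registration with the daemon happens once, on first initialization.
        if not self.initialized:
            self.daemon.register(self.pid)
            self.initialized = True

    def use_profiler(self):
        # Path (2): lazy initialization on first profiler use.
        self.kineto_init()


daemon = FakeDaemon()
proc = FakeProcess(pid=1234, daemon=daemon, global_observer=False)
print(1234 in daemon.registered_pids)  # startup path skipped, nothing registered yet
proc.use_profiler()
print(1234 in daemon.registered_pids)  # registered after the first profiler use
```

If the startup path is skipped, the daemon only learns about the process once the profiler is used, which would match the earlier observation that wrapping the example in a profiler makes processes match.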

cameron-martin commented 5 months ago

I'm using:

I also compiled HEAD of dynolog and tested but still get the same results.

briancoutinho commented 5 months ago

@cameron-martin thanks for the info. I think the issue is on the PyTorch side: libkineto is not getting initialized. I'll try to repro the CPU-only setup and get back to you.

briancoutinho commented 5 months ago

Working on two WIP fixes: https://github.com/pytorch/kineto/pull/861 (this has to land first) and https://github.com/pytorch/pytorch/pull/118320

briancoutinho commented 5 months ago

OK, all fixes are in https://github.com/pytorch/pytorch/pull/118320 :) Can you try the PyTorch nightly build to see if this works? I tried it out while developing the PR. Let me know.
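
Before retesting, it can help to confirm that the environment really picked up a nightly wheel. The check below is a rough heuristic sketch: it assumes nightly versions carry a `.devYYYYMMDD` segment (for example `2.3.0.dev20240125+cpu`), and the helper names are hypothetical.

```python
import re
from importlib import metadata


def is_nightly(version):
    """Heuristic: treat a version with a .devYYYYMMDD segment as a nightly wheel."""
    return re.search(r"\.dev\d{8}", version) is not None


def installed_torch_version():
    """Return the installed torch version string, or None if torch is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


version = installed_torch_version()
if version is None:
    print("torch is not installed")
elif is_nightly(version):
    print(version, "looks like a nightly wheel")
else:
    print(version, "does not look like a nightly wheel")
```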

cameron-martin commented 4 months ago

Just tested this with nightly torch and it works great, thanks!