anupambhatnagar closed this issue 8 months ago.
Are you certain the application is hanging? Can you check its CPU activity from another console while it is running? I ask because runtime instrumentation unfortunately tends to take a very long time: it ends up parsing not only your executable but every library linked to it. That is why I generally recommend binary rewrites if you don’t want to instrument the shared libraries linked to the executable. If you are unsure, it might help to run omnitrace-run with sampling enabled on the uninstrumented executable and see whether the backtraces show a lot of time being spent in the linked libraries.
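A minimal sketch of the CPU-activity check suggested above, assuming a Linux host. It reads the process's accumulated user and system CPU time from `/proc/<pid>/stat` twice and converts the difference into a percentage, roughly as `top` does. A value near 0% across several samples suggests the process is blocked rather than busy parsing libraries. `read_cpu_ticks` and `cpu_percent` are hypothetical helper names, not part of Omnitrace.

```python
import os
import time


def read_cpu_ticks(pid: int) -> int:
    """Return utime + stime for `pid`, in clock ticks, from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so split only the text after the last ')'.
    fields = stat[stat.rindex(")") + 2 :].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(fields[11]), int(fields[12])
    return utime + stime


def cpu_percent(pid: int, interval_s: float = 1.0) -> float:
    """Approximate CPU use of `pid` over `interval_s` seconds.

    100.0 corresponds to one fully busy core; multithreaded processes can
    exceed 100, similar to what top reports.
    """
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    before = read_cpu_ticks(pid)
    time.sleep(interval_s)
    after = read_cpu_ticks(pid)
    return 100.0 * (after - before) / (ticks_per_s * interval_s)
```

Call `cpu_percent(pid)` a few times with the PID of the stalled process (for example, as found with `pgrep`); a single sample can be misleading if the process happens to be between bursts of work.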
Thanks @jrmadsen for the prompt reply. I'll monitor the CPU activity to verify whether it is running or hanging, and I'll also try `omnitrace-run`.
I tried omnitrace-run on my binary and it kept running for over an hour, at which point I exited with Ctrl-C. The binary is a basic Triton kernel that executes in under a couple of seconds with Triton and PyTorch. The build system I use (Buck) packages everything together and generates a 700 MB executable. Unfortunately, running `ldd` on the file reports that it is not a dynamic executable, so I can't see the linked libraries.
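`ldd` prints "not a dynamic executable" whenever the file is not a dynamically linked ELF executable. That covers statically linked ELF files and also non-ELF files such as scripts or self-executing archives; Python `.par`-style packages are often a zip archive behind a `#!` line, but whether that applies to this Buck output is an assumption. The hypothetical helper below is a minimal sketch that distinguishes these cases from the file header and, for ELF files, checks for a `PT_INTERP` program header (the dynamic-loader request that dynamically linked executables carry), reading only the header and program-header table rather than the whole 700 MB file.

```python
import struct
import zipfile

PT_INTERP = 3  # ELF program-header type naming the dynamic loader


def _elf_requests_interpreter(f, header: bytes) -> bool:
    """Return True if the open ELF file `f` has a PT_INTERP program header."""
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    if header[4] == 2:  # EI_CLASS 2 = 64-bit
        (phoff,) = struct.unpack_from(endian + "Q", header, 32)
        phentsize, phnum = struct.unpack_from(endian + "HH", header, 54)
    else:  # 32-bit layout
        (phoff,) = struct.unpack_from(endian + "I", header, 28)
        phentsize, phnum = struct.unpack_from(endian + "HH", header, 42)
    # Read only the program-header table, not the whole (possibly huge) file.
    f.seek(phoff)
    table = f.read(phentsize * phnum)
    for i in range(phnum):
        (p_type,) = struct.unpack_from(endian + "I", table, i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False


def classify_executable(path: str) -> str:
    """Return 'elf-dynamic', 'elf-static', 'shebang+zip', 'script', 'zip' or 'unknown'."""
    with open(path, "rb") as f:
        header = f.read(64)
        if header[:4] == b"\x7fELF":
            # Heuristic for executables: shared libraries also lack PT_INTERP.
            dynamic = _elf_requests_interpreter(f, header)
            return "elf-dynamic" if dynamic else "elf-static"
    if header[:2] == b"#!":
        # A zip archive appended after a shebang line is still a valid zip.
        return "shebang+zip" if zipfile.is_zipfile(path) else "script"
    return "zip" if zipfile.is_zipfile(path) else "unknown"
```

If the result is `shebang+zip`, the shared libraries are packed inside the archive and only appear on disk once it is unpacked at run time, which would explain why `ldd` has nothing to report.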
I also tried `omnitrace-run --enable-categories rocprofiler -- ./rms_norm.par`, but it didn't help. `top` shows a CPU utilization of 0.0%.
```
❯ omnitrace-run --enable-categories rocprofiler -- ./rms_norm.par
OMNITRACE: HSA_TOOLS_LIB=/home/anupamb/omnitrace/lib/libomnitrace-dl.so.1.11.0
OMNITRACE: HSA_TOOLS_REPORT_LOAD_FAILURE=1
OMNITRACE: LD_PRELOAD=/home/anupamb/omnitrace/lib/libomnitrace-dl.so.1.11.0
OMNITRACE: OMNITRACE_ENABLE_CATEGORIES=rocprofiler
OMNITRACE: OMP_TOOL_LIBRARIES=/home/anupamb/omnitrace/lib/libomnitrace-dl.so.1.11.0
OMNITRACE: ROCP_HSA_INTERCEPT=1
OMNITRACE: ROCP_TOOL_LIB=/home/anupamb/omnitrace/lib/libomnitrace.so.1.11.0
[omnitrace][dl][1292192] omnitrace_main
[omnitrace][1292192][omnitrace_init_tooling] Instrumentation mode: Sampling
omnitrace v1.11.0 (rev: 77d52814e9050004cfb11d7917e155b00ab861b1, tag: v1.11.0, compiler: GNU v11.4.1, rocm: v6.0.x)
```
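With `top` reporting 0.0% CPU, a natural next step is to see what each thread is waiting on. The sketch below assumes Linux and permission to read the target's `/proc` entries; it lists every thread's scheduler state and its `wchan` (the kernel function the thread is sleeping in, when the kernel exposes it). If every thread sits in state `S` or `D` and many wait in futex-related functions, that is consistent with a deadlock, though it does not prove one. `thread_states` and `summarize` are hypothetical helper names.

```python
import os
from collections import Counter
from typing import Dict, Tuple


def thread_states(pid: int) -> Dict[int, Tuple[str, str]]:
    """Map each thread id of `pid` to (state, wchan).

    state is field 3 of /proc/<pid>/task/<tid>/stat, e.g. R = running,
    S = interruptible sleep, D = uninterruptible sleep.
    wchan is the kernel wait function, or "?" if it cannot be read.
    """
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                stat = f.read()
        except FileNotFoundError:
            continue  # the thread exited after listdir()
        # The comm field is parenthesized and may contain spaces, so the
        # state is the first character after the last ") ".
        state = stat[stat.rindex(")") + 2]
        try:
            with open(f"{task_dir}/{tid}/wchan") as f:
                wchan = f.read().strip() or "?"
        except OSError:
            wchan = "?"
        result[int(tid)] = (state, wchan)
    return result


def summarize(states: Dict[int, Tuple[str, str]]) -> Counter:
    """Count threads per (state, wchan) pair."""
    return Counter(states.values())
```

For example, `for (state, wchan), n in summarize(thread_states(pid)).most_common(): print(n, state, wchan)` prints the most common wait points first. `wchan` names differ between kernel versions, and some kernels report only `0`.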
I was not aware this was a PyTorch app. If your executable is 700 MB, I’m not surprised Dyninst takes forever to parse the binary. You’ve clearly got a deadlock: sampling does not turn an app that runs in a couple of seconds into one that takes more than a minute or two. Are you executing on multiple GPUs? PyTorch RPATHs its own ROCm libraries (or, in your case, it might statically link or dlopen them), and this is not going to play nicely with Omnitrace loading a different ROCm runtime.
Honestly, I’d probably install the Omnitrace build without ROCm support. Until we complete our work on a new roctracer/rocprofiler implementation that doesn’t link to the HIP/HSA runtimes, there’s very little that tools like Omnitrace can do for apps like PyTorch that ship their own “hidden” ROCm distributions, because the result is multiple ROCm runtimes being loaded into the same process.
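One rough way to check whether more than one ROCm runtime ends up in the same process is to scan `/proc/<pid>/maps` for the HIP and HSA runtime libraries (`libamdhip64.so`, `libhsa-runtime64.so`) and group them by directory. This minimal sketch, with hypothetical function names, reports every directory each library was mapped from. It only sees libraries loaded as separate files: if the ROCm runtime is statically linked into the executable, as speculated above, it will not show up here.

```python
import os
import re
from collections import defaultdict
from typing import Dict, Set

# HIP and HSA runtime libraries; the same one mapped from two different
# directories means two ROCm runtimes live in one process.
ROCM_RUNTIME_LIB = re.compile(r"^lib(amdhip64|hsa-runtime64)\.so")


def rocm_runtime_dirs(maps_text: str) -> Dict[str, Set[str]]:
    """Group ROCm runtime libraries found in /proc/<pid>/maps text by directory."""
    found: Dict[str, Set[str]] = defaultdict(set)
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname]; the pathname
        # may contain spaces, so split at most five times.
        parts = line.split(maxsplit=5)
        if len(parts) < 6:
            continue  # anonymous mapping without a path
        path = parts[5]
        match = ROCM_RUNTIME_LIB.match(os.path.basename(path))
        if match:
            found[match.group(1)].add(os.path.dirname(path))
    return dict(found)


def read_maps(pid: int) -> str:
    """Return the raw /proc/<pid>/maps text for a running process."""
    with open(f"/proc/{pid}/maps") as f:
        return f.read()
```

Running `rocm_runtime_dirs(read_maps(pid))` against the stalled process and finding any library with two or more directories would confirm the duplicate-runtime situation described above.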
I got Omnitrace working with my Triton kernel on MI300. To get it working, I built PyTorch from source on MI300, installed triton-rocm, and then ran Omnitrace on my kernel. It worked flawlessly. Kudos to you for building such high-quality software.
I will be diving deeper into it next week and will reach out if I have more questions, which I most likely will 😄. I love that it dumps Perfetto-compatible output.
Hi, I'm trying to instrument a binary application on MI300X with Omnitrace. To make sure my installation works, I used the example script here to verify that the `omnitrace-instrument` and `omnitrace-run` commands work as expected; I'm able to generate the Perfetto trace and view it. On my own executable, however, Omnitrace launches and seems to hang. Here's the backtrace. Any suggestions for debugging this would be highly appreciated. Thank you!
https://gist.github.com/anupambhatnagar/ad76524da1ca783f18ec08ad5805ac06