ROCm / roctracer

ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs
https://rocm.docs.amd.com/projects/roctracer/en/latest/

[Issue]: Roctracer GPU Events Have Overlapping Intervals #104

Open sraikund16 opened 2 weeks ago

sraikund16 commented 2 weeks ago

Problem Description

When running a very small ResNet50 model, I see GPU events on a single track (stream/queue) with overlapping time intervals. The overlaps show up most often in a few specific kernels, such as MIOpenBatchNormBwdSpatial and batched_transpose_32x32_dword, which have kind=0x11F0 and op=0. To investigate further, I created a debug branch that shows what roctracer returns before kineto does any processing: https://github.com/pytorch/kineto/pull/990/files

In this branch I added a debug check that emits several messages similar to the following:

Out of order activity: 1886121463888334 < 1886121463888361. Difference: 27 ns. Kernel: batched_transpose_32x32_dword last Kernel: MIOpenBatchNormFwdTrainSpatialNorml

This suggests that the intervals overlap. In this branch I only check for overlaps among events whose kind is not unknown, but the unknown-kind events also show many overlaps.
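For readers who want to reproduce the check outside kineto, here is a minimal sketch of a per-queue overlap test. It is not the code from the debug branch: the `Activity` record, its field names, and the sample timestamps are hypothetical, and it assumes the debug message compares the current event's begin time against the previous event's end time on the same queue.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    # Hypothetical record layout for illustration; not roctracer's own struct.
    name: str
    queue: int
    begin_ns: int
    end_ns: int


def find_overlaps(activities):
    """Return (previous, current, overlap_ns) for each same-queue overlap."""
    latest_by_queue = {}  # queue id -> activity with the latest end seen so far
    overlaps = []
    # Sort by queue, then begin time, so each event is compared to earlier ones.
    for act in sorted(activities, key=lambda a: (a.queue, a.begin_ns)):
        prev = latest_by_queue.get(act.queue)
        if prev is not None and act.begin_ns < prev.end_ns:
            overlaps.append((prev, act, prev.end_ns - act.begin_ns))
        # Remember whichever activity on this queue ends last.
        if prev is None or act.end_ns >= prev.end_ns:
            latest_by_queue[act.queue] = act
    return overlaps


# Synthetic timestamps, chosen only so that the overlap is 27 ns.
sample = [
    Activity("kernel_a", 0, 1000, 1361),
    Activity("kernel_b", 0, 1334, 1500),
    Activity("kernel_c", 1, 1340, 1400),  # different queue, not compared
]

for prev, cur, overlap_ns in find_overlaps(sample):
    print(f"{cur.name} begins {overlap_ns} ns before {prev.name} ends "
          f"(queue {cur.queue})")
```

Grouping by queue first matters here, because events on different queues may legitimately run concurrently.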

Thanks!

Operating System

CentOS Stream 9

CPU

AMD EPYC 7713

GPU

AMD Instinct MI300X

ROCm Version

ROCm 6.2.0

ROCm Component

roctracer

Steps to Reproduce

Run a model that launches the kernels named above and check whether their time intervals overlap on the same queue.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

sraikund16 commented 2 weeks ago

Since the overlap is so small, could this be some kind of rounding issue?
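One narrow version of the rounding idea can be checked quickly: whether storing these timestamps in a 64-bit float at some point could shift them by tens of nanoseconds. The sketch below uses the two timestamps from the message above and needs Python 3.9+ for `math.ulp`. It is only an illustration and does not examine any other possible source of rounding inside roctracer or the driver.

```python
import math

# The two timestamps from the "Out of order activity" message above.
ts_a = 1886121463888334
ts_b = 1886121463888361

# Gap between adjacent float64 values at this magnitude, in nanoseconds.
spacing_ns = math.ulp(float(ts_a))

# Does an int -> float64 -> int round trip preserve the timestamp exactly?
round_trip_exact = int(float(ts_a)) == ts_a

overlap_ns = ts_b - ts_a

print(f"float64 spacing near the timestamp: {spacing_ns} ns")
print(f"round trip exact: {round_trip_exact}")
print(f"reported overlap: {overlap_ns} ns")
```

At this magnitude the float64 spacing is 0.25 ns, so a plain conversion to double would not by itself account for a 27 ns difference.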

sraikund16 commented 2 weeks ago

Here is another message, this time with the queue IDs printed:

Out of order activity: 1895910188521077 < 1895910188521125. Difference: 48 ns. Kernel: batched_transpose_16x32_dword last Kernel: batched_transpose_16x32_dword Queue: 0 last Queue: 0
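When there are many of these messages, it can help to parse them and group the overlaps by kernel and queue. The following is a rough sketch that assumes the message format is exactly as in the two lines quoted in this thread; the regular expression and the `parse_line` helper are illustrative and not part of kineto or roctracer.

```python
import re

# Pattern for the debug message; the queue fields are optional because the
# first message quoted in this thread did not include them.
LOG_RE = re.compile(
    r"Out of order activity: (?P<cur>\d+) < (?P<last>\d+)\. "
    r"Difference: (?P<diff>\d+) ns\. "
    r"Kernel: (?P<kernel>\S+) last Kernel: (?P<last_kernel>\S+)"
    r"(?: Queue: (?P<queue>\d+) last Queue: (?P<last_queue>\d+))?"
)


def parse_line(line):
    """Return a dict of fields from one debug message, or None if no match."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    for key in ("cur", "last", "diff", "queue", "last_queue"):
        if record[key] is not None:
            record[key] = int(record[key])
    return record


lines = [
    "Out of order activity: 1886121463888334 < 1886121463888361. "
    "Difference: 27 ns. Kernel: batched_transpose_32x32_dword "
    "last Kernel: MIOpenBatchNormFwdTrainSpatialNorml",
    "Out of order activity: 1895910188521077 < 1895910188521125. "
    "Difference: 48 ns. Kernel: batched_transpose_16x32_dword "
    "last Kernel: batched_transpose_16x32_dword Queue: 0 last Queue: 0",
]

for line in lines:
    rec = parse_line(line)
    # Sanity check: the printed difference equals the gap between timestamps.
    consistent = rec["last"] - rec["cur"] == rec["diff"]
    print(rec["kernel"], rec["last_kernel"], rec["queue"], rec["diff"], consistent)
```

Both quoted messages pass the consistency check: the reported difference equals the second timestamp minus the first.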