ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/

How are host process id, host thread id, GPU id and GPU stream id mapped to pid and tid in chrome://tracing? #34

Open ipe-zhangyz opened 3 years ago

ipe-zhangyz commented 3 years ago

A heterogeneous computing application usually has computing units at multiple levels. For example, a host process controls a GPU and may use several streams on that GPU; or a host process spawns several host threads, each of which controls a GPU and may use several streams on it. However, chrome://tracing seems to have only two levels: process and thread. In my experience, rocprof --hip-trace doesn't have a good solution to this problem.

Another tool, rpt, which is provided in hcc, seems to always map the "queue" number to tid, and the pid is always 1, as rpt contains this code:

    def printJSON(self, file, timeOffset=0):
        tid = self.queue
        file.write('{ "pid":1, "tid":%d, "ts":%d, "dur":%d, "ph":"X", "name":"%s", "args":{"dev.queue.op":"%d.%d.%d", "stop":%d } }' %\
            (tid, self.startTime/1000, (self.stopTime - self.startTime)/1000, self.name, \
            self.device, self.queue, self.cmdNum, self.stopTime/1000) )
        file.write(',\n')

Maybe self.device is really the GPU ID, but I don't know what self.queue really is. Is it the GPU stream id? In rpt it is written to the tid field, which chrome://tracing displays as a host thread id. This mapping sometimes makes the visualization in chrome://tracing confusing.
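To make the question concrete, here is a minimal sketch of one possible mapping, not what rocprof or rpt actually does. It assumes the GPU id is written to pid and the queue id to tid, and it labels the resulting rows with the `process_name` and `thread_name` metadata events (`"ph":"M"`) of the Chrome trace event format. The helper names `make_event`, `make_metadata` and `build_trace` are hypothetical, and whether a queue corresponds to a GPU stream is exactly the open question above.

```python
import json


def make_event(name, device, queue, start_ns, stop_ns, cmd_num=0):
    """Build one complete event (ph = "X").

    Assumption for illustration: pid <- GPU device id, tid <- queue id.
    Timestamps are given in nanoseconds and converted to microseconds.
    """
    return {
        "pid": device,
        "tid": queue,
        "ts": start_ns / 1000,
        "dur": (stop_ns - start_ns) / 1000,
        "ph": "X",
        "name": name,
        "args": {"dev.queue.op": "%d.%d.%d" % (device, queue, cmd_num)},
    }


def make_metadata(events):
    """Create metadata events so each pid/tid row gets a readable label."""
    meta = []
    seen_pids = set()
    seen_rows = set()
    for event in events:
        pid, tid = event["pid"], event["tid"]
        if pid not in seen_pids:
            seen_pids.add(pid)
            meta.append({"ph": "M", "pid": pid, "name": "process_name",
                         "args": {"name": "GPU %d" % pid}})
        if (pid, tid) not in seen_rows:
            seen_rows.add((pid, tid))
            meta.append({"ph": "M", "pid": pid, "tid": tid,
                         "name": "thread_name",
                         "args": {"name": "queue %d" % tid}})
    return meta


def build_trace(events):
    """Serialize metadata plus events in the JSON object form of a trace."""
    return json.dumps({"traceEvents": make_metadata(events) + events})


if __name__ == "__main__":
    sample = [
        make_event("kernelA", device=0, queue=1, start_ns=1000, stop_ns=3000),
        make_event("kernelB", device=1, queue=0, start_ns=2000, stop_ns=5000),
    ]
    print(build_trace(sample))
```

With a mapping like this, each GPU would appear as its own "process" group and each queue as a labeled row inside it, but the host process id and host thread id would then have to be carried elsewhere, for example in `args`.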