ROCm / omnitrace

Omnitrace: Application Profiling, Tracing, and Analysis
https://rocm.docs.amd.com/projects/omnitrace/en/latest/
MIT License

Problem with flow event #266

Closed: Luke20000429 closed this issue 1 year ago

Luke20000429 commented 1 year ago

I wrote a program that launches kernels on 4 independent streams. However, when I profile it with omnitrace, the flow events for the device-to-host memcpys (memcpyD2H) look like this: [screenshots]. I expected these memcpys to be independent of each other. Could someone explain why this happens? I am running ROCm 5.4.3 with omnitrace 1.8.0 on gfx1030.

jrmadsen commented 1 year ago

This looks like a bug in the correlation IDs provided by the roctracer library. My guess is that roctracer is giving OmniTrace the same correlation ID every time, so Perfetto is connecting all of them.

Can you click on one of the entries with arrows and check the debug annotations in the lower right of the details panel at the bottom? There should be a "corr_id" arg with a value. If you right-click it, there should be an option like "Find all arg with same value". My guess is that it will return a lot of results.

If possible, could you verify whether this behavior also occurs with ROCm 5.3? If that's not possible and the code producing this is relatively simple, please fork the repo, drop the code into a subdirectory of the examples folder, and LMK so I can try to reproduce it.

Luke20000429 commented 1 year ago

Thanks for your quick response. I checked the corr_id of several entries, but each corr_id seems to be shared by only two entries. For example, one of the CopyDeviceToHost entries has corr_id=295, and the only other entry with that ID is the corresponding API call. [screenshots]

I don't have ROCm 5.3 installed, so I might switch to another workstation. If that doesn't work out, I will add a simple demo to my fork.
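A minimal sketch of the "find all entries with the same corr_id" check, assuming the trace slices have already been exported into plain Python dicts with a `corr_id` field. The `example_slices` data and the helper names are hypothetical and are not OmniTrace or Perfetto output or APIs. Following the observation above, each correlation ID is expected to link one API call to one asynchronous copy, so it should appear on exactly two entries; an ID shared by many entries would point to the roctracer bug suspected earlier in the thread.

```python
from collections import Counter

# Hypothetical exported slices: one API call and one device-to-host copy
# per correlation ID. Names and IDs are illustrative only.
example_slices = [
    {"name": "hipMemcpyAsync", "corr_id": 295},
    {"name": "CopyDeviceToHost", "corr_id": 295},
    {"name": "hipMemcpyAsync", "corr_id": 296},
    {"name": "CopyDeviceToHost", "corr_id": 296},
]


def entries_per_corr_id(slices):
    """Count how many slices carry each corr_id value."""
    return Counter(s["corr_id"] for s in slices)


def suspicious_ids(slices, expected=2):
    """Return, sorted, the corr_id values shared by more slices than expected.

    With per-call correlation IDs, each ID should pair one API call with
    one asynchronous activity, i.e. appear exactly `expected` times.
    """
    counts = entries_per_corr_id(slices)
    return sorted(cid for cid, n in counts.items() if n > expected)


if __name__ == "__main__":
    print(entries_per_corr_id(example_slices))
    print(suspicious_ids(example_slices))
```

An empty result from `suspicious_ids` matches what was reported here (each ID on exactly two entries), which suggests the arrows are not being drawn from the `corr_id` values shown in the details panel.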

jrmadsen commented 1 year ago


Ah, no need: I located the problem. I was accidentally using the internal correlation ID for critical tracing (which intentionally makes the connections you are seeing) instead of the roctracer correlation ID. I didn't catch this because I did a very poor job naming the variables: the former was named _cid and the latter _corr_id. I'll get that fixed and generate a v1.9.1 release.
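A hypothetical sketch of why the choice of ID matters, assuming the viewer chains every slice that shares a flow ID in timestamp order. The `_corr_id` and `_cid` field names are borrowed from the comment above; the event names, timestamps, and the assumption that the internal `_cid` is shared by every event are illustrative only and do not describe OmniTrace internals.

```python
from collections import defaultdict


def make_events(num_streams=4):
    """Build illustrative events: one async call plus its D2H copy per stream."""
    events = []
    for s in range(num_streams):
        # "_corr_id": per-call ID that pairs each call with its own copy.
        # "_cid": hypothetical internal ID, assumed here to be shared by all events.
        common = {"stream": s, "_corr_id": 100 + s, "_cid": 7}
        events.append({"name": f"hipMemcpyAsync[s{s}]", "ts": 10 * s, **common})
        events.append({"name": f"CopyDeviceToHost[s{s}]", "ts": 10 * s + 5, **common})
    return events


def build_flows(events, key):
    """Group events by `key` and link each group's events in timestamp order."""
    groups = defaultdict(list)
    for ev in events:
        groups[ev[key]].append(ev)
    links = []
    for group in groups.values():
        group.sort(key=lambda ev: ev["ts"])
        # Each consecutive pair in a group becomes one arrow.
        links.extend((a["name"], b["name"]) for a, b in zip(group, group[1:]))
    return links


if __name__ == "__main__":
    events = make_events()
    print("keyed by _corr_id:", build_flows(events, "_corr_id"))
    print("keyed by _cid:", build_flows(events, "_cid"))
```

Keying by `_corr_id` yields four short arrows, one per stream, each from a call to its own copy. Keying by the shared `_cid` yields a single chain of seven links running across all four streams, which would connect otherwise independent work in the way reported in this issue.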

Luke20000429 commented 1 year ago

Sounds great!