ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/

How to solve error(4103) when profiling LLM training on MI250? #119

Open lingjiew93 opened 11 months ago

lingjiew93 commented 11 months ago

I'm running LLM training on an MI250. I followed the instructions at https://www.mosaicml.com/blog/amd-mi250 and used the code from https://github.com/mosaicml/llm-foundry. Training runs fine without profiling, but when I try to profile it, the following error is shown:

error(4103) "InterceptQueueCreate(), ProxyQueue::Create()" HSA_STATUS_ERROR_INVALID_QUEUE: The queue is invalid.
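For anyone reading the error line: the number in `error(4103)` appears to be the HSA runtime status code printed in decimal, and 4103 is 0x1007, which matches the `HSA_STATUS_ERROR_INVALID_QUEUE` name printed next to it. Below is a minimal sketch for decoding such codes. The table is a partial transcription of the `hsa_status_t` enum from `hsa.h`, and the helper name `describe_hsa_status` is hypothetical, not part of rocprofiler or the HSA runtime.

```python
# Partial table of hsa_status_t values, transcribed from hsa.h.
# Assumption: not exhaustive; check hsa.h in your ROCm install for the full enum.
HSA_STATUS_NAMES = {
    0x1000: "HSA_STATUS_ERROR",
    0x1001: "HSA_STATUS_ERROR_INVALID_ARGUMENT",
    0x1002: "HSA_STATUS_ERROR_INVALID_QUEUE_CREATION",
    0x1003: "HSA_STATUS_ERROR_INVALID_ALLOCATION",
    0x1004: "HSA_STATUS_ERROR_INVALID_AGENT",
    0x1005: "HSA_STATUS_ERROR_INVALID_REGION",
    0x1006: "HSA_STATUS_ERROR_INVALID_SIGNAL",
    0x1007: "HSA_STATUS_ERROR_INVALID_QUEUE",
    0x1008: "HSA_STATUS_ERROR_OUT_OF_RESOURCES",
}


def describe_hsa_status(code: int) -> str:
    """Hypothetical helper: map a decimal status code to its enum name."""
    name = HSA_STATUS_NAMES.get(code, "unknown (not in this partial table)")
    return f"{code} = {hex(code)} -> {name}"


print(describe_hsa_status(4103))
```

This only identifies which status the profiler's queue interception reported; it does not explain why queue creation failed under profiling.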