Open arkhadem opened 9 months ago
Hey Alireza-
I don't believe we currently have uProf on the cluster, but we can potentially install it so it's available as an environment module. Is it the CPUs or GPUs you're looking to profile? If the latter, you can use either rocprof or omniperf. Let me know what you need and we can go from there.
-Tom
Hi Tom,
Thanks for getting back to me.
I have some HPC applications implemented with MPI and OpenMP on the CPU and HIP on the GPU. Honestly, I am new to the AMD world and have no experience profiling AMD hardware. I am looking for a profiler like Intel VTune for AMD CPUs and NVIDIA Nsight Compute for AMD GPUs. I want to measure MPI overhead, program hotspots (top-down analysis), detailed performance counters like cache hit rate and branch misprediction rate (and MPKI), memory bandwidth and latency, utilization, etc. Based on my brief research, I found AMD uProf for the CPU and the AMD Radeon GPU Profiler.
Hence, I appreciate any insights on the profiler, as well as installing them on the HPC servers as a module.
Thank you very much for your time and consideration.
Hi Tom,
Do you have any updates on this issue? My research is blocked by the need for the profilers. I would appreciate it if you install the tools as a module and let me know how I should access them.
Sincerely,
Hey Alireza-
omniperf, omnitrace, and rocprof are the AMD counterparts to NVIDIA's Nsight Compute, Nsight Systems, and nvprof, respectively. omniperf is currently available on the cluster as an environment module, and rocprof is installed as part of ROCm, so you should be able to get started with these tools now.
@koomie Can we install omnitrace and uprof on the cluster?
Here are the relevant docs to help you get started, Alireza:
-Tom
FYI, omniperf uses rocprof under the covers to access a variety of hardware counters (it will run your application multiple times to be able to gather a range of counters on a per-GPU-kernel basis). I suspect this is probably the tool you want to start with.
Hi @tom-papatheodore and @koomie, I found the rocprof under the rocm module and I think that would be enough for GPU. Thanks for sending the links, they are comprehensive and useful.
But for CPU profiling, I think I still need uProf. Could you let me know the status of the uProf installation?
Hi @tom-papatheodore and @koomie,
Do you have any updates on this?
Best,
Yes, and apologies for the delay. We have installed uProf across the system. There is no module for it yet, but you can access the binaries directly at: /opt/AMDuProf_4.2-850/bin/
As Tom mentioned, Omniperf is a good tool for detailed single-node GPU analysis with hardware counters, and you can access it via the pre-installed modules on the system (e.g. module load omniperf).
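Since there is no module for uProf yet, one way to use the binaries is to put that directory on PATH for the current session. A minimal sketch, assuming a POSIX shell: the application name my_app and the output directory are hypothetical, and tbp (time-based profiling) is just one of uProf's predefined collection configurations.

```shell
# Install location quoted in the reply above; there is no environment module yet.
UPROF_BIN=/opt/AMDuProf_4.2-850/bin

# Prepend the directory to PATH only if it is not already present.
case ":$PATH:" in
  *":$UPROF_BIN:"*) ;;
  *) PATH="$UPROF_BIN:$PATH" ;;
esac
export PATH

APP=./my_app                 # hypothetical application binary
OUT_DIR=${HOME:-.}/uprof_out # hypothetical output directory

if command -v AMDuProfCLI >/dev/null 2>&1; then
  # Time-based sampling to locate CPU hotspots; "AMDuProfCLI collect --help"
  # describes the other collection options.
  AMDuProfCLI collect --config tbp -o "$OUT_DIR" "$APP"
else
  echo "AMDuProfCLI not found in $UPROF_BIN" >&2
fi
```

The PATH change lasts only for the current shell, so it would need to go in a job script or shell startup file to persist.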
Hi,
I need to profile the microarchitecture for some HPC applications. I aim to profile microarchitectural events such as cache hit/miss rate. Based on my understanding, I should use the AMD uProf profiler. Would you please let me know if we have access to this profiler in the HPC cloud or not, and if yes, how I can access it?
Thank you in advance