AMDResearch / hpcfund

AMD HPC Research Fund Cloud
https://amdresearch.github.io/hpcfund/
Profiling the node(s) #14

Open: arkhadem opened this issue 9 months ago

arkhadem commented 9 months ago

Hi,

I need to profile some HPC applications at the microarchitectural level, collecting events such as cache hit/miss rates. As I understand it, AMD uProf is the tool for this. Could you please let me know whether uProf is available on the HPC cloud and, if so, how I can access it?

Thank you in advance

tom-papatheodore commented 9 months ago

Hey Alireza-

I don't believe we currently have uProf on the cluster, but we can potentially install it so it's available as an environment module. Is it the CPUs or GPUs you're looking to profile? If the latter, you can use either rocprof or omniperf. Let me know what you need and we can go from there.

-Tom

arkhadem commented 9 months ago

Hi Tom,

Thanks for getting back to me.

I have some HPC applications implemented with MPI and OpenMP on the CPU and HIP on the GPU. Honestly, I am new to the AMD world and have no experience profiling AMD hardware. I am looking for a profiler like Intel VTune for AMD CPUs and NVIDIA Nsight Compute for AMD GPUs. I want to measure MPI overhead, program hotspots (top-down analysis), detailed performance counters such as cache hit rate and branch misprediction rate (and MPKI), memory bandwidth and latency, utilization, etc. From some brief research, I found AMD uProf for the CPU and the AMD Radeon GPU Profiler for the GPU.

I would appreciate any guidance on these profilers, as well as having them installed on the HPC servers as modules.

Thank you very much for your time and consideration.

arkhadem commented 8 months ago

Hi Tom,

Do you have any updates on this issue? My research is blocked until I have access to the profilers. I would appreciate it if you could install the tools as modules and let me know how to access them.

Sincerely,

tom-papatheodore commented 8 months ago

Hey Alireza-

omniperf, omnitrace, and rocprof are the AMD counterparts to NVIDIA's Nsight Compute, Nsight Systems, and nvprof, respectively. omniperf is currently available on the cluster as an environment module, and rocprof is installed as part of ROCm, so you should be able to get started with these tools now.

@koomie Can we install omnitrace and uProf on the cluster?

Here are the relevant docs to help you get started, Alireza:

-Tom

koomie commented 8 months ago

FYI, omniperf uses rocprof under the covers to access a variety of hardware counters (it will run your application multiple times so that it can gather a range of counters for each GPU kernel). I suspect this is probably the tool you want to start with.

arkhadem commented 7 months ago

Hi @tom-papatheodore and @koomie, I found rocprof under the rocm module, and I think that will be enough for the GPU side. Thanks for sending the links; they are comprehensive and useful.

For CPU profiling, though, I think I still need uProf. Could you let me know the status of the uProf installation?

arkhadem commented 7 months ago

Hi @tom-papatheodore and @koomie,

Do you have any updates on this?

Best,

koomie commented 3 months ago

Yes, and apologies for the delay. We have installed uProf across the system. There is no module for it yet, but you can access the binaries directly at: /opt/AMDuProf_4.2-850/bin/
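Since there is no module yet, one way to use the install is to put that directory on your PATH for the current session. The snippet below is only a sketch: the directory is the one quoted above, while `./my_app` and the output directory are hypothetical placeholders, and it assumes the command-line front end in that directory is named `AMDuProfCLI` and accepts the `collect --config tbp` (time-based profiling) form described in the uProf user guide. Check the tool's own help output for the event-based configurations needed for cache and branch counters.

```shell
# Directory quoted in the comment above; everything else here is illustrative.
UPROF_BIN=/opt/AMDuProf_4.2-850/bin

# Prepend the uProf binaries to PATH for this shell session only.
export PATH="$UPROF_BIN:$PATH"

# Hypothetical application and output location; replace with your own.
APP=./my_app
OUT_DIR="$HOME/uprof_out"

# Only run the profiler if the CLI actually resolves on this node.
if command -v AMDuProfCLI >/dev/null 2>&1; then
    # Time-based (sampling) profile of the application.
    AMDuProfCLI collect --config tbp -o "$OUT_DIR" "$APP"
else
    echo "AMDuProfCLI not found in $UPROF_BIN" >&2
fi
```

Adding the `export PATH=...` line to a job script keeps the setting local to that job, which avoids surprises if a proper module is added later.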

As Tom mentioned, Omniperf is a good tool for detailed single-node GPU analysis with hardware counters, and you can access it via the pre-installed modules on the system (e.g. module load omniperf).
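As a rough sketch of that workflow, the snippet below loads the module and starts a profiling run. The workload name `my_run` and the binary `./my_app` are hypothetical placeholders, and the `omniperf profile -n <name> -- <command>` form is assumed from the Omniperf documentation rather than confirmed for the version installed on this cluster.

```shell
# Hypothetical workload name and application; replace with your own.
WORKLOAD=my_run
APP=./my_app

# `module` is normally a shell function provided by the site's module system,
# so only call it when it is defined in this shell.
if type module >/dev/null 2>&1; then
    module load omniperf || echo "could not load the omniperf module" >&2
fi

if command -v omniperf >/dev/null 2>&1; then
    # Omniperf replays the application several times to collect all counters.
    omniperf profile -n "$WORKLOAD" -- "$APP"
    PROFILED=yes
else
    echo "omniperf not found; is the module loaded?" >&2
    PROFILED=no
fi
```

Because the application is replayed several times, a short, deterministic input is usually the better choice for a first profiling run.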