ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/

rocprofiler segmentation fault with aomp compiled code #81

Closed fluidnumerics-joe closed 2 years ago

fluidnumerics-joe commented 2 years ago

rocprof (ROCm 4.5.0) crashes with a segmentation fault when run with the --stats, --hsa-trace, or --sys-trace flags on binaries built by amdclang and amdflang with OpenMP GPU offloading enabled. This prevents us from creating hotspot and trace profiles for applications via rocprof. The error did not occur in earlier versions of ROCm (the same code ran under ROCm 4.3.0 without issue).

This is similar to https://github.com/ROCm-Developer-Tools/rocprofiler/issues/49, but we find considerably more symbols missing from the LLVM libraries than are reported there.

When running a fairly simple application under rocprof with LD_DEBUG=libs, I see the following errors with ROCm 4.5.0:

$ LD_DEBUG=libs rocprof --stats ./smoother 1000 1000 100 > log.txt 2>&1
$ grep error log.txt
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomp.so: error: symbol lookup error: undefined symbol: ompt_start_tool (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.x86_64.so: error: symbol lookup error: undefined symbol: __tgt_rtl_init_requires (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.x86_64.so: error: symbol lookup error: undefined symbol: __tgt_rtl_data_submit_async (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.x86_64.so: error: symbol lookup error: undefined symbol: __tgt_rtl_data_retrieve_async (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.x86_64.so: error: symbol lookup error: undefined symbol: __tgt_rtl_run_target_region_async (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.x86_64.so: error: symbol lookup error: undefined symbol: __tgt_rtl_run_target_team_region_async (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.x86_64.so: error: symbol lookup error: undefined symbol: __tgt_rtl_synchronize (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.x86_64.so: error: symbol lookup error: undefined symbol: __tgt_rtl_data_exchange (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.x86_64.so: error: symbol lookup error: undefined symbol: __tgt_rtl_data_exchange_async (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.x86_64.so: error: symbol lookup error: undefined symbol: __tgt_rtl_is_data_exchangable (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.x86_64.so: error: symbol lookup error: undefined symbol: __tgt_rtl_register_lib (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.x86_64.so: error: symbol lookup error: undefined symbol: __tgt_rtl_unregister_lib (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.x86_64.so: error: symbol lookup error: undefined symbol: __tgt_rtl_supports_empty_images (fatal)
     18787: /opt/rocm-4.5.0/rocprofiler/tool/libtool.so: error: symbol lookup error: undefined symbol: OnLoadTool (fatal)
     18787: /opt/rocm-4.5.0/rocprofiler/lib/librocprofiler64.so: error: symbol lookup error: undefined symbol: WrapAgent (fatal)
     18787: /opt/rocm-4.5.0/rocprofiler/lib/librocprofiler64.so: error: symbol lookup error: undefined symbol: AddAgent (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.amdgpu.so: error: symbol lookup error: undefined symbol: __tgt_rtl_run_target_team_region_async (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.amdgpu.so: error: symbol lookup error: undefined symbol: __tgt_rtl_data_exchange (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.amdgpu.so: error: symbol lookup error: undefined symbol: __tgt_rtl_data_exchange_async (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.amdgpu.so: error: symbol lookup error: undefined symbol: __tgt_rtl_is_data_exchangable (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.amdgpu.so: error: symbol lookup error: undefined symbol: __tgt_rtl_register_lib (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.amdgpu.so: error: symbol lookup error: undefined symbol: __tgt_rtl_unregister_lib (fatal)
     18787: /opt/rocm-4.5.0/llvm/bin/../lib/libomptarget.rtl.amdgpu.so: error: symbol lookup error: undefined symbol: __tgt_rtl_supports_empty_images (fatal)
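To compare this output against the shorter list in issue 49, it can help to count the undefined-symbol lines per library. The following is a minimal sketch, not part of rocprof; the function name summarize_missing_symbols is hypothetical, and it assumes the log has the same layout as the lines above (PID, then the library path followed by a colon).

```shell
# Hypothetical helper: count "undefined symbol" lines per library in an
# LD_DEBUG=libs log. Assumes the second whitespace-separated field is the
# library path with a trailing colon, as in the log lines above.
summarize_missing_symbols() {
    log_file="$1"
    awk '/undefined symbol:/ {
             lib = $2
             sub(/:$/, "", lib)      # drop the trailing colon from the path
             count[lib]++
         }
         END {
             for (lib in count) print count[lib], lib
         }' "$log_file" | sort -k1,1nr -k2,2   # highest count first
}

# Example usage against the log captured above:
# summarize_missing_symbols log.txt
```

Each output line is a count followed by the library path, so the libraries with the most missing symbols appear first.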

I'm curious whether any automated testing is run on rocprofiler to catch these kinds of issues before new versions are tagged and released. From version to version, various components of ROCm break, which makes the upgrade process unpredictable from an integrator's and a user's perspective.

kikimych commented 2 years ago

OpenMP profiling is not supported at the moment.

fluidnumerics-joe commented 2 years ago

@kikimych - This is quite unfortunate. ROCm 4.3.x was the last release series in which I was able to create hotspot and trace profiles of OpenMP-accelerated applications. Since AOMP with OpenMP 5.0 support is part of ROCm, the perception was that OpenMP profiling would be supported in the ROCm ecosystem. What would it take to have OpenMP profiling supported?

fluidnumerics-joe commented 2 years ago

I can confirm that this issue is resolved in ROCm 5.0.0 and later. To create trace profiles for OpenMP-accelerated code, use the --hsa-trace flag, which produces a Chrome-trace-compatible JSON trace file:

rocprof --hsa-trace --stats ./smoother 1000 1000 100
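Since the behavior depends on the ROCm release (broken in 4.5.0, working in 4.3.0 and in 5.0.0 and later, per this thread), a script that profiles OpenMP code may want to check the version first. The sketch below is only an illustration: version_at_least is a hypothetical helper, it relies on GNU sort -V, and the version file path in the usage comment is an assumption about the install layout rather than something stated in this issue.

```shell
# Hypothetical helper: succeed (exit 0) if version $1 is >= version $2.
# Relies on GNU sort -V (version sort): the smaller of the two versions
# sorts first, so $1 >= $2 exactly when $2 is the first line.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example usage (the version file location is an assumption):
# rocm_version=$(cat /opt/rocm/.info/version)
# if version_at_least "$rocm_version" 5.0.0; then
#     rocprof --hsa-trace --stats ./smoother 1000 1000 100
# else
#     echo "ROCm $rocm_version: OpenMP tracing with rocprof may segfault" >&2
# fi
```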