ROCm / omnitrace

Omnitrace: Application Profiling, Tracing, and Analysis
https://rocm.docs.amd.com/projects/omnitrace/en/latest/
MIT License

OpenMP offloading #280

Open ooreilly opened 1 year ago

ooreilly commented 1 year ago

I'm trying omnitrace with OpenMP offloading on a small Fortran test code. Depending on the system I tested on, I encountered different issues. The test code is compiled with the HPE Cray compiler, CCE 15.0.1.

I either saw:

WARNING: Unrecognized OMPT entry_point request ompt_get_record_type
WARNING: Unrecognized OMPT entry_point request ompt_get_record_ompt
WARNING: Unrecognized OMPT entry_point request ompt_get_device_num_procs
WARNING: Unrecognized OMPT entry_point request ompt_callback_mutex
WARNING: Unrecognized OMPT entry_point request ompt_callback_nest_lock
WARNING: Unrecognized OMPT entry_point request ompt_callback_flush
WARNING: Unrecognized OMPT entry_point request ompt_callback_cancel
WARNING: Unrecognized OMPT entry_point request ompt_callback_dispatch
WARNING: Unrecognized OMPT entry_point request ompt_callback_buffer_request
WARNING: Unrecognized OMPT entry_point request ompt_callback_buffer_complete
WARNING: Unrecognized OMPT entry_point request ompt_callback_dependences
WARNING: Unrecognized OMPT entry_point request ompt_callback_task_dependence
[omnitrace][21794][2045] No signals to block...
[omnitrace][21794][2044] No signals to block...
[omnitrace][21794][OnLoad] Loading ROCm tooling...
[omnitrace][21794][0][OnLoad] Setting rocm_smi state to active...
[omnitrace][21794][0][OnLoad] Requesting roctracer to setup...
[omnitrace][21794][PID=21794][rank=0] Thread 1 [0x000000000000552b] (#5) (parent: 0 [0x0000000000005522] (#0)) created
[omnitrace][21794][PID=21794][rank=0] Thread 1 [0x000000000000552b] (#5) (parent: 0 [0x0000000000005522] (#0)) exited
 n =  1100000000
 Data size (read and write): 17.600000000000001 GB
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error! nullptr to ompt_data_t! key = ompt_target_enter_data_dev_0

or:

OMNITRACE: HSA_TOOLS_LIB=/pfs/lustrep2/projappl/project_462000125/omnitrace/lib/libomnitrace-dl.so.1.10.0
OMNITRACE: HSA_TOOLS_REPORT_LOAD_FAILURE=1
OMNITRACE: LD_PRELOAD=/pfs/lustrep2/projappl/project_462000125/omnitrace/lib/libomnitrace-dl.so.1.10.0
OMNITRACE: OMP_TOOL_LIBRARIES=/pfs/lustrep2/projappl/project_462000125/omnitrace/lib/libomnitrace-dl.so.1.10.0
OMNITRACE: ROCP_HSA_INTERCEPT=1
OMNITRACE: ROCP_TOOL_LIB=/pfs/lustrep2/projappl/project_462000125/omnitrace/lib/libomnitrace.so.1.10.0
srun: error: nid007263: task 0: Exited with exit code 255
srun: launch/slurm: _step_signal: Terminating StepId=3480167.3

Any idea what is happening here? Thanks!

ppanchad-amd commented 1 week ago

Hi @ooreilly. An internal ticket has been created to investigate your issue. Thanks!

darren-amd commented 5 days ago

Hi @ooreilly,

I tried running a simple Fortran example with OpenMP offloading and was unable to reproduce the error on omnitrace-instrument v1.11.2, ROCm 6.2.2, and the GNU Fortran compiler. Could you please provide more information so that I may further investigate:

  1. The Fortran example you are running
  2. The OS, GPU, and ROCm version on each of the two systems
  3. The Omnitrace version (output of omnitrace-instrument --version)
  4. The commands you are using to compile the test code and run omnitrace

Also, could you confirm whether the compiled executable runs as expected without omnitrace? Having this information should allow me to help further, thanks!
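
For anyone gathering the details requested above, the following is a minimal sketch of a script that collects them in one pass. Only omnitrace-instrument --version comes from this thread; the rest are assumptions: that rocminfo is installed with ROCm, that the ROCm version file lives at /opt/rocm/.info/version (a common default, not guaranteed), and that the Cray ftn wrapper accepts --version. The helper names run_if_present and collect_report are made up for illustration. Tools that are missing are reported rather than treated as errors.

```shell
#!/usr/bin/env bash
# Sketch: collect OS, GPU, ROCm, Omnitrace, and compiler details for a bug report.
# Every tool is treated as optional; a missing tool is noted and skipped.

run_if_present() {
    # $1 = section label, remaining arguments = command to run
    local label="$1"; shift
    echo "== ${label} =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(command failed: $*)"
    else
        echo "($1 not found on PATH)"
    fi
    echo
}

collect_report() {
    echo "== OS =="
    if [ -r /etc/os-release ]; then
        grep -E '^(NAME|VERSION)=' /etc/os-release
    else
        uname -sr
    fi
    echo

    echo "== ROCm version =="
    # Assumption: default install prefix; adjust if ROCm lives elsewhere.
    if [ -r /opt/rocm/.info/version ]; then
        cat /opt/rocm/.info/version
    else
        echo "(/opt/rocm/.info/version not readable)"
    fi
    echo

    run_if_present "GPU (rocminfo)" rocminfo
    run_if_present "Omnitrace version" omnitrace-instrument --version
    # Assumption: the Cray compiler wrapper is named ftn and accepts --version.
    run_if_present "Compiler (ftn)" ftn --version
}

collect_report
```

Running this once on each of the two systems and attaching the output, together with the exact compile and launch commands, should cover items 2 and 3 of the list.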