Thyre closed this issue 2 months ago.
I can confirm that the issue is fixed with LLVM 19git and will therefore eventually also land in ROCm.
As the limitation seems to come from trying to call library functions during _dl_start_user, this limitation should probably be documented somewhere, if that has not been done already.
CUDA for example includes this paragraph in their documentation:
The CUDA interfaces use global state that is initialized during host program initiation and destroyed during host program termination. The CUDA runtime and driver cannot detect if this state is invalid, so using any of these interfaces (implicitly or explicitly) during program initiation (or termination after main) will result in undefined behavior.
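One common way for a tool to stay clear of this kind of restriction is to defer library initialization until the first call that actually needs it, rather than doing it while the program is still starting up. The following is only a minimal sketch of that pattern, not a confirmed workaround for the OMPT case: `fake_device_init`, `ensure_device_ready`, and `handle_event` are hypothetical names, and `fake_device_init` is a stand-in for a real call such as `rsmi_init`.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a device-library init call such as rsmi_init().
 * Returns 0 on success. A real tool would call the actual library here. */
static int fake_device_init(void)
{
    return 0;
}

static bool init_attempted = false; /* has initialization been tried yet? */
static int  init_status    = -1;    /* result of the one-time initialization */
static int  init_calls     = 0;     /* how often the init routine actually ran */

/* Call this at the top of every function that needs the device library.
 * The first caller performs the initialization; later callers reuse the result.
 * Note: this sketch is not thread-safe. A multi-threaded tool would need to
 * guard it, for example with pthread_once. */
static bool ensure_device_ready(void)
{
    if (!init_attempted) {
        init_attempted = true;
        ++init_calls;
        init_status = fake_device_init();
    }
    return init_status == 0;
}

/* Hypothetical entry point that runs after startup, e.g. an event handler. */
static int handle_event(void)
{
    if (!ensure_device_ready()) {
        fprintf(stderr, "device library unavailable, skipping event\n");
        return -1;
    }
    /* ... query the device library here ... */
    return 0;
}
```

With this layout nothing touches the device library until `handle_event` is first called, and repeated calls do not re-run the initialization. Whether the first such call happens late enough depends on when the runtime invokes the tool, which this sketch does not address.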
I'm closing the issue.
While testing how our HIP adapter in Score-P interacts with OpenMP target regions, I've encountered the following issue preventing me from testing it.
In Score-P, adapters are divided into several subsystems. Upon startup, one subsystem might initialize all others. In the case of OMPT, that subsystem will probably be the first to run and will initialize all the others during ompt_start_tool. It's exactly here that we run into an issue. Looking at the following source code, we can see what's happening:
Most of the code is just there to build a valid OMPT interface. When running the code, ompt_start_tool gets called, which tries to initialize rocm-smi via rsmi_init. However, because we're still inside ompt_start_tool, the initialization fails.

The question is: Is this intended? I also observed that other hip-related functions like hipGetDeviceCount fail with a segmentation fault, which led me to believe that nothing ROCm-related is initialized and ready to use during the ompt_start_tool call.