MFlowCode / MFC

Exascale simulation of multiphase/physics fluid dynamics
https://mflowcode.github.io
MIT License
132 stars 56 forks source link

Add debug info. for GPUs to docs. #431

Open sbryngelson opened 1 month ago

sbryngelson commented 1 month ago

Contributors do not currently know how to debug OpenACC code that they add. We should add documentation for this.

This includes environment variables, flags, etc.

For nvhpc + NVIDIA GPUs we have, to start:

export PGI_ACC_DEBUG=1
export NVCOMPILER_ACC_NOTIFY=3
export NVCOMPILER_TERM=trace

from here (https://docs.nvidia.com/hpc-sdk/compilers/openacc-gs/index.html#env-vars)

or

PGI_ACC_DEBUG
PGI_ACC_NOTIFY
PGI_ACC_TIME
PGI_ACC_PROFILE
PGI_ACC_FILL
PGI_ACC_SYNCHRONOUS

(see here https://indico.euro-fusion.org/event/460/attachments/520/1092/04_-_OpenACC_Debug.pdf)

etc. etc.

For AMD GPUs we have other considerations that @anandrdbz and @wilfonba know about, like

export CRAY_ACC_DEBUG=[1-3]

(more info. here https://www.olcf.ornl.gov/wp-content/uploads/2021/04/2021-05-20-Frontier-Tutorial-CCE.pdf)

From here https://cpe.ext.hpe.com/docs/cce/man7/intro_openacc.7.html :

CRAY_ACC_DEBUG Output Routines

When the runtime environment variable CRAY_ACC_DEBUG is set to 1, 2, or 3, CCE writes runtime commentary of accelerator activity to STDERR for debugging purposes; every accelerator action on every PE generates output prefixed with “ACC:”. This may produce a large volume of output and it may be difficult to associate messages with certain routines and/or certain PEs.

With this set of API calls, the programmer can enable or disable output at certain points in the code, and modify the string that is used as the debug message prefix.