michaelmckinsey1 opened this pull request 2 months ago (status: Open).
Changing this PR to "work in progress" because there are complications due to mismatches in demangled kernel names from Caliper and NCU.
This PR supersedes the `rajaperf-paper` branch (https://github.com/LLNL/thicket/tree/rajaperf-paper), which was used for the RAJAPerf paper.
Update: the kernel-name mismatches have been addressed in new changes.
tldr:
- Supports `Lambda_CUDA` and `RAJA_CUDA` variants by using demangled kernel names.
- Adds a `debug` flag to show detailed information about kernel matches, so future issues can be investigated preliminarily without editing the source code.

Description
Enables support for reading NCU report profiles for `RAJA_CUDA` and `Lambda_CUDA` variants and `cub` kernels by using the demangled action name.

The current Thicket NCU reader matches nodes in a Caliper `cuda_activity_profile` (CAP) with actions in an NCU report file by taking each action's name, `action.name(_ncu_report.IAction_NameBase_FUNCTION)`, and checking whether it appears in the CAP node name (`kernel_name in node.frame["name"]`). For `Base_CUDA`, this name is the kernel name (e.g., `daxpy`, `energy1`, or `energy2`).

For `RAJA_CUDA` and `Lambda_CUDA`, this assumption does not hold, because `action.name(_ncu_report.IAction_NameBase_FUNCTION)` does not return the kernel names. However, the kernel names are still embedded in the demangled action name, `action.name(_ncu_report.IAction_NameBase_DEMANGLED)`. This PR parses the demangled name to match the nodes in the CAP, which also works for `Base_CUDA` profiles.

For `cub` kernels, there may be several kernels with the same name but different function signatures. For example, the NCU kernel

```
void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<double, double, int>::Policy700, false, false, double, double, int>(const T4 *, T4 *, const T5 *, T5 *, T6 *, T6, int, int, cub::GridEvenShare<T6>)
```

must be matched to the first of several `DeviceRadixSortDownsweepKernel` nodes in the calltree. To match the two, we first narrow the search down to the `Algorithm_SORT` part of the calltree, then apply similarity matching with `SequenceMatcher` from the Python standard library's `difflib`.

NCU kernel support by variant: This PR (#201) vs. Develop
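The demangled-name matching described above could be implemented in several ways. The following is a minimal sketch of one, assuming plain strings stand in for the value of `action.name(_ncu_report.IAction_NameBase_DEMANGLED)` and for the kernel names stored in the CAP. The helper `match_demangled` is hypothetical, not the function added by this PR: it treats a kernel as matched when its name appears as a whole C++ identifier in the demangled string. The first demo input is the `cub` signature quoted in the description; the `energy1` string is made up.

```python
import re

# A C++ identifier: a letter or underscore, then letters, digits, or underscores.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def match_demangled(demangled, kernel_names):
    """Return the entries of `kernel_names` that appear as whole identifiers in `demangled`.

    Hypothetical helper: `demangled` stands in for
    action.name(_ncu_report.IAction_NameBase_DEMANGLED), and `kernel_names`
    stands in for kernel names taken from CAP nodes.
    """
    tokens = set(_IDENTIFIER.findall(demangled))
    # Whole-identifier comparison avoids substring false positives:
    # "energy" does not match the identifier "energy1".
    return [name for name in kernel_names if name in tokens]


if __name__ == "__main__":
    # Demangled cub kernel name quoted in this PR's description.
    ncu_demangled = (
        "void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<"
        "double, double, int>::Policy700, false, false, double, double, int>"
        "(const T4 *, T4 *, const T5 *, T5 *, T6 *, T6, int, int, "
        "cub::GridEvenShare<T6>)"
    )
    print(match_demangled(ncu_demangled, ["daxpy", "DeviceRadixSortDownsweepKernel"]))
    # ['DeviceRadixSortDownsweepKernel']

    # Made-up Base_CUDA-style name: only the exact identifier matches.
    print(match_demangled("energy1(double *, double *)", ["energy", "energy1", "energy2"]))
    # ['energy1']
```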
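The `cub` matching step (narrow to the `Algorithm_SORT` part of the calltree, then score with `difflib.SequenceMatcher`) can be sketched as follows. This is an illustration under assumptions, not the PR's code: the calltree is modeled as hypothetical `(path, name)` pairs, `best_match` is a hypothetical helper, and the candidate names in the demo are made-up variants of the NCU signature quoted in the description.

```python
from difflib import SequenceMatcher


def best_match(ncu_name, nodes, region="Algorithm_SORT"):
    """Return the (path, name) node whose name is most similar to `ncu_name`.

    Hypothetical helper: each node is a (path, name) pair, where `path` is a
    tuple of ancestor node names in the CAP calltree. Returns None when no
    node lies under `region`.
    """
    # Step 1: narrow the search to nodes whose path contains `region`.
    candidates = [node for node in nodes if region in node[0]]
    if not candidates:
        return None

    # Step 2: score by SequenceMatcher.ratio(), a similarity in [0, 1].
    # autojunk=False disables the "popular element" heuristic, which for
    # second strings of 200+ characters would treat frequent characters
    # (spaces, commas, ...) as junk and distort scores on long signatures.
    def score(node):
        return SequenceMatcher(None, ncu_name, node[1], autojunk=False).ratio()

    return max(candidates, key=score)


if __name__ == "__main__":
    ncu = (
        "void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<"
        "double, double, int>::Policy700, false, false, double, double, int>"
        "(const T4 *, T4 *, const T5 *, T5 *, T6 *, T6, int, int, "
        "cub::GridEvenShare<T6>)"
    )
    # Made-up calltree nodes for illustration only.
    nodes = [
        # Same signature, different pointer spacing.
        (("RAJAPerf", "Algorithm_SORT"), ncu.replace("const T4 *", "const T4*")),
        # Same kernel name, different template types.
        (("RAJAPerf", "Algorithm_SORT"), ncu.replace("double", "float")),
        # Exact copy, but outside the region, so it is never considered.
        (("RAJAPerf", "Algorithm_SCAN"), ncu),
    ]
    print(best_match(ncu, nodes) == nodes[0])  # True
```

Taking the maximum `ratio()` always returns some node once the region is non-empty; a production matcher might also want a minimum-score threshold, which this sketch does not include.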