LLNL / thicket

https://thicket.readthedocs.io
MIT License

NCU Reader Support for RAJA_CUDA and Lambda_CUDA #201

Open michaelmckinsey1 opened 2 months ago

michaelmckinsey1 commented 2 months ago

Description

Enables support for reading NCU report profiles for RAJA_CUDA and Lambda_CUDA variants and cub kernels by using the demangled action name.

The current Thicket NCU reader matches nodes in a Caliper cuda_activity_profile (CAP) to actions in an NCU report file by comparing each action's function name, action.name(_ncu_report.IAction_NameBase_FUNCTION), against the CAP node name. For Base_CUDA, this function name is the name of the kernel (e.g. daxpy, energy1, or energy2), and the same name is stored in the CAP node as kernel_name in node.frame["name"].

For RAJA_CUDA and Lambda_CUDA, the above assumption does not hold, because the values of action.name(_ncu_report.IAction_NameBase_FUNCTION) are not the kernel names. However, the kernel names are still embedded in the demangled action name, action.name(_ncu_report.IAction_NameBase_DEMANGLED). This PR parses the demangled name to match the nodes in the CAP. The same approach also works for Base_CUDA profiles.
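The idea of finding a kernel name inside a demangled name can be sketched as below. This is only an illustration, not the code in this PR: the helper `kernel_matches`, the case-insensitive comparison, and both demangled strings are assumptions made up for the example.

```python
import re


def kernel_matches(kernel_name: str, demangled: str) -> bool:
    """Return True if kernel_name occurs as a whole identifier in demangled.

    Hypothetical helper: the lookarounds stop "energy1" from matching
    inside "energy12"; ignoring case is an assumption of this sketch.
    """
    pattern = (
        r"(?<![A-Za-z0-9_])" + re.escape(kernel_name) + r"(?![A-Za-z0-9_])"
    )
    return re.search(pattern, demangled, flags=re.IGNORECASE) is not None


# Made-up demangled names, for illustration only.
base_cuda = "void rajaperf::basic::daxpy<256ul>(double*, double*, double, long)"
raja_cuda = (
    "void forall_cuda_kernel<rajaperf::basic::DAXPY::"
    "runCudaVariantImpl<256ul>(rajaperf::VariantID)::{lambda(long)#1}>(long)"
)

found_in_base = kernel_matches("daxpy", base_cuda)
found_in_raja = kernel_matches("daxpy", raja_cuda)
```

With these inputs, both lookups succeed, which mirrors the claim that one demangled-name check can serve Base_CUDA as well as the RAJA_CUDA and Lambda_CUDA variants.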

For cub kernels, there may be kernels with the same name but different function signatures. For example, consider matching the NCU kernel void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<double, double, int>::Policy700, false, false, double, double, int>(const T4 *, T4 *, const T5 *, T5 *, T6 *, T6, int, int, cub::GridEvenShare<T6>) to the first DeviceRadixSortDownsweepKernel in the following calltree:

nan RAJAPerf
└─ nan Algorithm
   ├─ nan Algorithm_SORT
   │  ├─ nan cudaLaunchKernel
   │  │  ├─ 1016096.000 void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<double, cub::NullType, int>::Policy700, **false, false**, double, cub::NullType, int>(double const*, double*, cub::NullType const*, cub::NullType*, int*, int, int, int, cub::GridEvenShare<int>)
   │  │  ├─ 1399520.000 void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<double, cub::NullType, int>::Policy700, **true, false**, double, cub::NullType, int>(double const*, double*, cub::NullType const*, cub::NullType*, int*, int, int, int, cub::GridEvenShare<int>)

We match the two by similarity, using SequenceMatcher from the standard library's difflib module, after first narrowing the search down to the Algorithm_SORT part of the calltree.
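A minimal sketch of this kind of similarity matching, assuming the candidate names have already been collected from the relevant subtree. The helper `best_match` and the shortened kernel names are hypothetical, and `autojunk=False` is a choice made for this sketch (difflib's automatic junk heuristic applies to sequences of 200 or more items, which long demangled names can exceed); it is not necessarily what the PR does.

```python
from difflib import SequenceMatcher


def best_match(query: str, candidates: list[str]) -> str:
    """Return the candidate most similar to query.

    Similarity is SequenceMatcher.ratio(), a value in [0, 1]. On ties,
    max() keeps the first candidate in list order.
    """
    def score(candidate: str) -> float:
        # autojunk=False disables difflib's popular-element heuristic
        # (an assumption of this sketch, not taken from the PR).
        matcher = SequenceMatcher(None, query, candidate, autojunk=False)
        return matcher.ratio()

    return max(candidates, key=score)


# Shortened, made-up stand-ins for two kernels that share a name but
# differ in one template argument.
candidates = [
    "Kernel<true, false>(int)",
    "Kernel<false, false>(int)",
]
chosen = best_match("Kernel<false, false>(int)", candidates)
```

Restricting `candidates` to one subtree before scoring keeps unrelated kernels with similar signatures out of the comparison.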

NCU kernel support by variant:

This PR (#201)

Base_CUDA Lambda_CUDA RAJA_CUDA
rajaperf kernels
cub kernels
kernels with multiple instances

Develop

Base_CUDA Lambda_CUDA RAJA_CUDA
rajaperf kernels x x
cub kernels x x x
kernels with multiple instances x x

ilumsden commented 1 month ago

Changing this PR to "work in progress" because there are complications due to mismatches in demangled kernel names from Caliper and NCU.

michaelmckinsey1 commented 1 month ago

This PR supersedes the rajaperf-paper branch (https://github.com/LLNL/thicket/tree/rajaperf-paper), which was used for the rajaperf paper.

michaelmckinsey1 commented 1 day ago

> Changing this PR to "work in progress" because there are complications due to mismatches in demangled kernel names from Caliper and NCU.

Addressed in new changes.