Open j-stephan opened 3 years ago
We also faced this with PIConGPU, arguably worse, as most kernels there have a few template parameters that are themselves template classes. To be honest, I am not sure we can do much on the alpaka side to help. Maybe once we have concepts, some of our template parameters will become concepts; I have no idea how the internal function names would look then.
The reason for the long kernel names @kloppstock got is that he is encoding a lot of information into the name; that's the same thing we do in PIConGPU. I am not sure how @SimeonEhrig was able to generate 512k chars for a kernel name. In PIConGPU we typically have kernel names of up to 10 KiB.
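To illustrate why template parameters that are themselves template classes inflate the names, here is a minimal sketch in plain C++. `Particle`, `Pusher` and `Kernel` are hypothetical stand-ins, not alpaka or PIConGPU types, and `typeid(T).name()` is implementation-defined (a mangled name on GCC/Clang), so the absolute lengths will differ between compilers.

```cpp
#include <cstddef>
#include <cstring>
#include <typeinfo>

// Hypothetical stand-ins for the template classes a kernel might be parameterised on.
template<typename T>
struct Particle
{
};

template<typename TFrame, typename TSolver>
struct Pusher
{
};

template<typename... TArgs>
struct Kernel
{
};

// Length of the implementation-defined type name (the mangled name on GCC/Clang).
template<typename T>
std::size_t typeNameLength()
{
    return std::strlen(typeid(T).name());
}

// Every nested template argument is spelled out again inside the outer name,
// so the name grows with each level of nesting.
using FlatKernel = Kernel<float>;
using NestedKernel = Kernel<Pusher<Particle<float>, Particle<double>>>;
```

Comparing `typeNameLength<NestedKernel>()` with `typeNameLength<FlatKernel>()` shows the growth; a kernel symbol additionally encodes the launch wrapper and its arguments, so real names are longer still.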
IMO, unless you switch to an interface like the one SYCL has, where you do not add templates to the device, ... you cannot shorten the names much.
@psychocoderHPC You mixed something up. In my case I have relatively short names; @kloppstock has the really long ones. But mine are still long enough to cause problems with Nsight Systems on 1440p displays.
I'm not sure it makes sense to shorten the alpaka names. A normal kernel execution simply has a lot of parameters and template types.
I am thinking about supporting profiling libraries for the different backends, for example the NVTX library for NVIDIA. In Kokkos, you can pass a name to a kernel call for profiling and debugging purposes: https://github.com/kokkos/kokkos/wiki/Kokkos%3A%3Aparallel_for Maybe that would also be useful for bactria.
Is this better now with #1795? What remains to be fixed?
I think they are still very long and difficult to use with the likes of the NVIDIA Nsight tools - but I do not have a better suggestion...
The ideal implementation would be something that lets the user pick the kernel name, but that would likely require some invasive macros in the kernel launch code.
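As a rough sketch of the user-chosen-name idea without macros, a launch wrapper could take a label and open a profiler range around the call. Everything here is hypothetical: `ProfilerLog`, `ScopedRange` and `launchNamed` are not alpaka API, and the log is only a stand-in for a real backend hook (for NVIDIA, presumably an NVTX range). Note that this labels a host-side range; it does not rename the generated device symbol.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a profiler; a real backend might call
// nvtxRangePushA / nvtxRangePop at the marked places instead.
struct ProfilerLog
{
    std::vector<std::string> events;
};

// RAII range: opened on construction, closed on destruction.
class ScopedRange
{
public:
    ScopedRange(ProfilerLog& log, std::string label) : m_log(log), m_label(std::move(label))
    {
        m_log.events.push_back("begin " + m_label); // range push would go here
    }

    ~ScopedRange()
    {
        m_log.events.push_back("end " + m_label); // range pop would go here
    }

    ScopedRange(ScopedRange const&) = delete;
    ScopedRange& operator=(ScopedRange const&) = delete;

private:
    ProfilerLog& m_log;
    std::string m_label;
};

// Hypothetical launch wrapper: the user supplies a short label and the
// callable runs inside a range carrying that label.
template<typename TKernel, typename... TArgs>
void launchNamed(ProfilerLog& log, std::string label, TKernel&& kernel, TArgs&&... args)
{
    ScopedRange range(log, std::move(label));
    std::forward<TKernel>(kernel)(std::forward<TArgs>(args)...);
}
```

A call such as `launchNamed(log, "axpy", kernel, args...)` would then show up under the short label `axpy` in a range-aware profiler timeline, while the long generated kernel name stays unchanged underneath.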
As pointed out by @SimeonEhrig (who is currently resorting to profiling on a widescreen 4K TV) alpaka's generated kernel names are way too long (up to 512 KiB as shown by @kloppstock). We should think about a way to considerably shorten them.
Example from the test cases when compiled with the SYCL backend: