Summary
The primary purpose of this PR is to remove the standalone PCM kernel in favor of a device function called within the Riemann solver kernels. This PR builds on PRs #371 and #375, so its diff will include the changes from those PRs until they are merged into dev. Most of the relevant changes in this PR are in the following files:
pcm_cuda.cu and .h
reconstruction.h
All Riemann solvers
The Riemann solvers currently carry some extra machinery to handle the fact that one reconstruction is fused and the rest aren't; that machinery will go away once all reconstructions are fused.
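To illustrate what fusing the reconstruction into the solver means, here is a minimal host-side C++ sketch. It is not the code in this PR: every name in it (InterfaceStates, PcmInterfaceStates, UpwindFlux, FusedFluxes) is hypothetical, and a simple upwind flux for linear advection stands in for a real Riemann solver. In a CUDA build the reconstruction would be a `__device__` function and the loop body would be the body of a `__global__` kernel.

```cpp
#include <cstddef>
#include <vector>

// All names here are hypothetical and chosen only for illustration.

// Left and right states at a single cell interface.
struct InterfaceStates {
  double left;
  double right;
};

// Piecewise constant reconstruction: the states on either side of the
// interface between cell i and cell i + 1 are just the two cell values.
inline InterfaceStates PcmInterfaceStates(const std::vector<double>& cells, std::size_t i) {
  return {cells[i], cells[i + 1]};
}

// Toy stand-in for a Riemann solver: upwind flux for linear advection
// with constant speed a.
inline double UpwindFlux(const InterfaceStates& s, double a) {
  return a >= 0.0 ? a * s.left : a * s.right;
}

// Fused version: the reconstruction is called inside the flux loop, so no
// intermediate interface-state arrays are written out and read back.
std::vector<double> FusedFluxes(const std::vector<double>& cells, double a) {
  std::vector<double> flux(cells.empty() ? 0 : cells.size() - 1);
  for (std::size_t i = 0; i + 1 < cells.size(); ++i) {
    flux[i] = UpwindFlux(PcmInterfaceStates(cells, i), a);
  }
  return flux;
}
```

For example, FusedFluxes({1.0, 2.0, 3.0}, 2.0) returns the two interface fluxes {2.0, 4.0}. An unfused design would instead run one pass that stores every InterfaceStates value in an array and a second pass that reads them back to compute fluxes.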
Fusing the PCM kernel into the Riemann solvers improves performance by ~7% in hydro builds and ~12.6% in MHD builds compared to the version of Cholla in PR #375 (see run_timing.log).
Other Changes
Fixed an ifndef that should have been an ifdef in the warning that CUDA error checking is disabled
Added a HIP_KERNEL_NAME macro to CUDA builds. This macro is part of the HIP runtime but is not present in the CUDA runtime. It's used in kernel launches to wrap kernel names that have more than one template parameter since the comma in the template arguments plays havoc with some internals of the HIP runtime.
Removed the reference to the deprecated OUTPUT_ALWAYS build macro
Fixed a bug in AutomaticLaunchParams that let it set a threads-per-block count higher than what __launch_bounds__() specified. It now queries the kernel for that limit and uses it as the maximum threads per block. This isn't causing any bugs at the moment, but it did during some intermediate testing I was doing.
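The comma problem behind the HIP_KERNEL_NAME item above can be reproduced in plain C++. This is a sketch under assumptions: LAUNCH is a hypothetical stand-in for a macro-based launcher such as hipLaunchKernelGGL, ToyKernel and LaunchToyKernel are made-up names, and the HIP_KERNEL_NAME definition mirrors what I believe the HIP headers use (a macro that expands to its arguments); the definition actually added in this PR may differ.

```cpp
// Assumed to match the HIP definition: the macro expands to its arguments,
// so it is a no-op apart from protecting commas during macro-argument parsing.
#ifndef HIP_KERNEL_NAME
#define HIP_KERNEL_NAME(...) __VA_ARGS__
#endif

// Hypothetical macro-based launcher. The first macro argument is the
// "kernel"; the remaining arguments are forwarded to it.
#define LAUNCH(kernel, ...) (kernel)(__VA_ARGS__)

// Hypothetical "kernel" with more than one template parameter.
template <int reconstruction, int direction>
int ToyKernel(int n) {
  return n * reconstruction + direction;
}

// LAUNCH(ToyKernel<2, 1>, n) does not compile: the preprocessor splits its
// arguments at the comma inside <2, 1>, so `kernel` becomes `ToyKernel<2`.
// Wrapping the name puts that comma inside parentheses, which the
// preprocessor treats as part of a single argument.
inline int LaunchToyKernel(int n) {
  return LAUNCH(HIP_KERNEL_NAME(ToyKernel<2, 1>), n);
}
```

Kernels with a single template parameter have no comma in their name, which is why only multi-parameter kernel names need the wrapper.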