This PR converts the non-RAJA base and lambda GPU kernel variants so that all GPU variants use the same kernel-launch methodology, specifically the one used inside RAJA.
It introduces new methods to launch non-RAJA variants of GPU kernels and converts all such kernel implementations to use them.
It also addresses function argument organization and alignment issues that have been discussed by the team.
Summary
Resolves https://github.com/LLNL/RAJAPerf/issues/385
Resolves https://github.com/LLNL/RAJAPerf/issues/371
NOTE: This is a large PR touching many files. No functionality was changed, and the changes are very similar across all of the kernels.