Make sure that all kernel launch functions validate their launch configurations in Debug mode, both in themselves (internal consistency) and w.r.t. the relevant device and/or kernel.
We currently do no validation w.r.t. block cooperation support or the launch attributes introduced in CUDA 12. We should start doing so.
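To make the intent concrete, here is a minimal sketch of what such a Debug-mode check might look like. All names in it (`launch_config_t`, `device_limits_t`, `validate_in_itself`, `validate_for_device`, `validate`) are hypothetical stand-ins, not the library's actual types or functions, and the limits are passed in as plain values rather than queried from a real device. It only covers dimensions, shared memory and block cooperation; CUDA 12 launch attributes are left out.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-ins for illustration only; not the library's actual types.
struct dims_t { unsigned x, y, z; };

struct launch_config_t {
    dims_t grid;
    dims_t block;
    std::size_t dynamic_shared_memory_bytes;
    bool block_cooperation;   // request a cooperative launch
};

struct device_limits_t {
    unsigned max_threads_per_block;
    dims_t max_block_dims;
    dims_t max_grid_dims;
    std::size_t max_shared_memory_per_block;
    bool supports_block_cooperation;
};

// Checks that need no device: every dimension must be non-zero.
inline void validate_in_itself(const launch_config_t& c)
{
    if (c.grid.x == 0 || c.grid.y == 0 || c.grid.z == 0)
        throw std::invalid_argument("grid has a zero dimension");
    if (c.block.x == 0 || c.block.y == 0 || c.block.z == 0)
        throw std::invalid_argument("block has a zero dimension");
}

// Checks of the configuration against the limits of a specific device.
inline void validate_for_device(const launch_config_t& c, const device_limits_t& d)
{
    if (c.block.x > d.max_block_dims.x || c.block.y > d.max_block_dims.y ||
        c.block.z > d.max_block_dims.z)
        throw std::invalid_argument("block dimensions exceed the device maximum");
    if (c.grid.x > d.max_grid_dims.x || c.grid.y > d.max_grid_dims.y ||
        c.grid.z > d.max_grid_dims.z)
        throw std::invalid_argument("grid dimensions exceed the device maximum");

    // Use a wide type so the product cannot overflow.
    const unsigned long long threads =
        1ULL * c.block.x * c.block.y * c.block.z;
    if (threads > d.max_threads_per_block)
        throw std::invalid_argument("too many threads per block");

    if (c.dynamic_shared_memory_bytes > d.max_shared_memory_per_block)
        throw std::invalid_argument("too much dynamic shared memory");

    if (c.block_cooperation && !d.supports_block_cooperation)
        throw std::invalid_argument("device does not support block cooperation");
}

// What a launch function might call: active in Debug builds, a no-op otherwise.
inline void validate(const launch_config_t& c, const device_limits_t& d)
{
#ifndef NDEBUG
    validate_in_itself(c);
    validate_for_device(c, d);
#else
    (void) c; (void) d;
#endif
}
```

A kernel-specific check (e.g. against the kernel's own maximum block size) could follow the same pattern with a third function taking the kernel's attributes.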
Let's: