[FEA]: Optionally disable code generation facilities in STF

Is this a duplicate?

[ ] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct

Area

CUDA Experimental (cudax)

Is your feature request related to a problem? Please describe.

The STF model provides methods to generate CUDA kernels (parallel_for and launch) in addition to orchestrating asynchronous computation.

This is not needed for applications which provide their own kernels or rely on libraries, so removing these features might speed up compilation significantly. More importantly, these code generation features require specific nvcc flags to enable extended lambda functions and device constexpr functions (--expt-relaxed-constexpr --extended-lambda), and these flags are prohibited in some cases.

Describe the solution you'd like

We should therefore be able to disable parallel_for and launch when including STF. They are already disabled automatically for non-CUDA compilers, but we may still want to disable them for nvcc.
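
One possible shape for this, as a minimal sketch only: a user-defined opt-out macro that gates the kernel-generating entry points. The macro names (STF_SKETCH_DISABLE_CODE_GENERATION, STF_SKETCH_HAS_CODE_GENERATION) and the context_sketch type are hypothetical and are not existing STF names; only __CUDACC__ is a real macro, defined by nvcc when compiling CUDA sources.

```cpp
// Hypothetical opt-out macro (an assumption, not an existing STF option).
// A user would define it before including STF, or pass it with -D.
// #define STF_SKETCH_DISABLE_CODE_GENERATION

// Code generation is kept only when compiling with nvcc and the user has
// not opted out. Non-CUDA compilers fall into the disabled branch.
#if defined(__CUDACC__) && !defined(STF_SKETCH_DISABLE_CODE_GENERATION)
#  define STF_SKETCH_HAS_CODE_GENERATION 1
#else
#  define STF_SKETCH_HAS_CODE_GENERATION 0
#endif

namespace stf_sketch {

// Compile-time view of the configuration, usable in static_assert.
constexpr bool has_code_generation = (STF_SKETCH_HAS_CODE_GENERATION != 0);

// Stand-in for an STF context; not the real class.
struct context_sketch {
  // Task orchestration stays available in every configuration.
  template <typename F>
  void task(F&& f) { f(); }

#if STF_SKETCH_HAS_CODE_GENERATION
  // Kernel-generating entry points, declared only when enabled.
  // Bodies are omitted here: they are where extended lambdas would be needed.
  template <typename F>
  void parallel_for(int n, F&& f);

  template <typename F>
  void launch(F&& f);
#endif
};

} // namespace stf_sketch
```

With such a guard, a translation unit built with the opt-out defined would never see the declarations that need --extended-lambda or --expt-relaxed-constexpr. nvcc also defines __CUDACC_EXTENDED_LAMBDA__ and __CUDACC_RELAXED_CONSTEXPR__ when those flags are passed, so the guard could in principle test for them as well; whether that is desirable is left open here.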

Describe alternatives you've considered

No response

Additional context

No response
