Open caugonnet opened 19 hours ago
I suppose the best option is to define a cudax_ENABLE_CUDASTF_CODE_GENERATION flag. In our presets, it would be set to true by default with nvcc, and false otherwise.
Applications may set that flag directly in their CMake config, or define -DNO_CUDASTF_CODE_GENERATION manually when using make.
Is this a duplicate?
Area
CUDA Experimental (cudax)
Is your feature request related to a problem? Please describe.
The STF model provides methods to generate CUDA kernels (parallel_for and launch) in addition to orchestrating asynchronous computation.
This is not needed for applications that provide their own kernels or rely on libraries, so removing these features might speed up compilation significantly. More importantly, these code generation features require specific nvcc flags to enable extended lambda functions and device constexpr functions (--expt-relaxed-constexpr --extended-lambda), which are prohibited in some cases.
Describe the solution you'd like
We should therefore be able to disable parallel_for and launch when including STF. This is already disabled automatically for non-CUDA compilers, but we may still want to disable it for nvcc.
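A minimal sketch of what such a guard could look like on the header side, assuming the opt-out is the NO_CUDASTF_CODE_GENERATION define mentioned above. The STF_HAS_CODE_GENERATION macro and the stf_has_code_generation() helper are hypothetical names used only for illustration, not existing cudax identifiers; __CUDACC__ is the macro defined by CUDA compilers such as nvcc.

```cpp
// Sketch only: STF_HAS_CODE_GENERATION and stf_has_code_generation() are
// hypothetical names, not part of the actual cudax headers.

// Code generation (parallel_for, launch) is enabled only when compiling
// with a CUDA compiler AND the user has not opted out explicitly.
#if defined(__CUDACC__) && !defined(NO_CUDASTF_CODE_GENERATION)
#  define STF_HAS_CODE_GENERATION 1
#else
#  define STF_HAS_CODE_GENERATION 0
#endif

// Compile-time query that the rest of the headers (or user code) could use
// to decide whether the kernel-generating constructs are available.
constexpr bool stf_has_code_generation()
{
  return STF_HAS_CODE_GENERATION != 0;
}

#if STF_HAS_CODE_GENERATION
// Declarations that need --extended-lambda / --expt-relaxed-constexpr
// (parallel_for, launch) would be placed here.
#endif
```

With a guard of this shape, compiling with nvcc and -DNO_CUDASTF_CODE_GENERATION would skip the guarded declarations, so the two extra nvcc flags would presumably no longer be required for code that only uses the task orchestration part of STF.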
Describe alternatives you've considered
No response
Additional context
No response