NVIDIA / cccl

CUDA Core Compute Libraries
https://nvidia.github.io/cccl/

[FEA]: Optionally disable code generation facilities in STF #2711

Open caugonnet opened 19 hours ago

caugonnet commented 19 hours ago

Area

CUDA Experimental (cudax)

Is your feature request related to a problem? Please describe.

The STF model provides methods to generate CUDA kernels (parallel_for and launch) in addition to orchestrating asynchronous computation.

These facilities are not needed by applications which provide their own kernels or rely on libraries, so removing them might speed up compilation significantly. More importantly, the code generation features require specific nvcc flags to enable extended lambda functions and device constexpr functions (--expt-relaxed-constexpr --extended-lambda), and these flags are prohibited in some cases.

Describe the solution you'd like

We should therefore be able to disable parallel_for and launch when including STF. They are already disabled automatically for non-CUDA compilers, but we may still want to disable them for nvcc.

Describe alternatives you've considered

No response

Additional context

No response

caugonnet commented 19 hours ago

I suppose the best option is to define a cudax_ENABLE_CUDASTF_CODE_GENERATION flag. In our presets, it would be set to true by default with nvcc, and to false otherwise.

Applications may set that flag directly in their CMake config, or define -DNO_CUDASTF_CODE_GENERATION manually when using make.
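
To make the proposal concrete, here is a rough sketch of how a header could gate the kernel-generating entry points behind the opt-out macro. It is only an illustration, not the actual STF implementation: SKETCH_STF_CODE_GENERATION, submit_task, parallel_for_sketch, code_generation_enabled, and describe_build are hypothetical names made up for this example, and the only identifiers taken from the discussion are NO_CUDASTF_CODE_GENERATION and the compiler-provided __CUDACC__.

```cpp
#include <cassert>
#include <string>

// Assumption: code generation is on only when compiling with a CUDA compiler
// (__CUDACC__ is defined by nvcc) and the user has not opted out with
// -DNO_CUDASTF_CODE_GENERATION. SKETCH_STF_CODE_GENERATION is a hypothetical
// internal macro, not a name used by the library.
#if defined(__CUDACC__) && !defined(NO_CUDASTF_CODE_GENERATION)
#  define SKETCH_STF_CODE_GENERATION 1
#else
#  define SKETCH_STF_CODE_GENERATION 0
#endif

// Stand-in for the orchestration API: always available, whatever the flags.
inline const char* submit_task() { return "task submitted"; }

#if SKETCH_STF_CODE_GENERATION
// Stand-in for facilities such as parallel_for and launch. In the real
// library these are the parts that need --extended-lambda and
// --expt-relaxed-constexpr; here it is only a placeholder.
inline const char* parallel_for_sketch() { return "kernel generated"; }
#endif

// Lets client code query the configuration at compile time.
constexpr bool code_generation_enabled() { return SKETCH_STF_CODE_GENERATION != 0; }

// Summarizes which parts of the API this translation unit can see.
inline std::string describe_build()
{
  std::string summary = "orchestration";
  if (code_generation_enabled())
  {
    summary += "+codegen";
  }
  assert(!summary.empty()); // orchestration is never compiled out
  return summary;
}
```

With a host-only compiler, or with nvcc plus -DNO_CUDASTF_CODE_GENERATION, describe_build() returns "orchestration" and parallel_for_sketch is not declared at all, so any accidental use fails at compile time rather than pulling in the extended-lambda requirements.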