Add a new flag `-minimal` for NVRTC compilation. The `-minimal` flag omits certain language features to reduce compile time for small programs. In particular, the following are omitted:
- Texture and surface functions and associated types (for example, `cudaTextureObject_t`).
- CUDA Runtime functions that are provided by the cudadevrt device code library, typically named with the prefix “cuda”, for example, `cudaMalloc`.
- Kernel launch from device code.
- Types and macros associated with the CUDA Runtime and Driver APIs, provided by `cuda/tools/cudart/driver_types.h`, typically named with the prefix “cuda”, for example, `cudaError_t`.
This might be worth investigating in the future (after #1150).

Using it will require changes to our headers to prevent NVRTC from seeing `cudaStream_t` and similar CUDA Runtime types. Actually adding the flag to `JitifyCache::compileKernel` is trivial, though for CUDA 12.0-12.3 builds it could potentially be a runtime decision rather than a compile-time one, depending on how NVRTC versioning works.
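If the flag were chosen at runtime, the check might look something like the sketch below. It assumes NVRTC 12.4 is the first version that accepts `-minimal`, and that the version is queried at runtime (for example with NVRTC's `nvrtcVersion(&major, &minor)`) rather than taken from a build-time macro such as `CUDA_VERSION`. `buildNvrtcOptions` and `hasMinimal` are hypothetical helpers, not existing FLAME GPU or Jitify functions, and `--std=c++17` only stands in for whatever options `JitifyCache::compileKernel` already passes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not an existing FLAME GPU or Jitify function).
// Builds the NVRTC option list for the NVRTC version reported at runtime,
// appending -minimal only when that version is assumed to support it (12.4+).
std::vector<std::string> buildNvrtcOptions(int nvrtcMajor, int nvrtcMinor,
                                           bool wantMinimal) {
    assert(nvrtcMajor >= 0 && nvrtcMinor >= 0);
    // Placeholder for the options compileKernel already passes.
    std::vector<std::string> options = {"--std=c++17"};
    const bool supportsMinimal =
        nvrtcMajor > 12 || (nvrtcMajor == 12 && nvrtcMinor >= 4);
    if (wantMinimal && supportsMinimal) {
        options.push_back("-minimal");
    }
    return options;
}

// Hypothetical helper: reports whether -minimal ended up in the option list.
bool hasMinimal(const std::vector<std::string>& options) {
    for (const std::string& opt : options) {
        if (opt == "-minimal") {
            return true;
        }
    }
    return false;
}
```

With NVRTC 12.3 this yields only the existing options; with 12.4 or later and `wantMinimal` set, `-minimal` is appended. Taking the version as parameters keeps the gating logic testable without NVRTC installed.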
```
---------------------------------------------------
--- JIT compile log for outputdata_program ---
---------------------------------------------------
flamegpu/simulation/detail/CUDAScanCompaction.h(65): error: identifier "cudaStream_t" is undefined
void zero_scan_flag_async(cudaStream_t stream);
                          ^
flamegpu/simulation/detail/CUDAScanCompaction.h(115): error: identifier "cudaStream_t" is undefined
void zero_async(const Type& type, cudaStream_t stream, unsigned int streamId);
                                  ^
flamegpu/exception/FLAMEGPUDeviceException.cuh(26): error: identifier "cudaStream_t" is undefined
DeviceExceptionBuffer *getDevicePtr(unsigned int streamId, cudaStream_t stream);
                                                           ^
flamegpu/exception/FLAMEGPUDeviceException.cuh(27): error: identifier "cudaStream_t" is undefined
void checkError(const std::string &function, unsigned int streamId, cudaStream_t stream);
                                                                    ^
4 errors detected in the compilation of "outputdata_program".
```
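One possible way to make headers like these `-minimal`-safe is to hide host-only declarations from NVRTC. NVRTC defines the `__CUDACC_RTC__` macro while compiling, so anything inside `#ifndef __CUDACC_RTC__` is invisible to the JIT compile but still available to the host build. The sketch below is a simplified, hypothetical class loosely modelled on `CUDAScanCompaction.h`, not the real one; the stand-in `cudaStream_t` typedef exists only so the example compiles without the CUDA toolkit.

```cpp
#include <cassert>

#ifndef __CUDACC_RTC__
// Stand-in for the CUDA Runtime's stream handle so this sketch compiles
// without the CUDA toolkit; a real header would include cuda_runtime.h here.
struct CUstream_st;
typedef CUstream_st* cudaStream_t;
#endif

// Hypothetical, simplified class loosely modelled on CUDAScanCompaction.h.
class ScanFlagsSketch {
 public:
    // Plain data: visible to both the host compiler and NVRTC.
    unsigned int count = 0;

#ifndef __CUDACC_RTC__
    // Host-only API. NVRTC defines __CUDACC_RTC__, so under -minimal it never
    // sees this declaration and never needs cudaStream_t.
    void zero_scan_flag_async(cudaStream_t stream) {
        (void)stream;  // a real implementation would enqueue async work here
        count = 0;
    }
#endif
};
```

Guarding only non-virtual member functions keeps the class layout identical on both sides of the guard, which matters for types shared between host and device code.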
The `-minimal` flag described above was introduced in CUDA 12.4.