eyalroz / cuda-api-wrappers

Thin, unified, C++-flavored wrappers for the CUDA APIs
BSD 3-Clause "New" or "Revised" License

Full representation of launch configurations + launch-with-full-config support #564

Closed: eyalroz closed this issue 6 months ago

eyalroz commented 8 months ago

Since CUDA 12, the driver finally supports a proper launch configuration object, with a bunch of flags and features:

CUresult cuLaunchKernelEx(const CUlaunchConfig* config, CUfunction f, void** kernelParams, void** extra);

with the launch config being:

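typedef struct CUlaunchConfig_st {
    unsigned int gridDimX;        /* grid width, in blocks */
    unsigned int gridDimY;        /* grid height, in blocks */
    unsigned int gridDimZ;        /* grid depth, in blocks */
    unsigned int blockDimX;       /* block X dimension, in threads */
    unsigned int blockDimY;       /* block Y dimension, in threads */
    unsigned int blockDimZ;       /* block Z dimension, in threads */
    unsigned int sharedMemBytes;  /* dynamic shared memory per block */
    CUstream hStream;             /* stream to launch on */
    CUlaunchAttribute* attrs;     /* attribute array (may be null if numAttrs is 0) */
    unsigned int numAttrs;        /* number of entries in attrs */
} CUlaunchConfig;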
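As a rough illustration of what "full representation" could mean on the wrapper side, here is a minimal sketch that keeps grid and block dimensions as small structs and flattens them into a driver-style record only at launch time. Everything in it is an assumption made for illustration: `raw_launch_config` is a stand-in that mirrors only the scalar fields of `CUlaunchConfig` (so the snippet compiles without `cuda.h`), and `dimensions`, `launch_config` and `marshal` are hypothetical names, not the library's actual API.

```cpp
#include <cassert>

// Stand-in mirroring the scalar fields of CUlaunchConfig shown above.
// The stream and attribute-array fields are left out so that this
// sketch does not depend on cuda.h.
struct raw_launch_config {
    unsigned gridDimX, gridDimY, gridDimZ;
    unsigned blockDimX, blockDimY, blockDimZ;
    unsigned sharedMemBytes;
};

// Hypothetical wrapper-side types (illustrative names only).
struct dimensions {
    unsigned x, y, z;
};

struct launch_config {
    dimensions grid;                       // in blocks
    dimensions block;                      // in threads
    unsigned dynamic_shared_memory_bytes;  // per block
};

// Flatten the structured configuration into the driver-style record.
inline raw_launch_config marshal(const launch_config& c)
{
    // A launch with a zero-sized grid or block is invalid, so catch it early.
    assert(c.grid.x > 0 && c.grid.y > 0 && c.grid.z > 0);
    assert(c.block.x > 0 && c.block.y > 0 && c.block.z > 0);
    return raw_launch_config{
        c.grid.x,  c.grid.y,  c.grid.z,
        c.block.x, c.block.y, c.block.z,
        c.dynamic_shared_memory_bytes
    };
}
```

In a real implementation the marshalled record would be an actual `CUlaunchConfig`, with `hStream`, `attrs` and `numAttrs` filled in as well before being passed to `cuLaunchKernelEx`.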

Each attribute has an ID and a value in a union, and here is the current list of IDs:

CU_LAUNCH_ATTRIBUTE_IGNORE
CU_LAUNCH_ATTRIBUTE_ACCESS_POLICY_WINDOW
CU_LAUNCH_ATTRIBUTE_COOPERATIVE
CU_LAUNCH_ATTRIBUTE_SYNCHRONIZATION_POLICY
CU_LAUNCH_ATTRIBUTE_CLUSTER_DIMENSION
CU_LAUNCH_ATTRIBUTE_CLUSTER_SCHEDULING_POLICY_PREFERENCE
CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_STREAM_SERIALIZATION
CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_EVENT
CU_LAUNCH_ATTRIBUTE_PRIORITY
CU_LAUNCH_ATTRIBUTE_MEM_SYNC_DOMAIN_MAP
CU_LAUNCH_ATTRIBUTE_MEM_SYNC_DOMAIN
CU_LAUNCH_ATTRIBUTE_LAUNCH_COMPLETION_EVENT

Some of these concern launch-related or scheduling-related events; support for those should be tracked in a separate missing-feature issue.
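The "ID plus value in a union" layout of the attributes could be handled with something like the sketch below, which only appends the attributes the user actually sets, so that the attribute count matches the array passed to the driver. This is an assumption-laden illustration: `attribute_id`, `attribute` and `attribute_list` are hypothetical stand-ins covering just three of the IDs above, defined locally so the snippet compiles without `cuda.h`; the real types are `CUlaunchAttributeID` and `CUlaunchAttribute`.

```cpp
#include <cassert>
#include <vector>

// Stand-ins for three of the driver's attribute IDs; the real
// enumerators and their numeric values come from cuda.h.
enum class attribute_id { cooperative, cluster_dimension, priority };

// Stand-in for the driver's tagged record: an ID plus a union of values.
struct attribute {
    attribute_id id;
    union {
        int cooperative;                          // nonzero: cooperative launch
        struct { unsigned x, y, z; } cluster_dim; // cluster size, in blocks
        int priority;                             // launch priority
    } value;
};

// Hypothetical builder: each setter appends one tagged attribute.
class attribute_list {
public:
    attribute_list& cooperative(bool on)
    {
        attribute a{};
        a.id = attribute_id::cooperative;
        a.value.cooperative = on ? 1 : 0;
        attrs_.push_back(a);
        return *this;
    }

    attribute_list& cluster_dimension(unsigned x, unsigned y, unsigned z)
    {
        assert(x > 0 && y > 0 && z > 0); // zero-sized clusters make no sense
        attribute a{};
        a.id = attribute_id::cluster_dimension;
        a.value.cluster_dim.x = x;
        a.value.cluster_dim.y = y;
        a.value.cluster_dim.z = z;
        attrs_.push_back(a);
        return *this;
    }

    attribute_list& priority(int p)
    {
        attribute a{};
        a.id = attribute_id::priority;
        a.value.priority = p;
        attrs_.push_back(a);
        return *this;
    }

    // These two would feed the attrs and numAttrs fields of the launch config.
    const attribute* data() const { return attrs_.data(); }
    unsigned size() const { return static_cast<unsigned>(attrs_.size()); }

private:
    std::vector<attribute> attrs_;
};
```

Usage would look like `attribute_list{}.cooperative(true).priority(-1)`; attributes that are never set simply do not appear in the array, which avoids having to pad it with ignore entries.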