I am wondering how to detect the end of kernel execution from within the callbacks. There is a callback ID, SANITIZER_CBID_LAUNCH_END, but it marks the end of the kernel launch, and the kernel has not finished executing at that point. I noticed that samples such as MemoryTracker and DeviceMalloc process their data in the synchronization callback, but this does not work if I want to collect data per kernel when there is no synchronization between kernels (which I assume is common in large applications).
I have two tentative solutions:
1. Call sanitizerStreamSynchronize in the SANITIZER_CBID_LAUNCH_END callback and process the data afterward (the kernel should have finished executing by then).
2. Track the number of remaining threads by decrementing a counter in a SANITIZER_INSTRUCTION_BLOCK_EXIT patch function.
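
The counting logic behind the second idea can be sketched on the host as follows. This is only an illustration of the "last thread out" pattern, not Compute Sanitizer code: LaunchTracker, on_thread_exit, and simulate_launch are hypothetical names, and a plain loop stands in for GPU threads so the snippet has no GPU dependency. In a real tool the counter would have to live in memory visible to both the patch function and the host.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical per-launch record (assumption, not part of the Sanitizer API).
struct LaunchTracker {
    std::atomic<uint64_t> remaining{0};    // threads that have not exited yet
    std::atomic<uint32_t> completions{0};  // times "kernel end" was detected
};

// Stand-in for the body of a block-exit patch: called once per exiting thread.
void on_thread_exit(LaunchTracker& t) {
    // fetch_sub returns the previous value, so exactly one caller observes 1
    // and can treat itself as the last thread of the kernel.
    if (t.remaining.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        t.completions.fetch_add(1, std::memory_order_relaxed);
    }
}

// Simulates one launch; returns how many times the end of the kernel fired.
uint32_t simulate_launch(uint32_t blocks, uint32_t threads_per_block) {
    LaunchTracker tracker;
    // The counter must be set to the full thread count before any thread exits.
    tracker.remaining.store(uint64_t{blocks} * threads_per_block);
    for (uint32_t b = 0; b < blocks; ++b) {
        for (uint32_t i = 0; i < threads_per_block; ++i) {
            on_thread_exit(tracker);
        }
    }
    assert(tracker.remaining.load() == 0);
    return tracker.completions.load();
}
```

Calling simulate_launch(8, 64) runs 512 simulated exits and detects the end exactly once. The pattern only holds if every thread reaches the exit hook and the counter is initialized before the first exit.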
However, neither of these works, and I am not sure why. Is there a better way to capture the actual end of kernel execution? Thanks for any feedback!