NVIDIA / compute-sanitizer-samples

Samples demonstrating how to use the Compute Sanitizer Tools and Public API
BSD 3-Clause "New" or "Revised" License

How to capture the actual kernel execution end #23

Open Lin-Mao opened 3 hours ago

Lin-Mao commented 3 hours ago

Hi,

I am wondering how to capture the end of kernel execution from within the callbacks. There is a callback ID, SANITIZER_CBID_LAUNCH_END, but it corresponds to the end of the kernel launch, and the kernel has not finished executing at that point. I noticed that samples such as MemoryTracker and DeviceMalloc process their data in the synchronization callback, but that does not seem to work if I want to collect data per kernel when there is no synchronization between kernels (which I assume is common in large applications).

I have two tentative solutions:

  1. Call sanitizerStreamSynchronize in the SANITIZER_CBID_LAUNCH_END callback and process the data afterward (the kernel should have finished executing by then).
  2. Track the number of remaining threads by decrementing a counter in a SANITIZER_INSTRUCTION_BLOCK_EXIT patch function.
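
To make the counting idea in option 2 concrete, here is a minimal host-side sketch in plain C++. It does not use the Compute Sanitizer API at all: host threads stand in for GPU threads, and `KernelTracker`, `onThreadExit`, and `simulateKernel` are hypothetical names made up for illustration. It only models the intended logic, where every exiting thread decrements a shared counter and the thread that brings it to zero marks the kernel as finished.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical per-launch record (not a Sanitizer type).
struct KernelTracker {
    std::atomic<unsigned> remaining;   // threads that have not exited yet
    std::atomic<bool> finished{false}; // set by the last exiting thread
    explicit KernelTracker(unsigned total) : remaining(total) {}
};

// Stand-in for what an exit patch function would do for each exiting thread.
// Returns true for exactly one caller: the one that observes the last exit.
bool onThreadExit(KernelTracker& tracker) {
    // fetch_sub returns the value held before the decrement,
    // so a previous value of 1 means this thread was the last one.
    if (tracker.remaining.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        tracker.finished.store(true, std::memory_order_release);
        return true;
    }
    return false;
}

// Runs `total` host threads that each "exit" once and returns how many
// of them saw themselves as the last thread (expected to be exactly one).
unsigned simulateKernel(unsigned total) {
    assert(total > 0);
    KernelTracker tracker(total);
    std::atomic<unsigned> lastObservers{0};

    std::vector<std::thread> workers;
    workers.reserve(total);
    for (unsigned i = 0; i < total; ++i) {
        workers.emplace_back([&tracker, &lastObservers] {
            if (onThreadExit(tracker)) {
                lastObservers.fetch_add(1);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }

    // After every thread has exited, the counter is zero and the flag is set.
    assert(tracker.remaining.load() == 0);
    assert(tracker.finished.load());
    return lastObservers.load();
}
```

Calling `simulateKernel(64)` runs 64 such threads and reports how many of them saw the counter reach zero. In a real tool the counter would presumably live in device-accessible memory and be initialized from the launch dimensions before the kernel starts; that part is an assumption and is not shown here.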

However, neither of these works, and I am not sure why. Is there a better way to capture the actual end of kernel execution? Thanks for any feedback!

Lin-Mao commented 3 hours ago

@achartier @aladram