Add to helpful queries.
Join kernelApi and op; add queue depth (the number of kernels already enqueued when a new one is enqueued).
One .cmd to show launch delay per kernel (when queue depth is zero).
One .cmd to show time lost to inter-kernel switching (when queue depth is nonzero).
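
A rough sketch of how the three items above could fit together, using Python's built-in sqlite3 with an in-memory database. Everything schema-related here is an assumption made for illustration: the table names kernel_api and op, the columns api_id, kernel_name, enqueue_ns, start_ns and end_ns, and the sample rows are hypothetical and do not reflect the real tables. Queue depth is taken to mean "kernels enqueued earlier whose op has not yet ended at the moment of the new enqueue", which is one possible reading of the note.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kernel_api (api_id INTEGER PRIMARY KEY, kernel_name TEXT, enqueue_ns INTEGER);
CREATE TABLE op (op_id INTEGER PRIMARY KEY, api_id INTEGER, start_ns INTEGER, end_ns INTEGER);
INSERT INTO kernel_api VALUES (1,'kA',0),(2,'kB',10),(3,'kC',20),(4,'kA',500);
INSERT INTO op VALUES (1,1,100,200),(2,2,205,300),(3,3,310,400),(4,4,560,600);
""")

# Join the API-side record to its op and count, for each launch, the earlier
# launches that were still unfinished when this one was enqueued.
QUEUE_DEPTH_SQL = """
SELECT a.api_id, a.kernel_name, a.enqueue_ns, o.start_ns, o.end_ns,
       (SELECT COUNT(*)
          FROM kernel_api a2 JOIN op o2 ON o2.api_id = a2.api_id
         WHERE a2.enqueue_ns < a.enqueue_ns
           AND o2.end_ns > a.enqueue_ns) AS queue_depth
  FROM kernel_api a JOIN op o ON o.api_id = a.api_id
 ORDER BY a.enqueue_ns
"""
rows = conn.execute(QUEUE_DEPTH_SQL).fetchall()
depths = [r[5] for r in rows]

launch_delay = {}   # kernel_name -> delays (ns) for launches with an empty queue
switch_gaps = []    # (kernel_name, gap in ns) for launches behind other kernels
prev_end = None
for api_id, name, enq, start, end, depth in sorted(rows, key=lambda r: r[3]):
    if depth == 0:
        # Nothing ahead in the queue: enqueue-to-start is pure launch delay.
        launch_delay.setdefault(name, []).append(start - enq)
    elif prev_end is not None:
        # Something was ahead: the idle gap after the previous op ended
        # approximates the time lost switching between kernels.
        switch_gaps.append((name, start - prev_end))
    prev_end = end

print(depths, launch_delay, switch_gaps)
```

The two loops' branches correspond to the two proposed .cmd files; in a real .cmd the same split could be expressed as a WHERE clause on queue_depth. The sketch also assumes a single queue, so per-queue or per-GPU partitioning would need to be added.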