According to "benchmarks/cuda/src/cuda.nvprof", cudaFree takes up most of the execution time. Is this because the time attributed to cudaFree includes the cudaMemsetAsync calls issued after the initial cudaMemset call?
Is there a way to estimate the execution time of the asynchronous calls and of the cudaFree call separately?
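
For reference, here is a minimal sketch of how I would try to separate the two measurements. It is not the benchmark code itself: the helper name `time_memset_and_free`, the `Timings` struct, the buffer size, and the number of asynchronous calls are all hypothetical. It assumes that cudaFree waits for outstanding device work, so the asynchronous memsets are bracketed with CUDA events and the device is drained with cudaDeviceSynchronize before cudaFree is timed on the host.

```cuda
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

struct Timings {
    float async_ms;  // device-side time of the cudaMemsetAsync batch
    double free_ms;  // host-side time of cudaFree with no work pending
};

// Hypothetical helper: the arguments are illustrative, not taken from
// the benchmark that produced cuda.nvprof.
Timings time_memset_and_free(std::size_t bytes, int num_async_calls) {
    void* buf = nullptr;
    cudaEvent_t start, stop;
    CHECK(cudaMalloc(&buf, bytes));
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaMemset(buf, 0, bytes));  // initial memset

    // Bracket the asynchronous calls with events on the default stream.
    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < num_async_calls; ++i) {
        CHECK(cudaMemsetAsync(buf, i, bytes, 0));
    }
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));  // wait until the batch has finished

    Timings t{};
    CHECK(cudaEventElapsedTime(&t.async_ms, start, stop));

    // Drain the device so cudaFree has nothing left to wait for,
    // then time the call itself with a host clock.
    CHECK(cudaDeviceSynchronize());
    auto t0 = std::chrono::steady_clock::now();
    CHECK(cudaFree(buf));
    auto t1 = std::chrono::steady_clock::now();
    t.free_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return t;
}
```

Calling `time_memset_and_free(64u << 20, 8)` and printing the two fields would give one estimate for the asynchronous batch and one for cudaFree alone. If the cudaFree figure drops sharply compared with the profile, that would suggest the profiled time was mostly waiting on the earlier asynchronous work.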