arcs-skku / EMDC_llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

Question regarding Nsight profiler result #20

Open an-ys opened 1 year ago

an-ys commented 1 year ago

According to "benchmarks/cuda/src/cuda.nvprof", cudaFree accounts for most of the execution time. Is this because the time attributed to cudaFree includes the cudaMemsetAsync calls issued after the initial cudaMemset call? Is there a way to estimate the execution time of the asynchronous calls and of the cudaFree call separately?
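For reference, one way to separate the two costs might look like the sketch below. It is not taken from the benchmark sources: the buffer size and iteration count are hypothetical, and it assumes the asynchronous memsets are issued on the default stream. The idea is to bracket the cudaMemsetAsync calls with CUDA events, synchronize on the closing event, and only then time cudaFree with a host clock, so that any wait for still-pending asynchronous work is not charged to cudaFree.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Abort main() with a message if a CUDA runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            std::fprintf(stderr, "%s failed: %s\n", #call,        \
                         cudaGetErrorString(err_));               \
            return 1;                                             \
        }                                                         \
    } while (0)

int main() {
    const size_t bytes = 64u << 20;  // hypothetical buffer size (64 MiB)
    const int iterations = 100;      // hypothetical number of async memsets

    void *buf = nullptr;
    CHECK(cudaMalloc(&buf, bytes));
    CHECK(cudaMemset(buf, 0, bytes));  // the initial cudaMemset call

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Bracket the asynchronous work with events on the default stream.
    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < iterations; ++i) {
        CHECK(cudaMemsetAsync(buf, i & 0xff, bytes, 0));
    }
    CHECK(cudaEventRecord(stop, 0));

    // Wait for the memsets to finish before freeing, so that cudaFree
    // does not have to wait for them.
    CHECK(cudaEventSynchronize(stop));
    float async_ms = 0.0f;
    CHECK(cudaEventElapsedTime(&async_ms, start, stop));

    // Time cudaFree on its own with a host-side clock.
    auto t0 = std::chrono::steady_clock::now();
    CHECK(cudaFree(buf));
    auto t1 = std::chrono::steady_clock::now();
    double free_ms =
        std::chrono::duration<double, std::milli>(t1 - t0).count();

    std::printf("cudaMemsetAsync calls (GPU time): %.3f ms\n", async_ms);
    std::printf("cudaFree (host time):             %.3f ms\n", free_ms);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return 0;
}
```

If the cudaFree time drops sharply once the explicit synchronization is in place, that would suggest the original profile was attributing the wait for the asynchronous memsets to cudaFree.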