QuantumBFS / CuYao.jl

CUDA extension for Yao.jl
https://yaoquantum.org

Merge kernels #3

Closed by GiggleLiu 9 months ago

GiggleLiu commented 5 years ago

Kernel launch overhead now accounts for more than 99% of the API call time (cuLaunchKernel, 1,540,000 calls):

➜  modules git:(GPUdemo) ✗ nvprof julia QCBMS.jl
==22279== NVPROF is profiling process 22279, command: julia QCBMS.jl
==22279== Profiling application: julia QCBMS.jl
==22279== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   70.36%  77.0104s    810000  95.074us  74.113us  279.72us  ptxcall_simple_kernel_2
                   28.96%  31.6927s    720000  44.017us  32.896us  113.19us  ptxcall_simple_kernel_3
                    0.68%  748.96ms     10000  74.895us  72.801us  79.361us  ptxcall_anonymous23_1
                    0.00%  1.1371ms         4  284.27us  1.7600us  1.0389ms  [CUDA memcpy HtoD]
      API calls:   99.11%  90.5692s   1540000  58.811us  6.5610us  9.6723ms  cuLaunchKernel
                    0.43%  389.37ms   1540034     252ns     145ns  649.65us  cuCtxGetCurrent
                    0.23%  210.94ms         1  210.94ms  210.94ms  210.94ms  cuCtxCreate
                    0.14%  129.13ms         1  129.13ms  129.13ms  129.13ms  cuCtxDestroy
                    0.07%  65.987ms         3  21.996ms  47.171us  65.891ms  cuModuleUnload
                    0.01%  13.700ms        27  507.41us  439.26us  724.08us  cuMemAlloc
                    0.00%  2.5056ms         3  835.19us  348.68us  1.7719ms  cuModuleLoadDataEx
                    0.00%  1.4557ms         4  363.94us  43.000us  1.1706ms  cuMemcpyHtoD
                    0.00%  36.489us         8  4.5610us  3.6320us  8.1710us  cuDeviceGetPCIBusId
                    0.00%  15.972us        30     532ns     167ns  2.4170us  cuDeviceGetAttribute
                    0.00%  9.0610us         9  1.0060us     283ns  4.6000us  cuDeviceGet
                    0.00%  3.2120us         3  1.0700us  1.0430us  1.0890us  cuModuleGetFunction
                    0.00%  2.6260us         3     875ns     707ns  1.0060us  cuCtxGetDevice
                    0.00%  2.4400us         1  2.4400us  2.4400us  2.4400us  cuDriverGetVersion
                    0.00%  2.0020us         2  1.0010us     282ns  1.7200us  cuDeviceGetCount
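
As a rough check on the figures above, the short Python sketch below redoes the arithmetic from the nvprof table: it sums the kernel call counts, compares them with the cuLaunchKernel call count, and derives the average cost per launch. The function estimated_launch_time and its fusion_factor parameter are hypothetical and not part of CuYao.jl; they assume, purely for illustration, that launch API time scales linearly with the number of launches and they ignore any change in kernel execution time after merging.

```python
# Figures copied from the nvprof output above: (number of calls, total time in seconds).
kernels = {
    "ptxcall_simple_kernel_2": (810_000, 77.0104),
    "ptxcall_simple_kernel_3": (720_000, 31.6927),
    "ptxcall_anonymous23_1": (10_000, 0.74896),
}
launch_calls = 1_540_000   # cuLaunchKernel call count
launch_total_s = 90.5692   # cuLaunchKernel total time in seconds

# Every kernel execution corresponds to one cuLaunchKernel call.
total_kernel_calls = sum(calls for calls, _ in kernels.values())
total_gpu_s = sum(seconds for _, seconds in kernels.values())

# Average cost per launch and per kernel execution, in microseconds.
avg_launch_us = launch_total_s / launch_calls * 1e6
avg_kernel_us = total_gpu_s / total_kernel_calls * 1e6


def estimated_launch_time(fusion_factor):
    """Hypothetical model: merging `fusion_factor` kernels into one launch
    divides the launch count, and hence the launch API time, by that factor.
    This is an illustrative assumption, not a measurement."""
    return launch_total_s / fusion_factor


if __name__ == "__main__":
    print(f"kernel executions: {total_kernel_calls}")
    print(f"average launch cost: {avg_launch_us:.3f} us")
    print(f"average kernel time: {avg_kernel_us:.3f} us")
    for factor in (2, 10):
        print(f"fusion x{factor}: ~{estimated_launch_time(factor):.2f} s in cuLaunchKernel")
```

The summed kernel call count matches the cuLaunchKernel call count, and the average launch cost (about 59 us) is of the same order as the average kernel run time, which is the motivation for merging kernels.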
GiggleLiu commented 9 months ago

This does not make sense.