QuantumBFS / CuYao.jl

CUDA extension for Yao.jl
https://yaoquantum.org

Benchmark Results #1

Open GiggleLiu opened 5 years ago

GiggleLiu commented 5 years ago

9-qubit QCBM circuit with depth 8

Batched Performance

julia> @benchmark zero_state(n, 1000) |> cu |> $(qcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  17.13 MiB
  allocs estimate:  15549
  --------------
  minimum time:     9.164 ms (0.00% GC)
  median time:      78.108 ms (2.70% GC)
  mean time:        76.510 ms (7.66% GC)
  maximum time:     105.105 ms (91.49% GC)
  --------------
  samples:          27
  evals/sample:     1

julia> @benchmark zero_state(n, 1000) |> $(cqcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  8.02 MiB
  allocs estimate:  2478
  --------------
  minimum time:     868.712 ms (0.00% GC)
  median time:      938.671 ms (0.00% GC)
  mean time:        926.054 ms (0.08% GC)
  maximum time:     970.780 ms (0.24% GC)
  --------------
  samples:          3
  evals/sample:     1

Single Run Performance

julia> @benchmark zero_state(n) |> cu |> $(qcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  1.50 MiB
  allocs estimate:  15293
  --------------
  minimum time:     3.071 ms (0.00% GC)
  median time:      3.295 ms (0.00% GC)
  mean time:        3.750 ms (8.93% GC)
  maximum time:     10.285 ms (54.88% GC)
  --------------
  samples:          531
  evals/sample:     1

julia> @benchmark zero_state(n) |> $(cqcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  234.52 KiB
  allocs estimate:  2781
  --------------
  minimum time:     217.369 μs (0.00% GC)
  median time:      222.433 μs (0.00% GC)
  mean time:        292.978 μs (18.22% GC)
  maximum time:     8.223 ms (96.29% GC)
  --------------
  samples:          6768
  evals/sample:     1
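
The contrast between the batched and single-run numbers above is easier to see as a ratio. The following is a minimal sketch in plain Julia that only does arithmetic on the median times reported above. It assumes the run piped through `cu` is the GPU measurement and the run without `cu` is the CPU baseline; the variable names and the `speedup` helper are illustrative and not part of CuYao.jl.

```julia
# Median times copied from the BenchmarkTools output above, in seconds.
batched_gpu = 78.108e-3    # zero_state(n, 1000) |> cu |> qcbm.circuit
batched_cpu = 938.671e-3   # zero_state(n, 1000) |> cqcbm.circuit
single_gpu  = 3.295e-3     # zero_state(n) |> cu |> qcbm.circuit
single_cpu  = 222.433e-6   # zero_state(n) |> cqcbm.circuit

# Hypothetical helper: a value above 1 means the GPU run was faster.
speedup(cpu_time, gpu_time) = cpu_time / gpu_time

batched_speedup = speedup(batched_cpu, batched_gpu)
single_speedup  = speedup(single_cpu, single_gpu)

println("batched (1000 states): GPU/CPU speedup = ", round(batched_speedup, digits=2))
println("single state:          GPU/CPU speedup = ", round(single_speedup, digits=3))
```

On these medians the GPU is ahead only in the batched case; for a single 9-qubit state the CPU run is faster. The batched GPU run also shows a large gap between its minimum (9.164 ms) and median (78.108 ms), so the ratio depends on which statistic is used.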

Platform

CPU:
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz

GPU:
Nvidia GeForce 940MX

GiggleLiu commented 5 years ago

Another benchmark on Nvidia P100

julia> @benchmark zero_state(n, 1000) |> cu |> $(qcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  16.70 MiB
  allocs estimate:  7278
  --------------
  minimum time:     4.623 ms (0.00% GC)
  median time:      10.226 ms (8.24% GC)
  mean time:        11.168 ms (9.86% GC)
  maximum time:     81.029 ms (89.50% GC)
  --------------
  samples:          180
  evals/sample:     1

julia> @benchmark zero_state(n, 1000) |> $(cqcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  8.02 MiB
  allocs estimate:  2478
  --------------
  minimum time:     345.571 ms (0.00% GC)
  median time:      360.031 ms (0.00% GC)
  mean time:        358.910 ms (0.70% GC)
  maximum time:     369.374 ms (4.10% GC)
  --------------
  samples:          6
  evals/sample:     1

julia> @benchmark zero_state(n) |> cu |> $(qcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  1.07 MiB
  allocs estimate:  7121
  --------------
  minimum time:     1.597 ms (0.00% GC)
  median time:      1.743 ms (0.00% GC)
  mean time:        1.957 ms (8.67% GC)
  maximum time:     77.709 ms (96.39% GC)
  --------------
  samples:          1021
  evals/sample:     1

julia> @benchmark zero_state(n) |> $(cqcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  234.52 KiB
  allocs estimate:  2781
  --------------
  minimum time:     205.896 μs (0.00% GC)
  median time:      212.959 μs (0.00% GC)
  mean time:        247.828 μs (13.21% GC)
  maximum time:     75.570 ms (99.60% GC)
  --------------
  samples:          8002
  evals/sample:     1

Platform

CPU:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    2
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Stepping:              1
CPU MHz:               2523.984
BogoMIPS:              4401.45
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0-11,24-35
NUMA node1 CPU(s):     12-23,36-47

GPU:
Model:       Tesla P100-PCIE-12GB
IRQ:         74
GPU UUID:    GPU-a78e4979-19e4-4d0e-ebc7-66348ddd11b3
Video BIOS:      86.00.3a.00.02
Bus Type:    PCIe
DMA Size:    47 bits
DMA Mask:    0x7fffffffffff
Bus Location:    0000:04:00.0
Device Minor:    0

GiggleLiu commented 5 years ago

Another benchmark on Nvidia Tesla M40

julia> @benchmark zero_state(n, 1000) |> cu |> $(qcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  16.70 MiB
  allocs estimate:  7278
  --------------
  minimum time:     4.713 ms (0.00% GC)
  median time:      12.068 ms (7.94% GC)
  mean time:        12.484 ms (9.53% GC)
  maximum time:     80.318 ms (91.10% GC)
  --------------
  samples:          161
  evals/sample:     1

julia> @benchmark zero_state(n, 1000) |> $(cqcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  8.02 MiB
  allocs estimate:  2478
  --------------
  minimum time:     382.711 ms (0.00% GC)
  median time:      384.631 ms (0.00% GC)
  mean time:        386.760 ms (0.65% GC)
  maximum time:     396.166 ms (3.78% GC)
  --------------
  samples:          6
  evals/sample:     1

julia> @benchmark zero_state(n) |> cu |> $(qcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  1.07 MiB
  allocs estimate:  7121
  --------------
  minimum time:     1.620 ms (0.00% GC)
  median time:      1.674 ms (0.00% GC)
  mean time:        1.900 ms (8.67% GC)
  maximum time:     77.474 ms (96.30% GC)
  --------------
  samples:          1051
  evals/sample:     1

julia> @benchmark zero_state(n) |> $(cqcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  234.52 KiB
  allocs estimate:  2781
  --------------
  minimum time:     209.335 μs (0.00% GC)
  median time:      216.347 μs (0.00% GC)
  mean time:        251.826 μs (13.06% GC)
  maximum time:     75.510 ms (99.59% GC)
  --------------
  samples:          7876
  evals/sample:     1

Platform


CPU:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    1
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Stepping:              1
CPU MHz:               1200.031
BogoMIPS:              4401.55
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0-11
NUMA node1 CPU(s):     12-23

GPU:
Model:       Tesla M40
IRQ:         74
GPU UUID:    GPU-????????-????-????-????-????????????
Video BIOS:      ??.??.??.??.??
Bus Type:    PCIe
DMA Size:    40 bits
DMA Mask:    0xffffffffff
Bus Location:    0000:04:00.0
Device Minor:    0

GiggleLiu commented 5 years ago

Another benchmark on Nvidia Titan V

julia> @benchmark zero_state(n, 1000) |> cu |> $(qcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  16.70 MiB
  allocs estimate:  7278
  --------------
  minimum time:     4.884 ms (0.00% GC)
  median time:      6.476 ms (17.07% GC)
  mean time:        6.986 ms (18.01% GC)
  maximum time:     110.769 ms (94.64% GC)
  --------------
  samples:          286
  evals/sample:     1

julia> @benchmark zero_state(n, 1000) |> $(cqcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  8.02 MiB
  allocs estimate:  2478
  --------------
  minimum time:     396.184 ms (0.00% GC)
  median time:      397.585 ms (0.00% GC)
  mean time:        401.956 ms (1.01% GC)
  maximum time:     418.597 ms (4.83% GC)
  --------------
  samples:          5
  evals/sample:     1

julia> @benchmark zero_state(n) |> cu |> $(qcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  1.07 MiB
  allocs estimate:  7121
  --------------
  minimum time:     1.635 ms (0.00% GC)
  median time:      2.631 ms (0.00% GC)
  mean time:        3.069 ms (11.31% GC)
  maximum time:     114.555 ms (96.01% GC)
  --------------
  samples:          651
  evals/sample:     1

julia> @benchmark zero_state(n) |> $(cqcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  234.52 KiB
  allocs estimate:  2781
  --------------
  minimum time:     236.792 μs (0.00% GC)
  median time:      453.363 μs (0.00% GC)
  mean time:        523.526 μs (13.50% GC)
  maximum time:     109.883 ms (99.48% GC)
  --------------
  samples:          3792
  evals/sample:     1

Platform

CPU:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    2
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               2499.921
CPU max MHz:           2900.0000
CPU min MHz:           1200.0000
BogoMIPS:              4401.27
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0-11,24-35
NUMA node1 CPU(s):     12-23,36-47

GPU:
Model:       TITAN V
IRQ:         98
GPU UUID:    GPU-f04d8db3-bb77-b4ee-cd2e-b666cd0fd0ea
Video BIOS:      88.00.41.00.12
Bus Type:    PCIe
DMA Size:    47 bits
DMA Mask:    0x7fffffffffff
Bus Location:    0000:04:00.0
Device Minor:    0
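
To compare the four platforms in this thread side by side, the batched medians can be collected into one small table. This is a sketch in plain Julia using only the median times reported in the comments above; the `results` and `rows` names are illustrative, and the per-state figure simply divides the batched GPU time by the batch size of 1000. Note that the CPU baselines come from different host machines, so the speedup column is not a like-for-like GPU ranking.

```julia
# (GPU label, batched GPU median in ms, batched CPU median in ms),
# copied from the benchmark output in this thread.
results = [
    ("GeForce 940MX", 78.108, 938.671),
    ("Tesla P100",    10.226, 360.031),
    ("Tesla M40",     12.068, 384.631),
    ("TITAN V",        6.476, 397.585),
]

nbatch = 1000  # batch size used in zero_state(n, 1000)

# Derive the CPU/GPU ratio and the amortized GPU time per state in microseconds.
rows = map(results) do (gpu, gpu_ms, cpu_ms)
    (gpu = gpu,
     speedup = cpu_ms / gpu_ms,
     us_per_state = gpu_ms * 1000 / nbatch)
end

for row in rows
    println(rpad(row.gpu, 14),
            " speedup = ", round(row.speedup, digits=1),
            ", GPU time per state = ", round(row.us_per_state, digits=1), " μs")
end
```

The amortized per-state time can then be set against the single-run CPU medians (roughly 213 to 453 μs in the comments above) to judge when batching on the GPU pays off.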