QuantumBFS / CuYao.jl

CUDA extension for Yao.jl
https://yaoquantum.org

[Discussion] CuArrays in Julia1.5 is faster! But why? #59

Closed GiggleLiu closed 3 years ago

GiggleLiu commented 4 years ago

Julia 1.4.1

julia> using CuYao

(@v1.4) pkg> st CuYao
Status `~/.julia/environments/v1.4/Project.toml`
  [b48ca7a8] CuYao v0.2.2 [`~/.julia/dev/CuYao`]

julia> using CuArrays

julia> using BenchmarkTools

julia> reg = rand_state(25) |> cu
ArrayReg{1, Complex{Float64}, CuArray...}
    active qubits: 25/25

(@v1.4) pkg> st CuArrays
Status `~/.julia/environments/v1.4/Project.toml`
  [3a865a2d] CuArrays v2.2.0

julia> @benchmark @CuArrays.sync $reg |> $(put(25, 5=>X))
BenchmarkTools.Trial: 
  memory estimate:  3.31 KiB
  allocs estimate:  95
  --------------
  minimum time:     3.137 ms (0.00% GC)
  median time:      3.298 ms (0.00% GC)
  mean time:        3.295 ms (0.00% GC)
  maximum time:     3.439 ms (0.00% GC)
  --------------
  samples:          1507
  evals/sample:     1

julia> @benchmark @CuArrays.sync $reg |> $(cnot(25, 3, 9))
BenchmarkTools.Trial: 
  memory estimate:  3.81 KiB
  allocs estimate:  107
  --------------
  minimum time:     1.809 ms (0.00% GC)
  median time:      1.996 ms (0.00% GC)
  mean time:        2.021 ms (0.00% GC)
  maximum time:     2.305 ms (0.00% GC)
  --------------
  samples:          2466
  evals/sample:     1

julia> @benchmark @CuArrays.sync $reg |> $(put(25, 5=>Rx(0.5)))
BenchmarkTools.Trial: 
  memory estimate:  11.34 KiB
  allocs estimate:  181
  --------------
  minimum time:     3.175 ms (0.00% GC)
  median time:      3.409 ms (0.00% GC)
  mean time:        3.408 ms (0.00% GC)
  maximum time:     3.712 ms (0.00% GC)
  --------------
  samples:          1456
  evals/sample:     1

Julia 1.5-beta

julia> using CuYao

julia> reg = rand_state(25) |> cu
ArrayReg{1, Complex{Float64}, CuArray...}
    active qubits: 25/25

julia> @benchmark @CuArrays.sync $reg |> $(put(25, 5=>X))
ERROR: LoadError: UndefVarError: @benchmark not defined
in expression starting at REPL[3]:1

julia> using BenchmarkTools
[ Info: Precompiling BenchmarkTools [6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf]

julia> using CuArrays

julia> using BenchmarkTools

julia> @benchmark @CuArrays.sync $reg |> $(put(25, 5=>X))
BenchmarkTools.Trial: 
  memory estimate:  3.61 KiB
  allocs estimate:  88
  --------------
  minimum time:     1.916 ms (0.00% GC)
  median time:      2.122 ms (0.00% GC)
  mean time:        2.114 ms (0.00% GC)
  maximum time:     7.800 ms (0.00% GC)
  --------------
  samples:          2362
  evals/sample:     1

(@v1.5) pkg> st CuYao
Status `~/.julia/environments/v1.5/Project.toml`
  [b48ca7a8] CuYao v0.2.2 `~/.julia/dev/CuYao`

(@v1.5) pkg> st CuArrays
Status `~/.julia/environments/v1.5/Project.toml`
  [3a865a2d] CuArrays v2.2.0

julia> @benchmark @CuArrays.sync $reg |> $(cnot(25, 3, 9))
BenchmarkTools.Trial: 
  memory estimate:  4.13 KiB
  allocs estimate:  100
  --------------
  minimum time:     1.054 ms (0.00% GC)
  median time:      1.091 ms (0.00% GC)
  mean time:        1.109 ms (0.00% GC)
  maximum time:     9.261 ms (0.00% GC)
  --------------
  samples:          4504
  evals/sample:     1

julia> @benchmark @CuArrays.sync $reg |> $(put(25, 5=>Rx(0.5)))
BenchmarkTools.Trial: 
  memory estimate:  11.35 KiB
  allocs estimate:  169
  --------------
  minimum time:     2.000 ms (0.00% GC)
  median time:      2.200 ms (0.00% GC)
  mean time:        2.195 ms (0.00% GC)
  maximum time:     7.010 ms (0.00% GC)
  --------------
  samples:          2275
  evals/sample:     1
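
To put a number on "faster", the median times reported above can be compared directly. This is only a small arithmetic sketch: the `medians` table and the `speedups` name are introduced here for illustration, and the values are copied by hand from the two benchmark runs (Julia 1.4.1 first, Julia 1.5-beta second).

```julia
# Median times in ms, copied from the benchmark output above:
# (gate, Julia 1.4.1, Julia 1.5-beta)
medians = [
    ("put(25, 5=>X)",       3.298, 2.122),
    ("cnot(25, 3, 9)",      1.996, 1.091),
    ("put(25, 5=>Rx(0.5))", 3.409, 2.200),
]

# Speedup = time on 1.4.1 divided by time on 1.5-beta, rounded to 2 digits.
speedups = [(gate, round(t14 / t15, digits=2)) for (gate, t14, t15) in medians]

for (gate, s) in speedups
    println(rpad(gate, 22), s, "x")
end
```

With these medians the ratios come out at roughly 1.55x for `X`, 1.83x for `cnot`, and 1.55x for `Rx`, with CuYao, CuArrays, and CUDAnative at the same versions in both environments.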

maleadt commented 4 years ago

Which version of CUDAnative and GPUCompiler does this use?

GiggleLiu commented 4 years ago

(@v1.5) pkg> st CUDAnative
Status `~/.julia/environments/v1.5/Project.toml`
  [be33ccc6] CUDAnative v3.1.0

The same version is installed under Julia 1.4.1. CuYao does not depend on GPUCompiler.

Maybe the performance increase is related to the changes in Julia 1.5 itself. For example, the new handling of immutables that hold references (especially non-allocating views):
https://docs.julialang.org/en/v1.5-dev/NEWS/#Compiler/Runtime-improvements-1
https://github.com/JuliaLang/julia/pull/34126
https://github.com/JuliaLang/julia/issues/14955
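
A CPU-only sketch of what "non-allocating view" refers to: a `SubArray` is an immutable struct that holds a reference to its parent array, and the Julia 1.5 NEWS says such immutables can now be stack-allocated. The helper names `sum_view`, `colsum`, and `measure` are made up for this illustration, and the snippet does not show that this is what speeds up the GPU kernels; it only lets you compare the allocation count of a view passed across a function boundary on 1.4 and 1.5.

```julia
# Hypothetical helper, marked @noinline so the SubArray has to be passed
# as a real value instead of being optimized away by inlining.
@noinline function sum_view(v)
    s = zero(eltype(v))
    for x in v
        s += x
    end
    return s
end

# Builds a view (an immutable wrapper referencing A) and hands it over.
colsum(A, j) = sum_view(@view A[:, j])

# Measure inside a function so global-scope overhead is not counted.
measure(A) = @allocated colsum(A, 3)

A = rand(1000, 10)
measure(A)  # warm-up call, so compilation is not part of the measurement
println("Julia ", VERSION, ": ", measure(A), " bytes allocated")
```

Running the same file under both Julia versions and comparing the printed byte counts shows whether the view wrapper is still heap-allocated.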

Maybe you can also rerun the CUDAnative benchmarks; some of them might be faster automatically.

GiggleLiu commented 4 years ago

Please also see: https://github.com/JuliaArrays/UnsafeArrays.jl/issues/8

GiggleLiu commented 4 years ago

This is really amazing, isn't it! @maleadt