Reduce allocations and run times

hmunozb commented 2 years ago

The use of a .EIGS member to diagonalize a dense Hamiltonian instead of an interface function is more bulky than necessary in time and allocations.

(Very rough) test of this with a 4 qubit system and a few different schedules suggest up to a ~25% improvement in the performance of the AME

[master 937a3236f1b9578bc223b21817dec2e7a8512ee2]
Warming up ...
  7.523293 seconds (19.89 M allocations: 1.046 GiB, 4.44% gc time, 99.45% compilation time)
  0.048794 seconds (134.67 k allocations: 22.224 MiB, 68.96% compilation time)
  0.056558 seconds (136.75 k allocations: 22.732 MiB, 73.10% compilation time)
Running ...3.162
  0.014409 seconds (69.40 k allocations: 16.847 MiB)
  0.014739 seconds (77.84 k allocations: 18.921 MiB)
  0.043694 seconds (79.92 k allocations: 19.429 MiB, 59.29% gc time)
Running ...316.230
  0.398527 seconds (2.71 M allocations: 659.962 MiB, 26.77% gc time)
  0.404000 seconds (2.79 M allocations: 679.788 MiB, 26.13% gc time)
  0.435406 seconds (2.82 M allocations: 686.737 MiB, 26.39% gc time)

[commit 30e75db8bb439475cfa2757a695291f0ecd7f76a]
Warming up ...
  8.250285 seconds (21.26 M allocations: 1.123 GiB, 4.19% gc time, 99.57% compilation time)
  0.035314 seconds (82.01 k allocations: 12.020 MiB, 77.95% compilation time)
  0.037924 seconds (82.71 k allocations: 12.263 MiB, 79.54% compilation time)
Running ...3.162
  0.006695 seconds (22.20 k allocations: 7.703 MiB)
  0.009089 seconds (25.18 k allocations: 8.717 MiB)
  0.012318 seconds (25.89 k allocations: 8.960 MiB)
Running ...316.230
  0.296447 seconds (864.41 k allocations: 301.489 MiB, 22.15% gc time)
  0.299073 seconds (899.39 k allocations: 312.833 MiB, 13.81% gc time)
  0.320938 seconds (908.75 k allocations: 316.072 MiB, 19.29% gc time)

This is simply with this replacement in diffeq_liouvillian.jl

function (Op::DiffEqLiouvillian{true,false})(du, u, p, t)
    s = p(t)
    #w, v = Op.H.EIGS(Op.H, s, Op.lvl)
    w, v = haml_eigs(Op.H, s, lvl=Op.lvl)
# ....
end

Some additional optimizations also look possible by making DiffEqLiouvillian and other other objects type-generic, e.g.

struct DiffEqLiouvillian{diagonalization,adiabatic_frame, Htype <: AbstractHamiltonian}
    "Hamiltonian"
    H::Htype
   # ...
end

neversakura commented 2 years ago

This new interface does not seem to allow users to easily define their own eigen decomposition routines for existing types like https://uscqserver.github.io/HOQSTTutorials.jl/html/hamiltonian/01-custom_eigen.html.

For the dense Hamiltonian, the only downside is that we cannot define a fix set of eigen values/vectors if the Hamiltonian is constant. However, for the sparse Hamiltonian (or other types), we will need to provide a robust and reliable routine for the generic case. I am not sure if there exist such a library.

hmunozb commented 2 years ago

I think that the case where a custom eigen routine is needed (and not addressed by an existing Hamiltonian type) might be better addressed by a user-defined implementation of AbstractHamiltonian. It is potentially a little more boilerplate, but you could for example wrap a DenseHamiltonian into a new Hamiltonian type and override the haml_eigs interface. Just to illustrate,


#### My custom two-qubit Hamiltonian 

const sui=SMatrix{4,4,ComplexF64,16}((σx+σy)⊗σi)/sqrt(2.0)
const siu=SMatrix{4,4,ComplexF64,16}(σi⊗(σx+σy))/sqrt(2.0)
const szi=SMatrix{4,4,ComplexF64,16}(σz⊗σi)
const siz=SMatrix{4,4,ComplexF64,16}(σi⊗σz)
const szz=SMatrix{4,4,ComplexF64,16}(σz⊗σz)

struct My2QHamiltonian{S} <: AbstractHamiltonian{Float64}
    sched::S
    s0::Float64
    hx::Float64
    hz::Float64
    H::DenseHamiltonian
end

function My2QHamiltonian(sched, s0, hx, hz)
    H = DenseHamiltonian(
        [ (s)-> hx*(s0-sched(s)), (s)-> 0.8*hx*(s0-sched(s)), 
            (s)-> 0.4*hz*sched(s),  (s)-> 0.6*hz*sched(s), 
            (s)-> -1.0*hz*sched(s) ],
        [ sui, siu, szi, siz, szz]; unit=:ħ
    )
    return My2QHamiltonian(sched, s0, hx, hz, H)
end

Base.size(H::My2QHamiltonian) = (4, 4)
Base.size(H::My2QHamiltonian, dim::T) where {T <: Integer} = 4
OpenQuantumBase.get_cache(H::My2QHamiltonian) = H.H.u_cache

function (H::My2QHamiltonian)(s::Real)
    return Matrix(H.H(s))
end

function OpenQuantumBase.haml_eigs(H::My2QHamiltonian, t; lvl::Union{Int,Nothing}=nothing)
    w,v = eigen(Hermitian(H(t)))
    # or some other eigen decomposition
    return real(w), v
end

I imagine that removing this level of indirection for the eigendecomposition would also improve the performance of code with custom eigen routines as well as the default one.

neversakura commented 2 years ago

I agree. I think we can also define a new type of constant Hamiltonian if there is a need for it. Let me rewrite the tutorial and then merge the pull request.

hmunozb commented 2 years ago

Actually, if only the eigen routine needs to be overwritten with a user function while keeping all other behavior of Dense/SparseHamiltonian the same, then perhaps we can also include a wrapper class just for that. This would look something like this

struct CustomEigHamiltonian{T<:Number, Htype<:AbstractHamiltonian{T}} <: AbstractHamiltonian{T}
    "Wrapped Hamiltonian "
    H::Htype
    "Custom eigen decomposition routine"
    EIGS::Any
end

function CustomEigHamiltonian(H::Htype ;  EIGS = EIGEN_DEFAULT) where { T<:Number, Htype<:AbstractHamiltonian{T}}
    EIGS = EIGS(H.u_cache)
    CustomEigHamiltonian{T, Htype}(H, EIGS)
end

function haml_eigs(H::CustomEigHamiltonian, t; lvl::Union{Int,Nothing}=nothing)
    return H.EIGS(H, t, lvl)
end
# etc...

neversakura commented 2 years ago

Merged pull request. Several TODOs before closing the issue:

[ ] New tutorials
[ ] Documentation

Open new issues for tutorials and documentations #81

USCqserver / OpenQuantumBase.jl

Reduce allocations and run times #78