Adjusting example 3 to a GPU backend

YotamKa commented 5 months ago

Dear Matt,

I tried using Nvidia's GPU backend for ITensorMPS TDVP calculations, but encountered an issue with the data types being passed to the ODE solver.

The problem is described below using code snippets.

Do you have a quick fix in mind for this issue?

Thanks, Yotam \ \ Consider, for example, the code in: https://github.com/ITensor/ITensorMPS.jl/tree/main/examples/03_tdvp_time_dependent.jl'

I changed it a bit so it would run with my current versions: "ITensors v0.3.68" "ITensorMPS v0.2.1"

The following code runs:

using ITensors
using ITensorMPS
using LinearAlgebra: norm
using Random: Random

include("03_models.jl")
include("03_updaters.jl")

function main()
  Random.seed!(1234)

  # Time dependent Hamiltonian is:
  # H(t) = H₁(t) + H₂(t) + …
  #      = f₁(t) H₁(0) + f₂(t) H₂(0) + …
  #      = cos(ω₁t) H₁(0) + cos(ω₂t) H₂(0) + …

  # Number of sites
  n = 6

  # How much information to output from TDVP
  # Set to 2 to get information about each bond/site
  # evolution, and 3 to get information about the
  # updater.
  outputlevel = 3

  # Frequency of time dependent terms
  ω₁ = 0.1
  ω₂ = 0.2

  # Nearest and next-nearest neighbor
  # Heisenberg couplings.
  J₁ = 1.0
  J₂ = 1.0

  time_step = 0.1
  time_stop = 1.0

  # nsite-update TDVP
  nsite = 2

  # Starting state bond/link dimension.
  # A product state starting state can
  # cause issues for TDVP without
  # subspace expansion.
  start_linkdim = 4

  # TDVP truncation parameters
  maxdim = 100
  cutoff = 1e-8

  tol = 1e-15

  @show n
  @show ω₁, ω₂
  @show J₁, J₂
  @show maxdim, cutoff, nsite
  @show start_linkdim
  @show time_step, time_stop

  ω⃗ = (ω₁, ω₂)
  f⃗ = map(ω -> (t -> cos(ω * t)), ω⃗)

  # H₀ = H(0) = H₁(0) + H₂(0) + …
  ℋ₁₀ = heisenberg(n; J=J₁, J2=0.0)
  ℋ₂₀ = heisenberg(n; J=0.0, J2=J₂)
  ℋ⃗₀ = (ℋ₁₀, ℋ₂₀)

  s = siteinds("S=1/2", n)

  H⃗₀ = map(ℋ₀ -> MPO(ℋ₀, s), ℋ⃗₀)

  # Initial state, ψ₀ = ψ(0)
  # Initialize as complex since that is what OrdinaryDiffEq.jl/DifferentialEquations.jl
  # expects.
  ψ₀ = complex.(randomMPS(s, j -> isodd(j) ? "↑" : "↓"; linkdims=start_linkdim))

  @show norm(ψ₀)

  println()
  println("#"^100)
  println("Running TDVP with ODE updater")
  println("#"^100)
  println()

  ψₜ_ode = tdvp(
    -im * TimeDependentSum(f⃗, H⃗₀),
    time_stop,
    ψ₀;
    updater=ode_updater,
    updater_kwargs=(; reltol=tol, abstol=tol),
    time_step,
    maxdim,
    cutoff,
    nsite,
    outputlevel,
  )

  println()
  println("Finished running TDVP with ODE updater")
  println()

  println()
  println("#"^100)
  println("Running TDVP with Krylov updater")
  println("#"^100)
  println()

  ψₜ_krylov = tdvp(
    -im * TimeDependentSum(f⃗, H⃗₀),
    time_stop,
    ψ₀;
    updater=krylov_updater,
    updater_kwargs=(; tol, eager=true),
    time_step,
    cutoff,
    nsite,
    outputlevel,
  )

  println()
  println("Finished running TDVP with Krylov updater")
  println()

  println()
  println("#"^100)
  println("Running full state evolution with ODE updater")
  println("#"^100)
  println()

  @disable_warn_order begin
    ψₜ_full, _ = ode_updater(
      -im * TimeDependentSum(f⃗, contract.(H⃗₀)),
      contract(ψ₀);
      internal_kwargs=(; time_step=time_stop, outputlevel),
      reltol=tol,
      abstol=tol,
    )
  end

  println()
  println("Finished full state evolution with ODE updater")
  println()

  @show norm(ψₜ_ode)
  @show norm(ψₜ_krylov)
  @show norm(ψₜ_full)

  @show 1 - abs(inner(contract(ψₜ_ode), ψₜ_full))
  @show 1 - abs(inner(contract(ψₜ_krylov), ψₜ_full))
  return nothing
end

main()

But, when I pass psi and H as cuMPS and cuMPO, i.e.:

using ITensors
using ITensorMPS
using LinearAlgebra: norm
using Random: Random
using CUDA

include("03_models.jl")
include("03_updaters.jl")

function main()
  Random.seed!(1234)

  # Time dependent Hamiltonian is:
  # H(t) = H₁(t) + H₂(t) + …
  #      = f₁(t) H₁(0) + f₂(t) H₂(0) + …
  #      = cos(ω₁t) H₁(0) + cos(ω₂t) H₂(0) + …

  # Number of sites
  n = 6

  # How much information to output from TDVP
  # Set to 2 to get information about each bond/site
  # evolution, and 3 to get information about the
  # updater.
  outputlevel = 3

  # Frequency of time dependent terms
  ω₁ = 0.1
  ω₂ = 0.2

  # Nearest and next-nearest neighbor
  # Heisenberg couplings.
  J₁ = 1.0
  J₂ = 1.0

  time_step = 0.1
  time_stop = 1.0

  # nsite-update TDVP
  nsite = 2

  # Starting state bond/link dimension.
  # A product state starting state can
  # cause issues for TDVP without
  # subspace expansion.
  start_linkdim = 4

  # TDVP truncation parameters
  maxdim = 100
  cutoff = 1e-8

  tol = 1e-15

  @show n
  @show ω₁, ω₂
  @show J₁, J₂
  @show maxdim, cutoff, nsite
  @show start_linkdim
  @show time_step, time_stop

  ω⃗ = (ω₁, ω₂)
  f⃗ = map(ω -> (t -> cos(ω * t)), ω⃗)

  # H₀ = H(0) = H₁(0) + H₂(0) + …
  ℋ₁₀ = heisenberg(n; J=J₁, J2=0.0)
  ℋ₂₀ = heisenberg(n; J=0.0, J2=J₂)
  ℋ⃗₀ = (ℋ₁₀, ℋ₂₀)

  s = siteinds("S=1/2", n)

""" Define as H as cu(MPO) for GPU calculation """
  H⃗₀ = map(ℋ₀ -> cu(MPO(ℋ₀, s)), ℋ⃗₀)

  # Initial state, ψ₀ = ψ(0)
  # Initialize as complex since that is what OrdinaryDiffEq.jl/DifferentialEquations.jl
  # expects.
""" Define as psi as cu(MPS) for GPU calculation """
  ψ₀ = cu(complex.(randomMPS(s, j -> isodd(j) ? "↑" : "↓"; linkdims=start_linkdim)))

  @show norm(ψ₀)

  println()
  println("#"^100)
  println("Running TDVP with ODE updater")
  println("#"^100)
  println()

  ψₜ_ode = tdvp(
    -im * TimeDependentSum(f⃗, H⃗₀),
    time_stop,
    ψ₀;
    updater=ode_updater,
    updater_kwargs=(; reltol=tol, abstol=tol),
    time_step,
    maxdim,
    cutoff,
    nsite,
    outputlevel,
  )

  println()
  println("Finished running TDVP with ODE updater")
  println()

  println()
  println("#"^100)
  println("Running TDVP with Krylov updater")
  println("#"^100)
  println()

  ψₜ_krylov = tdvp(
    -im * TimeDependentSum(f⃗, H⃗₀),
    time_stop,
    ψ₀;
    updater=krylov_updater,
    updater_kwargs=(; tol, eager=true),
    time_step,
    cutoff,
    nsite,
    outputlevel,
  )

  println()
  println("Finished running TDVP with Krylov updater")
  println()

  println()
  println("#"^100)
  println("Running full state evolution with ODE updater")
  println("#"^100)
  println()

  @disable_warn_order begin
    ψₜ_full, _ = ode_updater(
      -im * TimeDependentSum(f⃗, contract.(H⃗₀)),
      contract(ψ₀);
      internal_kwargs=(; time_step=time_stop, outputlevel),
      reltol=tol,
      abstol=tol,
    )
  end

  println()
  println("Finished full state evolution with ODE updater")
  println()

  @show norm(ψₜ_ode)
  @show norm(ψₜ_krylov)
  @show norm(ψₜ_full)

  @show 1 - abs(inner(contract(ψₜ_ode), ψₜ_full))
  @show 1 - abs(inner(contract(ψₜ_krylov), ψₜ_full))
  return nothing
end

main()

I get the following error:

MethodError: no method matching (::var"#f#7"{…})(::CuArray{…}, ::SciMLBase.NullParameters, ::Float64)
An arithmetic operation was performed on a NullParameters object. This means no parameters were passed
into the AbstractSciMLProblem (e.x.: ODEProblem) but the parameters object `p` was used in an arithmetic
expression.

emstoudenmire commented 5 months ago

Do you get the same error when running the calculation on CPU ie not using CUDA? I ask because the error message seems more related to SciML parameters than the presence of the CuArray though I could be wrong.

YotamKa commented 5 months ago

Hi, thanks for the quick response.

No, with a CPU backend (the first code snippet), the integrator works and produces the correct answers (compared with ED).

The output is:

n = 4
(ω₁, ω₂) = (0.1, 0.2)
(J₁, J₂) = (1.0, 1.0)
(maxdim, cutoff, nsite) = (100, 1.0e-8, 2)
start_linkdim = 4
(time_step, time_stop) = (0.1, 1.0)
norm(ψ₀) = 1.0000000000000002

####################################################################################################
Running TDVP with ODE updater
####################################################################################################

After sweep 1: maxlinkdim=4 maxerr=0.00E+00 current_time=0.1 time=0.701
After sweep 2: maxlinkdim=4 maxerr=8.05E-17 current_time=0.2 time=0.028
After sweep 3: maxlinkdim=4 maxerr=3.34E-17 current_time=0.3 time=0.025
After sweep 4: maxlinkdim=4 maxerr=0.00E+00 current_time=0.4 time=0.025
After sweep 5: maxlinkdim=4 maxerr=6.16E-17 current_time=0.5 time=0.023
After sweep 6: maxlinkdim=4 maxerr=2.07E-17 current_time=0.6 time=0.025
After sweep 7: maxlinkdim=4 maxerr=0.00E+00 current_time=0.7 time=0.023
After sweep 8: maxlinkdim=4 maxerr=0.00E+00 current_time=0.8 time=0.025
After sweep 9: maxlinkdim=4 maxerr=0.00E+00 current_time=0.9 time=0.023
After sweep 10: maxlinkdim=4 maxerr=0.00E+00 current_time=1.0 time=0.025

Finished running TDVP with ODE updater

####################################################################################################
Running TDVP with Krylov updater
####################################################################################################

After sweep 1-: maxlinkdim=4 maxerr=0.00E+00 current_time=0.1 time=5.487
After sweep 2: maxlinkdim=4 maxerr=2.14E-16 current_time=0.2 time=0.004
After sweep 3: maxlinkdim=4 maxerr=0.00E+00 current_time=0.3 time=0.004
After sweep 4: maxlinkdim=4 maxerr=3.33E-17 current_time=0.4 time=0.004
After sweep 5: maxlinkdim=4 maxerr=6.02E-18 current_time=0.5 time=0.004
After sweep 6: maxlinkdim=4 maxerr=7.39E-17 current_time=0.6 time=0.004
After sweep 7: maxlinkdim=4 maxerr=9.73E-17 current_time=0.7 time=0.004
After sweep 8: maxlinkdim=4 maxerr=0.00E+00 current_time=0.8 time=0.004
After sweep 9: maxlinkdim=4 maxerr=6.92E-17 current_time=0.9 time=0.006
After sweep 10: maxlinkdim=4 maxerr=0.00E+00 current_time=1.0 time=0.006

Finished running TDVP with Krylov updater

####################################################################################################
Running full state evolution with ODE updater
####################################################################################################

Finished full state evolution with ODE updater

norm(ψₜ_ode) = 1.0000000000000007
norm(ψₜ_krylov) = 1.0000000000000022
norm(ψₜ_full) = 0.9999999999999999
1 - abs(inner(contract(ψₜ_ode), ψₜ_full)) = -2.220446049250313e-16
1 - abs(inner(contract(ψₜ_krylov), ψₜ_full)) = 9.623815895309917e-7

mtfishman commented 5 months ago

Interesting, thanks for the report. Maybe worth looking at https://github.com/SciML/DiffEqGPU.jl?tab=readme-ov-file#example-of-within-method-gpu-parallelism. The only difference I see in that example from our example is that we aren't defining an in-place version of the solver time stepping function: https://github.com/ITensor/ITensorMPS.jl/blob/v0.2.3/examples/03_updaters.jl#L11-L14, or there is something more subtle about how we are calling the ODE solver that I am missing.

mtfishman commented 5 months ago

Also note that if you are using GPU, probably you want to use single precision (i.e. Float32 or Complex{Float32}). cu(...) converts to single precision automatically but you should probably change parameters in the code to single precision, for example ω₁ = 0.1f0, tol = 1f-7, etc. Something we do to easily switch between number types is define the number type at the top of the code and then use it as a variable throughout the code:

elt = Float32

ω₁ = one(elt) / 10
tol = 10 * eps(elt)

but of course feel free to handle that however you prefer.

mtfishman commented 5 months ago

I see the issue, this line here is too restrictive: https://github.com/ITensor/ITensorMPS.jl/blob/v0.2.3/examples/03_updaters.jl#L12 and should allow any AbstractVector rather than just a Vector. I'll make a PR, and additionally make the example easier to run on GPU and select the element type.

mtfishman commented 5 months ago

This is being fixed in #22.

YotamKa commented 5 months ago

Thanks a lot! It works also on my PC over a GPU backend.

ITensor / ITensorMPS.jl

Adjusting example 3 to a GPU backend #21