Define ChainRule for runcircuit (w/o noise)

PhilipVinc commented 2 years ago

This would make working with noise and AD easy.

The runcircuit w/o noise should be easy to write. You essentially ahve an implementation inside of /src/autodiff.jl

function rrule(::typeof(rayleigh_quotient), O::MPO, U::Vector{ITensor}, ψ::MPS; kwargs...)
  ϕl = runcircuit(ψ, U; kwargs...) 
  ϕl = noprime(*(O, ϕl'; kwargs...))

  Udag = reverse([dag(swapprime(u, 0=>1)) for u in U])
  ξ⃗ = MPS[ϕl]
  for udag in Udag
    ξ⃗  = vcat(ξ⃗, apply(udag, ξ⃗[end]; move_sites_back = true, kwargs...))
  end
  ξ⃗l = reverse(ξ⃗)
  y = real(inner(ξ⃗l[1], ψ))
  ...
  return y, rayleigh_quotient_pullback
end

i'm unsure how that should be written with noise. I'm not so familiar with ITensor's notation..

mtfishman commented 2 years ago

Hi @PhilipVinc,

That's a good point, we could write an rrule directly for runcircuit. Perhaps in that way we could avoid the need for writing an explicit rrule for rayleigh_quotient and make it easier for people to use more general cost functions. We would just need to make sure we can write the rule so that it is just as efficient as the one we would write for rayleigh_quotient directly.

In terms of the noisy circuit evolution, @GTorlai and I have discussed it and we have a plan for the noisy circuit rayleigh_quotient rrule, which should translate over to runcircuit. It should mostly just look like the noiseless circuit rule, but you have to be a bit careful about order of operations so the contractions are efficient, since the noisy circuit has nonlocal contractions of the Krauss dimensions. In general, we've written runcircuit to be essentially generic to whether or not there is noise.

Cheers, Matt

PhilipVinc commented 2 years ago

Hi @mtfishman , thanks for the answer.

We would just need to make sure we can write the rule so that it is just as efficient as the one we would write for rayleigh_quotient directly.

You also can have both... and remind users to use reyleigh_quotient for best performance. (Though, right now reyleigh_quotient does not work with noise, which is why i was using directly runcircuit).

In terms of the noisy circuit evolution, @GTorlai and I have discussed it we have a plan for the noisy circuit rayleigh_quotient rrule, which should translate over to runcircui

I'd be happy to know when you've got a sketch of this working. With @StefanoBarison we are trying to benchmark a few things and this would speed up our calculations enourmously

mtfishman commented 2 years ago

I believe I've worked out that the rrule for runcircuit directly can be done efficiently (with and without noise), so that would be a good way to go. Agreed we could have definitions for both runcircuit and rayleigh_quotient, but ideally we would avoid code duplication.

Sounds good, glad to hear there is interest in this. We will keep you posted.

PhilipVinc commented 2 years ago

but ideally we would avoid code duplication.

If they are the same you can just call a sub-function. In general, I'd like julia code to be wholly differentiable at the top level. If a function is exported and accessible by users, it should be differentiable in some form. Otherwise, how I am to know that one is differentiable and the other ain't? You can document that one might be less efficient, but ideally it should be.

mtfishman commented 2 years ago

Right, but the goal of an AD library is have a sweet spot, where you define a minimal set of rules and the AD package like Zygote automatically composes them so that a wider range of functionality is differentiable. We would not explicitly make differentiation rules for every single user facing function, that would be impractical and essentially negate a big reason for using AD in the first place.

I'm just proposing the idea that hopefully by making runcircuit differentiable, functions like rayleigh_quotient will be differentiable automatically, since rayleigh_quotient is just a simple wrapper around runcircuit and inner so AD libraries like Zygote should be able to compute it's derivative automatically.

Essentially there is not much point in even providing rayleigh_quotient since it is so easy to write in terms of runcircuit and inner, we mostly used it as a convenient function for writing an AD rule to use in cost functions for VQE-like applications.

mtfishman commented 2 years ago

With regard to:

In general, I'd like julia code to be wholly differentiable at the top level. If a function is exported and accessible by users, it should be differentiable in some form. Otherwise, how I am to know that one is differentiable and the other ain't? You can document that one might be less efficient, but ideally it should be.

I think everyone agrees with that notion, and we consider it a bug if something is not differentiable in PastaQ.jl and ITensors.jl, but unfortunately our manpower and time is very limited and you will find many of these sorts of bugs right now, as we work through things one by one. You'll know if it is not differentiable if it throws an error, and you can raise an issue about it or better yet make a pull request with an rrule.

PhilipVinc commented 2 years ago

Sorry! I think i misuderstood what you meant in your previous message! Of course I agree 100% with what you are saying

mtfishman commented 2 years ago

Sorry! I think i misuderstood what you meant in your previous message! Of course I agree 100% with what you are saying

No worries, communicating over the internet is imperfect. Glad we're on the same page. In the end, of course we might have to make a specialized rayleigh_quotient rule for efficiency, but hopefully not. Giacomo says he is close to getting an rrule working with the noisy circuit version of rayleigh_quotient and we will discuss how it might work to just make a rule for runcircuit instead.

Also, for context, we are only in the very early stages of adding differentiation support for PastaQ.jl and ITensors.jl. Because of the unique needs of the ITensor indexing interface and tensor network computations in general, many rules have to be written by hand to make things work, so it is a bit of an undertaking. For example, for performance it is very important to account for order of operations of tensor network contractions and caching across tensor network contractions, which are nontrivial tasks in general. Additionally, we can't just blindly rely on rules for linear algebra to help us out, since often our calls to matrix multiplication or matrix decompositions are buried beneath many layers of non-differentiable abstractions. So I hope it didn't come off as snarky when I said that we want people to raise issues and help out, any help with this would be greatly appreciated.

GTorlai / PastaQ.jl

Define ChainRule for runcircuit (w/o noise) #257