Need (?) for warning when computing multiple excited states per momentum

Gertian commented 6 months ago

When computing quasi-particle excitations a krylov algorithm is used to solve the eigenvalue problem.

According to the documentation of KrylovKit it is therefore quite risky to ask for multiple excitatations per momentum in the case that the spectrum is degenerate.

Degenerate eigenvalues

From a theoretical point of view, Krylov methods can at most find a single eigenvector associated with a targetted eigenvalue, even if the latter is degenerate. In the case of a degenerate eigenvalue, the specific eigenvector that is returned is determined by the starting vector x₀. For large problems, this turns out to be less of an issue in practice, as often a second linearly independent eigenvector is generated out of the numerical noise resulting from the orthogonalisation steps in the Lanczos or Arnoldi iteration. Nonetheless, it is important to take this into account and to try not to depend on this potentially fragile behaviour, especially for smaller problems.

As far as I know this is not mentioned in the documentation of MPSKit and on top of this it might be useful to use @warn to inform the user about this.

A more practical question : If I'd be asking for the two lowest quasiparticles which I suspect to have degenerate eigenvalues. How would this affect performance ? Could this be incredibly slow ? What exactly is ment with

generated out of numerical noise ?

and when is this expected to fail ? What happens if it does ?

Gertian commented 6 months ago

It also seems as if the verbosity argument of KrylovKit isn't propagated into MPSKit right ?

maartenvd commented 6 months ago

Failure means that it will not find the second degenerate eigenvector, and instead find the one above it. If you suspect the system to have a degenerate eigenvalue then you could probably run the eigensolver, take a random vector and project out the already found eigenvector, and run the eigensolver again with this new initial state.

If the system is indeed degenerate, then you expect both eigensolver calls to find the same eigenvalue but with two different associated eigenvectors. If the system is not degenerate, then the second eigensolver call should either find the same eigenvalue with the same eigenvector, or it could even fail to find that eigenvalue (since you projected out the exact eigenvector).

Op wo 7 feb 2024 om 18:11 schreef Gertian @.***>:

It also seems as if the verbosity argument of KrylovKit isn't propagated into MPSKit right ?

— Reply to this email directly, view it on GitHub https://github.com/maartenvd/MPSKit.jl/issues/111#issuecomment-1932508158, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJKVCQU5NEY2AGDGTHOW33YSOYUPAVCNFSM6AAAAABC6FPS5SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZSGUYDQMJVHA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

Gertian commented 6 months ago

Ok, that sounds like a good solution to my problem.

Out of curiousity, how is performance expected to be influenced in case I try to run a single eigensolver with num=2 instead of two separate ones ? I'm asking because this is what we did and we find huge runtimes on a very beefy computer. So I'm trying to understand how we could potentially lower these a bit without drastically sacrificing accuracy.

Obviously we'll start using num=1 but what else might be improved ?

Jutho commented 6 months ago

In general, asking for 2 instead of 1 eigenvector should not increase the runtime by a huge amount. Yes, it may require some more iterations of the Krylov algorithm before the second eigenvector is converged, but that should increase the runtime with maybe a factor of two (could be worse if the second one is for some reason tougher to converge, but I mean, I wouldn't expect a factor of 10 or 100 increase).

But if the lowest one is degenerate, it might be that the second one is truly a higher energy state and not a second eigenvector within the degenerate subspace. This cannot really be predicted. Is there a symmetry that tells you that you expect degeneracy? Accidental degeneracies (not explained by symmetry) are not that common.

Gertian commented 6 months ago

I've timed these things a little bit for low bond dimensions where everything still runs within reasonable time and indeed I find barely any impact on the performance. (Here I don't expect any degeneracy either though)

Physically this degeneracy stems from the (suspected) existence of gapless scatter states at the same momenta as the original excitation. To make matters worse our Hamiltonian doesn't have particle number conservation so that there are no quantum numbers distinguishing these scatter states from the "single particle" state.

Jutho commented 6 months ago

If everything is gapless, then indeed things are harder. But I would then not expect perfect degeneracies in the quasiparticle Hamiltonian (since it cannot capture scattering states as well as true single-particle states). This should be sufficient for the Krylov method to resolve the individual states, and not having to worry about the degeneracy. But this might indeed require a significantly larger number of iterations.

lkdvos commented 6 months ago

I am not sure this is something we really should warn for in MPSKit. I wouldn't mind mentioning it in the documentation, but I don't think spamming warnings every time someone asks for multiple eigenvectors is the way to go. In particular, I have never really run into issues myself, and I think for the bulk of the cases Jutho's explanation seems to hold, and the eigensolver works just fine. That being said, I think there definitely are some performance things we can look into, but Krylov methods inherently struggle with gapless systems, so it might also just be that you are just posing a very hard question. If you can run a profiler we could examine in a bit more detail where the problem lies.

In particular, some things that come to mind:

I seem to recall that computing the environments can be done either with BiCGStab, or with gmres, and this also affects performance. Additionally, if your tolerances are too strict, as these algorithms don't check for stagnation, the environments might converge to 1e-9, stagnate, and then spend 1000 iterations just doing nothing.
I also think there are some superfluous computations in the code, so if you can point me at your bottle neck I am definitely willing to look into optimizing a bit further
...

Gertian commented 6 months ago

@Jutho, Ok that's good news we are now starting to redo our runs with only one excitation asked. Let's hope this improves runtime... Worst case if it doesn't help, how hard would it be make intermediary saves of the progress made within quasiparticle ? Obviously we could store the environments once they have been computed, but apart from this, how feasible is it to save the Krylovspace that's already been built ?

@lkdvos After the above discussion I agree that a warning would be overkill. Nevertheless a mention to this potential behavior in the docs would probably be usefull.

Since I'm by now convinced that I shouldn't be asking for more than one excitation I'll run this whilst profiling and update you on the outcome.

Thank you all for the valuable insights already !

Gertian commented 6 months ago

@lkdvos, this QP_profiled.zip is a QP run on my 4 thread laptop. It does seem to spend a whole lot of time waiting... Let me know if there's anything else that might be helpful to look at.

In case you don't know this, after extracting the file you can render it trough :

using ProfileView, FileIO
data = load("QP_profiled.jlprof")
ProfileView.view(data[1], lidict=data[2])

All the best, Gertian

lkdvos commented 6 months ago

Maybe also a hint of the script you ran? Did you use MKL? How many BLAS threads are you using? What are the MPSKit multithreading settings at?

Gertian commented 6 months ago

That's a fair point. I used julia --threads=4 to set the multithreading. Apart from that everything is at the default settings which is potentially not the best idea ?

The profiled command was : Es, Bs = excitations(Ham, QuasiparticleAnsatz(krylovdim = 30, toler = sinval*extra_accuracy), curr_p, state ; sector = sector, num=1, solver = KrylovKit.GMRES{KrylovKit.ModifiedGramSchmidt2, Float64}(KrylovKit.ModifiedGramSchmidt2(), 100, 30, sinval*extra_accuracy, 0)) where sector is Irrep[U₁](3) ⊠ Irrep[U₁](0)

Gertian commented 6 months ago

From experimenting with @lkdvos his code it seems that the algorithm really benefits from the parallel computation of the environments. Most waiting time from the previous profiling run is now gone !

It seems that the only question remaining is how many threads to distribute to MPSKit (this is trough julia --threads=... right ? ) and MKL with BLAS.set_num_threads(...)

Gertian commented 6 months ago

It turns out that our tensors mostly have many different sectors. For our largest GS we have that the singular value decomposition of C consists of 170 sectors with average size order 20. The biggest sector has a degeneracy of only 120.

From this I'd expect BLAS.set_num_threads(..) should probably be 1 or 2 when working with MKL ?

lkdvos commented 6 months ago

As currently TensorKit does not actually multithread over different blocks, it should not matter too much. I think for your case, It's probably best to just count the maximal number of julia threads you can meaningfully use:

For the environments, both happen at the same time: 2 threads
- within transfer operations, in principle you can distribute threads over the amount of entries within the MPOHamiltonian. I am not sure how much this helps
For the effective hamiltonian, you can multithread over all sites: N threads
- In principle again you can distribute threads over the amount of entries in the Hamiltonian, but I am also not sure on the current effectiveness.

TensorKit itself does not currently support threading over different symmetry blocks in mul! (yet), but I think there is some preliminary support for permuting in a multithreaded way. Strided is used for permutations and benefits from additional threads BLAS is used for mul!, but also for qr, svd, ... In particular, this has highly optimized implementations of the multithreading approaches, so can be expected to use these to quite high efficiency.

Finally, if you would need data for multiple momenta, you could also consider multithreading over those: parallel=true kwarg in excitations. This again uses up julia threads, and should be a very easy way to get as many cores as possible working.

Gertian commented 6 months ago

Thanks a lot, I'll play around and update this post in case we come across any particularly effective configurations.

lkdvos commented 6 months ago

I will close this issue here, we can open a new discussion about this if needed.

QuantumKitHub / MPSKit.jl

Need (?) for warning when computing multiple excited states per momentum #111