QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support
http://www.qmcpack.org

Linear method eigensolver improvements #3788

Open markdewing opened 2 years ago

markdewing commented 2 years ago

This issue explores accelerating the eigensolver part of the linear method. See #3633 for the umbrella issue on scaling the linear method with respect to the number of parameters.

The nonsymmetric generalized eigenvalue problem scales as N_p^3, so it will eventually dominate the time in each linear method step as the number of parameters increases.
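As a rough illustration of that cubic cost (hypothetical matrix sizes, not the QMCPACK kernel itself), one can time LAPACK's nonsymmetric eigensolver through scipy at two sizes; doubling N should cost roughly 8x:

```python
# Hypothetical illustration: time scipy's nonsymmetric eigensolver
# (LAPACK dgeev under the hood) at two sizes to see the ~N^3 growth.
import time
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(7)

def time_eig(n):
    a = rng.standard_normal((n, n))
    t0 = time.perf_counter()
    eig(a)
    return time.perf_counter() - t0

t_small = time_eig(500)
t_large = time_eig(1000)
# For an O(N^3) kernel the ratio is ~8, modulo cache and threading effects.
print(f"n=500: {t_small:.3f}s  n=1000: {t_large:.3f}s  ratio: {t_large / t_small:.1f}")
```
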

This issue is to record some of the investigations and observations.

The problem is that there are multiple orthogonal directions for acceleration: multicore, GPUs, and distributed memory. As far as I know, there are no good solutions that combine all of these.

Options

The next step is to implement a Scalapack version of the generalized eigenvalue kernel and explore the scaling.

ye-luo commented 2 years ago

Why are we not using Generalized Nonsymmetric Eigensolvers instead of invert overlap matrix + dgeev?

markdewing commented 2 years ago

I think using a generalized nonsymmetric solver is slower. There is code in QMCFixedSamplingLinearOptimize{Batched} to do so: the three-parameter version of getLowestEigenvector uses dggev instead.

Also, there are even fewer implementations of the generalized version.
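The two routes discussed above can be compared directly in scipy (a sketch with synthetic matrices, not QMCPACK data): `scipy.linalg.eig(H, S)` calls LAPACK dggev, while folding the overlap into H and calling plain `eig` uses dgeev, analogous to the invert-overlap path:

```python
# Sketch: two routes to the generalized problem H c = E S c.
# Route 1 calls LAPACK dggev directly; route 2 inverts the overlap
# and calls the standard solver dgeev.
import numpy as np
from scipy.linalg import eig, solve

rng = np.random.default_rng(0)
n = 200
h = rng.standard_normal((n, n))                      # nonsymmetric "Hamiltonian" block
s = np.eye(n) + 0.01 * rng.standard_normal((n, n))   # near-identity "overlap"

w_ggev = eig(h, s, right=False)          # generalized solver (dggev)
w_geev = eig(solve(s, h), right=False)   # invert overlap, then dgeev

# Both routes give the same spectrum: every dggev eigenvalue has a
# dgeev eigenvalue within roundoff distance.
d = np.abs(w_ggev[:, None] - w_geev[None, :]).min(axis=1)
assert d.max() < 1e-6
```
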

markdewing commented 2 years ago

Scalapack complications: To get the eigenvalues, dgeev calls the following routines internally: dgehrd (reduction to upper Hessenberg form) and dhseqr (QR iteration on the Hessenberg matrix, yielding the Schur form and the eigenvalues).

Versions of both of these exist in ScaLAPACK (pdgehrd and pdlahqr), so getting the eigenvalues is straightforward.

The eigenvectors are where it gets more complicated.

There is no version of dorghr in ScaLAPACK, but there is a version of dorgqr. There is no version of dtrevc3 in ScaLAPACK, but there is a complex version (ztrevc). (The routines dtrevc3 and dtrevc solve the same problem, just with different algorithms.)
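The decomposition chain inside dgeev can be sketched with scipy wrappers (small synthetic matrix, for illustration only): `hessenberg` drives dgehrd/dorghr, and `schur` drives LAPACK dgees, which itself runs the dgehrd/dhseqr chain. The eigenvalues fall out of the real Schur form; the eigenvectors need the extra back-transformation step (dtrevc3 in LAPACK) that ScaLAPACK lacks:

```python
# Sketch of the factorization chain behind dgeev:
#   A = Q H Q^T   (Hessenberg reduction, dgehrd/dorghr)
#   H = Z T Z^T   (real Schur form, quasi-upper-triangular T)
import numpy as np
from scipy.linalg import hessenberg, schur

rng = np.random.default_rng(1)
a = rng.standard_normal((50, 50))

h, q = hessenberg(a, calc_q=True)   # A = Q H Q^T, H upper Hessenberg
t, z = schur(h, output='real')      # H = Z T Z^T, T quasi-upper-triangular

# The full Schur factorization of A is (Q Z) T (Q Z)^T.
qz = q @ z
assert np.allclose(a, qz @ t @ qz.T, atol=1e-10)

# Eigenvalues of A are those of T (its 1x1 and 2x2 diagonal blocks).
assert np.allclose(np.sort(np.linalg.eigvals(t).real),
                   np.sort(np.linalg.eigvals(a).real), atol=1e-8)
```
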

A couple of ideas

Other notes

rcclay commented 2 years ago

What's the status of the LMY engine? It occurs to me that there's a chance that Eric might have implemented this stuff already somewhere.

Also, how crazy would it be to think about implementing one of these by hand?
https://journals.aps.org/prb/abstract/10.1103/PhysRevB.85.045103 https://aip.scitation.org/doi/full/10.1063/1.5125803

prckent commented 2 years ago

LMY engine has been untouched for a long time, but a discussion with Eric would be productive. In particular the "optimize subsets of parameters" approach is implemented there.

I recall that the outcome of our previous discussions was that we should get a distributed memory scalapack solver working, since that will work everywhere and will greatly relieve memory pressure. @markdewing ?

It would be nice to do a proper study of optimizer performance (convergence, robustness, speed) when all the derivatives are consistently implemented.

markdewing commented 2 years ago

The LMY engine code does not work with the batched driver. At the very least, QMCCostFunctionBatched::engine_checkConfigurations is not implemented.

If the solver in those papers is the SPAM (Subspace Projected Approximation Matrix) method, then it appears to be implemented in the formic code. Though it is not enabled in the LMY engine constructor from QMCPACK.

There's also the blocked linear method, which solves for a subset of parameters. I think that is the method that is connected to QMCPACK.

I have written a miniapp/kernel for the linear method solver ( https://github.com/markdewing/qmc_kernels/tree/master/kernels/generalized_eigensolve ) that is useful for exploring the solver portion in isolation.

prckent commented 2 years ago

Painful as it is to note, the LMY engine comes with a lot of baggage along with its better algorithms. I think a clean implementation of any new algorithms without LMY is the preferred option by far. There is much to be gained simply by having fewer lines of code, both for optimization and in the repo overall.

markdewing commented 5 months ago

The linear method only needs one eigenvector. Often there are a number of extremely low eigenvalues that are clearly not desirable, so the method has a step that looks through the eigenvalues for the one closest to E_0 (see #4917 for more discussion on this step). It would be useful to take advantage of this rather than solve for all the eigenvalues and eigenvectors. ARPACK ( https://github.com/opencollab/arpack-ng ) is an iterative eigensolver which has a shift-and-invert mode that finds the eigenvalues closest to a target value. This mode can be used in the linear method.
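The shift-and-invert route can be sketched with scipy's `eigs`, which wraps arpack-ng (synthetic matrices below, not QMCPACK data; the symmetric positive-definite overlap is an assumption of ARPACK's generalized shift-invert mode):

```python
# Sketch: ARPACK shift-and-invert solve of H c = E S c for the single
# eigenvalue nearest a target E_0, versus the full dense solve.
import numpy as np
from scipy.linalg import eig
from scipy.sparse.linalg import eigs

rng = np.random.default_rng(3)
n = 300
b = 0.005 * rng.standard_normal((n, n))
s = np.eye(n) + (b + b.T) / 2                               # SPD "overlap"
h = np.diag(np.linspace(-2.0, 5.0, n)) + 0.05 * rng.standard_normal((n, n))

e0 = -1.5  # target: current energy estimate (hypothetical value)

# sigma triggers ARPACK's shift-and-invert mode: (H - sigma*S) is
# factored once and the iteration proceeds with solves.
w_arp, v_arp = eigs(h, k=1, M=s, sigma=e0)

# Cross-check against the dense generalized solve.
w_all = eig(h, s, right=False)
assert np.abs(w_all - w_arp[0]).min() < 1e-6                # a true eigenvalue
assert abs(abs(w_arp[0] - e0) - np.abs(w_all - e0).min()) < 1e-6  # the closest one

# The linear method also needs the corresponding eigenvector, which
# ARPACK returns for free; check the residual of the eigenpair.
r = h @ v_arp[:, 0] - w_arp[0] * (s @ v_arp[:, 0])
assert np.linalg.norm(r) < 1e-6
```

This finds one eigenpair near E_0 without computing the full spectrum, which is presumably what the ARPACK versions mentioned below exploit.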

The python script in qmc_kernels has been updated with an ARPACK version ( qmcpack_linear_method_solver.py ) and there is a C++ version in the arpack subdirectory.

Some timings (CPU, 28-core Broadwell):

| Matrix size | LAPACK time (s) | ARPACK time (s) |
|-------------|-----------------|-----------------|
| 3055        | 17.4            | 1.5             |
| 6855        | 124.8           | 7.5             |

The rot_det_speedup branch contains a prototype implementation of the linear method using ARPACK.