dask / dask-ml

Scalable Machine Learning with Dask
http://ml.dask.org
BSD 3-Clause "New" or "Revised" License

new feature: add LOBPCG to PCA and truncated_svd decompositions #364

Open lobpcg opened 6 years ago

lobpcg commented 6 years ago

Description

The code at https://github.com/dask/dask-ml/blob/master/dask_ml/decomposition/pca.py and https://github.com/dask/dask-ml/blob/master/dask_ml/decomposition/truncated_svd.py supports only the following solvers:

PCA:

svd_solver : string {'auto', 'full', 'tsqr', 'randomized'}

truncated_svd:

if self.algorithm not in {"tsqr", "randomized"}: raise ValueError()

LOBPCG (https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.lobpcg.html), on the other hand, may be expected to outperform all of those solvers for large problems; see, e.g., the comments at https://www.mathworks.com/matlabcentral/fileexchange/48-lobpcg-m
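For context, a hypothetical sketch of the user-facing option this request asks for; the "lobpcg" value does not exist in dask-ml today, so the snippet below would currently raise a ValueError:

```python
# Hypothetical usage this feature request would enable; 'lobpcg' is NOT a
# currently supported svd_solver value in dask-ml and would raise a ValueError.
import dask.array as da
from dask_ml.decomposition import PCA

X = da.random.random((1_000_000, 100), chunks=(100_000, 100))
pca = PCA(n_components=3, svd_solver="lobpcg")  # proposed, not yet implemented
pca.fit(X)
```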

TomAugspurger commented 6 years ago

The tsqr and randomized solvers are in dask.array, so it would need to be implemented there.

Do you know if there's a blockwise algorithm for LOBPCG?

lobpcg commented 6 years ago

What do you mean by "blockwise"?

The LOBPCG implementation in SciPy closely follows its MATLAB counterpart and these references:

A. V. Knyazev, I. Lashuk, M. E. Argentati, and E. Ovchinnikov, Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX) in hypre and PETSc, SIAM Journal on Scientific Computing 29(5): 2224-2239 (2007) DOI

A. V. Knyazev, Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method, SIAM Journal on Scientific Computing 23(2): 517-541 (2001) DOI

lobpcg commented 6 years ago

The LOBPCG algorithm is matrix-free, requiring only a matrix-vector multiplication function. For example, to compute a partial SVD or PCA of a data matrix X without ever forming its covariance matrix X'X, LOBPCG only calls a function that evaluates the product X'(Xv) of the covariance matrix X'X with a block-vector v. The matrix X can therefore be stored in any format, e.g., dask.array, as long as the product X'(Xv) can be evaluated. The https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.lobpcg.html implementation already accepts such a function, e.g., via a scipy.sparse.linalg.LinearOperator.
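To make the matrix-free idea concrete, here is a minimal sketch, not dask-ml code, that computes the top principal components of a centered dask array X by passing the covariance action v -> X'(Xv) to scipy.sparse.linalg.lobpcg as a LinearOperator, never forming X'X; the sizes and names are illustrative:

```python
import numpy as np
import dask.array as da
from scipy.sparse.linalg import LinearOperator, lobpcg

n_samples, n_features, n_components = 50_000, 200, 3
X = da.random.random((n_samples, n_features), chunks=(10_000, n_features))
X = (X - X.mean(axis=0)).persist()  # center the data for PCA

def cov_action(v):
    # v is a NumPy array of shape (n_features,) or (n_features, k);
    # evaluate X'(Xv) blockwise in dask without ever forming X'X
    return (X.T @ (X @ v)).compute()

A = LinearOperator(
    shape=(n_features, n_features),
    matvec=cov_action,
    matmat=cov_action,
    dtype=np.float64,
)

rng = np.random.default_rng(0)
V0 = rng.standard_normal((n_features, n_components))  # random initial block

eigenvalues, eigenvectors = lobpcg(A, V0, largest=True, tol=1e-5, maxiter=100)
# columns of `eigenvectors` approximate the top principal directions;
# explained variance per component is eigenvalues / (n_samples - 1)
```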

Yet another, blockwise, improvement would need to be coded. LOBPCG is a block method, meaning that it computes several eigen/principal components together in a block-vector v. In https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.lobpcg.html, I believe that v is simply a matrix whose columns store the eigen/principal components. It is possible to use the dask.array format for v as well, but that requires modifying https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.lobpcg.html in the same flavor as tsqr in https://github.com/dask/dask/blob/master/dask/array/linalg.py; see the sketch below.
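To illustrate what "the same flavor as tsqr" could mean in practice, here is a minimal sketch (not existing dask or dask-ml code) of one blockwise primitive a dask-native LOBPCG would need: orthonormalizing the tall-skinny block-vector v, stored as a dask.array, using the tsqr routine already available in dask.array.linalg:

```python
import dask.array as da

n, k = 1_000_000, 5  # many rows, a handful of eigen/principal components
# the block-vector v, chunked along rows only, as tsqr requires
v = da.random.random((n, k), chunks=(100_000, k))

# blockwise tall-and-skinny QR; q has orthonormal columns and keeps v's chunking
q, r = da.linalg.tsqr(v)

# sanity check: q'q should be close to the k-by-k identity
err = da.linalg.norm(q.T @ q - da.eye(k)).compute()
print(f"orthonormality error: {err:.2e}")
```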

There is a similar implementation of LOBPCG in Fortran-90 in ABINIT, see https://docs.abinit.org/variables/dev/#wfoptalg, and to my recollection in C in SLEPc and Trilinos, and in C++ in MAGMA.

I hope that this answers your question about "blockwise" LOBPCG.

mrocklin commented 6 years ago

@lobpcg this sounds great. If you have a chance to experiment with the LOBPCG algorithm with dask array and can show some performance numbers then that would be very helpful!

lobpcg commented 4 years ago

https://ml.dask.org/modules/generated/dask_ml.cluster.SpectralClustering.html mentions an AMG solver, which is LOBPCG with algebraic multigrid preconditioning. But at https://github.com/dask/dask-ml/blob/master/dask_ml/cluster/spectral.py, LOBPCG is apparently still not implemented in dask.