chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.77k stars 416 forks source link

Feature completeness of dense local LinearAlgebra #15621

Open ben-albrecht opened 4 years ago

ben-albrecht commented 4 years ago

This is a meta-issue tracking the feature completeness dense local linear algebra. Implementations can be wrapped (BLAS/LAPACK) or native for this checklist. We are using numpy.linalg as a reference for feature completeness.

Matrix and vector products

Features that we may not want to add without further discussion

Decompositions

Matrix eigenvalues

Norms and other numbers

Solving equations and inverting matrices

Future work

Future efforts out of scope of this issue should track:

ben-albrecht commented 4 years ago

After chatting with @npadmana, we agree the higher priority features are:

MohamedKasem99 commented 4 years ago

I'm interested in working on this. I'm also interested in exploring the possibility of repeating the whole wrapping process with cuBLAS pre-compiled binaries.

ben-albrecht commented 4 years ago

@MohamedKasem99 - that's great to hear.

qr and pinv are the next higher priority features to add from this list. Let me know if you begin working on one of them. Beyond that, einsum functions might be a good target.

I have opened a separate issue for CuBLAS discussion: https://github.com/chapel-lang/chapel/issues/15636

npadmana commented 4 years ago

@ben-albrecht -- just to track this here (maybe should be its own issue).

damianmoz commented 4 years ago

I am happy to work on the norm aspects. I am currently working on this topic in my own research. And I have some experience in this area.

Also, I think a big measure of the success of Chapel as a HPC language will be the performance of a native Chapel implementation of linear algebra.

ben-albrecht commented 4 years ago

Thanks @damianmoz. Completely agree.

Are there specific features of Norm implementations you are interested in contributing? The remaining TODO here is to add 2D 2-norm, which I was planning on doing (I have a commented out implementation checked in). That said, I believe there are a few more flavors of Norm that could be added beyond that, compared to scipy.linalg.norm for example.

damianmoz commented 4 years ago

The current implementation of the 2-norm overflows and underflows easily, although it parallelizes nicely. For provenance, I would probably look at Ed Anderson's paper (Algorithm 978) in the 2017 TOMS which harks back to the 1978 paper by Blue. You can find a reference to some internal communications with me in there. There are marginally better choices for Blue's threshold constants, not so much for that algorithm but for generality. I did also note the benefits of Blue's approach to the EIGEN guys and they also included it as an optional extra. When I wrap up my IEEE754 paper, I will send the final implementation there for consideration. Note that @bradcray has an early copy of that which might have a variant of the algorithm in Chapel but without the final square root.

Normally 2-norms are never performance critical so parallelization or vectorization is not a big issue. The overhead to avoid overflow and underflow results in a 1-pass algorithm where the floating point multiplication is a zero-cost operation of modern CPUs. There was a 2-pass algorithm in LAPACK 2E which vectorizes nicely but has double the memory accesses and is slower that the 1-pass algorithm. I have a new 1-pass algorithm which is slightly slower for double the accuracy but it is still very much work-in-progress. Note that the new IEEE 754 standard has a routine for 2D norms. That said, it fails to propagate NaN properly so is useless for any of my finite-element-focused applications.

If by a 2D 2-norm, you mean the hypot math library routine, that is another question.

Yudhishthira1406 commented 3 years ago

@ben-albrecht I would like to implement pinv function using svd Thanks

damianmoz commented 3 years ago

Am I oversimplifying this:

A = U S Vt ..... (1)

where V^t = transpore(V) and

V^V^t = U U^t = U^t U = I

Moore-Penrose says where A+ is the generalized inverse of A or pinv(A):

A A+ A  = A .... (2)

Plug (1) into (2) and you have

U S V^t A+ U S V^t = U S V^t

and then messing with that you get

S V^t A+ U S V^t = S V^t

If you premultiply both sides by the inverse of S and then premultiply by sides by V, you get

A+ U S V^t = I

and postmultiplying by first V and then S^(-1) and then U^t you eventually get:

pinv(A) = A+ = V S^(-1) U

where S^(-1) is a diagonal matrix obtained by taking S and replacing each non-zero singular values by its inverse and leaving the zeros alone. Note that singular values which are negligible relative to the largest singular value should be zero'd out.

So, you just need the svd function and from there, it is an exercise in matrix multiplication.

Hopefully I did not make a typo in the algebra above.

Yudhishthira1406 commented 3 years ago

@damianmoz Yep, I think that's right

pinv(A) = A+ = V S^(-1) U

There is a small typo here. It should be U^t instead of U. Thanks

damianmoz commented 3 years ago

Yes. You are right. Sorry. It was the end of a long day.

ben-albrecht commented 3 years ago

@Yudhishthira1406 - that works for me, thanks for the heads up

ghbrown commented 1 year ago

@ben-albrecht I believe the eigenvalue section is now complete with the merging of the currently linked PR (or at least there aren't outstanding reservations/todos). Can those two be checked off?

ben-albrecht commented 1 year ago

I agree, thanks!