Open ben-albrecht opened 4 years ago
After chatting with @npadmana, we agree the higher priority features are:
lstsq
eigh
eigvalsh
norm
qr
pinv
I'm interested in working on this. I'm also interested in exploring the possibility of repeating the whole wrapping process with cuBLAS pre-compiled binaries.
@MohamedKasem99 - that's great to hear.
qr
and pinv
are the next higher priority features to add from this list. Let me know if you begin working on one of them. Beyond that, einsum
functions might be a good target.
I have opened a separate issue for CuBLAS discussion: https://github.com/chapel-lang/chapel/issues/15636
@ben-albrecht -- just to track this here (maybe should be its own issue).
I am happy to work on the norm aspects. I am currently working on this topic in my own research. And I have some experience in this area.
Also, I think a big measure of the success of Chapel as a HPC language will be the performance of a native Chapel implementation of linear algebra.
Thanks @damianmoz. Completely agree.
Are there specific features of Norm implementations you are interested in contributing? The remaining TODO here is to add 2D 2-norm, which I was planning on doing (I have a commented out implementation checked in). That said, I believe there are a few more flavors of Norm that could be added beyond that, compared to scipy.linalg.norm
for example.
The current implementation of the 2-norm overflows and underflows easily, although it parallelizes nicely. For provenance, I would probably look at Ed Anderson's paper (Algorithm 978) in the 2017 TOMS which harks back to the 1978 paper by Blue. You can find a reference to some internal communications with me in there. There are marginally better choices for Blue's threshold constants, not so much for that algorithm but for generality. I did also note the benefits of Blue's approach to the EIGEN guys and they also included it as an optional extra. When I wrap up my IEEE754 paper, I will send the final implementation there for consideration. Note that @bradcray has an early copy of that which might have a variant of the algorithm in Chapel but without the final square root.
Normally 2-norms are never performance critical so parallelization or vectorization is not a big issue. The overhead to avoid overflow and underflow results in a 1-pass algorithm where the floating point multiplication is a zero-cost operation of modern CPUs. There was a 2-pass algorithm in LAPACK 2E which vectorizes nicely but has double the memory accesses and is slower that the 1-pass algorithm. I have a new 1-pass algorithm which is slightly slower for double the accuracy but it is still very much work-in-progress. Note that the new IEEE 754 standard has a routine for 2D norms. That said, it fails to propagate NaN properly so is useless for any of my finite-element-focused applications.
If by a 2D 2-norm, you mean the hypot math library routine, that is another question.
@ben-albrecht I would like to implement pinv
function using svd
Thanks
Am I oversimplifying this:
A = U S Vt ..... (1)
where V^t = transpore(V) and
V^V^t = U U^t = U^t U = I
Moore-Penrose says where A+ is the generalized inverse of A or pinv(A):
A A+ A = A .... (2)
Plug (1) into (2) and you have
U S V^t A+ U S V^t = U S V^t
and then messing with that you get
S V^t A+ U S V^t = S V^t
If you premultiply both sides by the inverse of S and then premultiply by sides by V, you get
A+ U S V^t = I
and postmultiplying by first V and then S^(-1) and then U^t you eventually get:
pinv(A) = A+ = V S^(-1) U
where S^(-1) is a diagonal matrix obtained by taking S and replacing each non-zero singular values by its inverse and leaving the zeros alone. Note that singular values which are negligible relative to the largest singular value should be zero'd out.
So, you just need the svd
function and from there, it is an exercise in matrix multiplication.
Hopefully I did not make a typo in the algebra above.
@damianmoz Yep, I think that's right
pinv(A) = A+ = V S^(-1) U
There is a small typo here. It should be U^t instead of U. Thanks
Yes. You are right. Sorry. It was the end of a long day.
@Yudhishthira1406 - that works for me, thanks for the heads up
@ben-albrecht I believe the eigenvalue section is now complete with the merging of the currently linked PR (or at least there aren't outstanding reservations/todos). Can those two be checked off?
I agree, thanks!
This is a meta-issue tracking the feature completeness dense local linear algebra. Implementations can be wrapped (BLAS/LAPACK) or native for this checklist. We are using
numpy.linalg
as a reference for feature completeness.Matrix and vector products
dot(a, b[, out])
inner(a, b)
outer(a, b[, out])
matmul(x1, x2, /[, out, casting, order, …])
tensordot(a, b[, axes])
einsum(subscripts, *operands[, out, dtype, …])
einsum_path(subscripts, *operands[, optimize])
linalg.matrix_power(a, n)
kron(a, b)
Features that we may not want to add without further discussion
linalg.multi_dot(arrays)
vdot(a, b)
Decompositions
linalg.cholesky(a)
linalg.qr(a[, mode])
linalg.svd(a[, full_matrices, compute_uv, …])
Matrix eigenvalues
linalg.eig(a)
linalg.eigh(a[, UPLO])
syever
andheevr
linalg.eigvals(a)
linalg.eigvalsh(a[, UPLO])
syever
andheevr
Norms and other numbers
linalg.norm(x[, ord, axis, keepdims])
distributions/robust/arithmetic/modules/test_module_Norm.chpl
) to LinearAlgebra testinglinalg.cond(x[, p])
linalg.det(a)
linalg.matrix_rank(M[, tol, hermitian])
linalg.slogdet(a)
getrf
trace(a[, offset, axis1, axis2, dtype, out])
Solving equations and inverting matrices
linalg.solve(a, b)
linalg.tensorsolve(a, b[, axes])
linalg.lstsq(a, b[, rcond])
gelsd
and some specialized variantslinalg.inv(a)
linalg.pinv(a[, rcond, hermitian])
svd
linalg.tensorinv(a[, ind])
Future work
Future efforts out of scope of this issue should track:
scipy.linalg