data-apis / array-api

RFC document, tooling and other content related to the array API standard
https://data-apis.github.io/array-api/latest/

Linear Algebra design overview #147

Closed kgryte closed 3 years ago

kgryte commented 3 years ago

The intent of this issue is to provide a bird's eye view of linear algebra APIs in order to extract a consistent set of design principles for current and future spec evolution.

Unary APIs:

Binary APIs:

Support stacks (batching)

The main idea is that, if an operation is explicitly defined in terms of matrices (i.e., 2D arrays), then its API should support stacks of matrices (i.e., batching), as sketched below.
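As a point of reference, a minimal NumPy sketch of this batching behavior (shapes chosen arbitrarily for illustration):

```python
import numpy as np

# A stack of 4 matrices, each of shape (3, 3); linalg functions apply
# to each matrix in the stack, broadcasting over the leading dimension.
stack = np.random.default_rng(0).standard_normal((4, 3, 3))

inv_stack = np.linalg.inv(stack)    # unary: inverts each 3x3 matrix
prod = stack @ inv_stack            # binary: matmul batches the same way
print(inv_stack.shape, prod.shape)  # (4, 3, 3) (4, 3, 3)
```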

Unary:

Binary:

No stack (batching) support

Binary:

Support tolerances

Supported dtypes

The main idea here is that we should avoid undefined or ambiguous behavior. For example, when the type promotion rules cannot capture an operation's behavior (e.g., an API accepts int64 input but must return float64 output), how would casting work? Since the type promotion rules only address same-kind promotion, cross-kind casting would be left up to the implementation and thus ill-defined. To ensure defined behavior, APIs which must return floating-point output should require floating-point input (see the sketch below).
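For instance, NumPy today accepts integer input to linalg.inv and silently casts across kinds, which is exactly the behavior the type promotion rules cannot express:

```python
import numpy as np

a = np.array([[2, 0],
              [0, 4]])            # int64 input
out = np.linalg.inv(a)
print(a.dtype, "->", out.dtype)   # int64 -> float64: a cross-kind cast
                                  # chosen by the implementation
```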

Numeric:

Floating:

Any:

Output values

Array:

Tuple:

Note: only svd is polymorphic in its output (via the compute_uv keyword), as illustrated below.
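In NumPy terms, the polymorphism looks like this (a small sketch):

```python
import numpy as np

a = np.random.default_rng(0).standard_normal((5, 3))

u, s, vh = np.linalg.svd(a)                  # tuple output: (U, S, Vh)
s_only = np.linalg.svd(a, compute_uv=False)  # array output: singular values only
```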

Reduced output dims

Conclusion: norm is unique here in allowing the output array rank to remain the same as that of the input array (see the sketch below).
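In NumPy this is exposed via the keepdims keyword (a minimal sketch):

```python
import numpy as np

a = np.random.default_rng(0).standard_normal((4, 3))

reduced = np.linalg.norm(a, axis=1)                   # shape (4,): rank reduced
same_rank = np.linalg.norm(a, axis=1, keepdims=True)  # shape (4, 1): rank preserved
```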

Broadcasting

Specialized behavior

shoyer commented 3 years ago

Is there a reason for omitting eigenvalue solvers? Those are quite fundamental operations for linear algebra.

More generally, I would be curious about any general reasoning behind what should and should not be included here. For example, matrix_rank in particular seems rather esoteric to me (and quite easy to compute from SVD if desired).
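For instance, a rank computation on top of svd might look like the following sketch (matrix_rank_from_svd is a hypothetical helper; the tolerance mirrors the common max(M, N) * eps default):

```python
import numpy as np

def matrix_rank_from_svd(a, rtol=None):
    # Count singular values above a relative tolerance.
    s = np.linalg.svd(a, compute_uv=False)
    if rtol is None:
        rtol = max(a.shape[-2:]) * np.finfo(s.dtype).eps
    tol = s.max(axis=-1, keepdims=True) * rtol
    return (s > tol).sum(axis=-1)
```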

shoyer commented 3 years ago

As a reference point, I tried searching for usage of these functions from NumPy in Google's internal codebase. Most are used hundreds of times. The notable exceptions are matrix_rank and matrix_power, which have only a handful of uses. I suggest these are probably not worth standardizing, unless we're also planning to cover the much broader scope of scipy.linalg.

kgryte commented 3 years ago

@shoyer eigenvalue APIs are currently blocked by lack of complex number support.

shoyer commented 3 years ago

> @shoyer eigenvalue APIs are currently blocked by lack of complex number support.

Ah, gotcha. In that case we'll need to omit SVD as well, unless we only care about the diagonal terms (compute_uv=False). (nevermind, this is wrong!)

kgryte commented 3 years ago

@shoyer As for matrix_rank and matrix_power, these are implemented across almost all considered array libraries. Based on usage data from select downstream libraries, matrix_rank, e.g., is more commonly used than cholesky and lstsq.

kgryte commented 3 years ago

Re: svd. This is true. There has been some discussion on splitting the svd API into svd and svdvals (see comment), which would mirror the eig and eigvals convention.
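As a rough sketch, the proposed svdvals can be emulated on top of today's svd (scipy.linalg already ships a function with this name):

```python
import numpy as np

def svdvals(x):
    # Proposed spectrum-only counterpart to svd, mirroring eig/eigvals;
    # emulated here via the existing compute_uv keyword.
    return np.linalg.svd(x, compute_uv=False)
```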

leofang commented 3 years ago

For real matrices, svd's u and v are real, and can be thought of as m- and n-dimensional rotation matrices.

kgryte commented 3 years ago

I should also mention that the above overview reflects the current list of both specified APIs and APIs which are currently up for review. Meaning, it captures the current state of linear algebra APIs as included in the standard; no curation, per se, just an attempt to unify how we design APIs now and moving forward.

shoyer commented 3 years ago

> @shoyer As for matrix_rank and matrix_power, these are implemented across almost all considered array libraries. Based on usage data from select downstream libraries, matrix_rank, e.g., is more commonly used than cholesky and lstsq.

To push back here a little bit -- I think the numbers from downstream libraries reported in https://github.com/data-apis/python-record-api/blob/master/data/typing/numpy.linalg.py are not necessarily indicative of usage for linear algebra functions, which tend to be rather specialized.

Google's internal usage is certainly not representative of all use cases, but it's probably a bit more reflective of what numerical programmers use on average, across hundreds of different projects in different application domains.

In particular:

All that said, I'm not really strongly opposed here. These functions are quite easy to implement in terms of other primitives already in the standard. I just don't think they're particularly worthy of inclusion.

shoyer commented 3 years ago

> @shoyer eigenvalue APIs are currently blocked by lack of complex number support.

We could potentially follow the example of SVD and only implement eigh/eigvalsh. On real-valued matrices, the result of these functions is also real-valued.
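A small NumPy illustration of why this works for real input (values chosen arbitrarily):

```python
import numpy as np

a = np.array([[2.0, 1.0],
              [1.0, 3.0]])       # real symmetric matrix

w = np.linalg.eigvalsh(a)        # eigenvalues only
w2, v = np.linalg.eigh(a)        # eigenvalues and eigenvectors
print(w.dtype, v.dtype)          # float64 float64: everything stays real
```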

rgommers commented 3 years ago

> All that said, I'm not really strongly opposed here. These functions are quite easy to implement in terms of other primitives already in the standard. I just don't think they're particularly worthy of inclusion.

I tend to agree that they're not the most important functions. We had a discussion some months ago about where to draw the line, and the conclusion was that several people had a preference for including everything that's in numpy.linalg, with the rationale that, while some functions may not be heavily used, if you need them you really need them, and they're often difficult to implement.

shoyer commented 3 years ago
> det: numeric (mults, sums)

I would suggest that np.linalg.det should be considered "floating" rather than "numeric", because it can't be implemented efficiently without a matrix factorization. For example, NumPy always casts the inputs to det into floating-point values.
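A quick check of that casting behavior (a minimal sketch):

```python
import numpy as np

a = np.array([[2, 0],
              [0, 3]])           # int64 input
d = np.linalg.det(a)
print(d, d.dtype)                # 6.0 float64: cast to floating point internally
```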

jakirkham commented 3 years ago

From the Dask perspective, things like BLAS operations (dot, matmul, etc.) should be fine and exist today. LAPACK operations can be a bit more hit-or-miss, both in terms of whether they can be implemented and whether that implementation would be performant.

oleksandr-pavlyk commented 3 years ago

Regarding the SVD discussion, svd and svdvals may not entirely cut it.

Sometimes only the U matrix, or only the Vt matrix, may be needed, e.g. when computing a PCA transform (see the sketch below).

LAPACK does allow computing only one of the U/Vt matrices: ?GESVD accepts its JOBU and JOBVT parameters independently, so it does not require that both be 'N'.
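For example, a PCA projection only needs Vt, yet the standard svd call computes U as well (pca_transform is a hypothetical helper for illustration):

```python
import numpy as np

def pca_transform(x, n_components):
    # Project data onto its first n_components principal axes.
    # Only Vt is used; U and S are computed and then discarded.
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_components].T
```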

kgryte commented 3 years ago

@shoyer Updated det to be restricted to floating-point data types. Thanks!

kgryte commented 3 years ago

@oleksandr-pavlyk I think for design consistency it makes sense to mirror eig and eigvals. While returning only one of U or Vt may be desired for certain use cases, such as PCA, if an array library wants to support that optimized use case, it is free to provide a separate API for doing so. However, such an API is not currently supported in any PyData array library; this is in contrast to svd and svdvals, where returning either none or both matrices is supported via the compute_uv keyword.

So, I think the order here would be to first standardize svd and svdvals. If array libraries all start supporting returning only one of U or Vt, then we'd work toward standardizing those optimized/specialized interfaces at that time.

lezcano commented 3 years ago

A note on the importance of eigvals / eigvalsh / svdvals:

These are functions that are always well-defined, as opposed to the full factorisations, for which some choices have to be made (if v is an eigenvector, so is -v; if there are repeated eigenvalues / singular values, we may rotate any basis of the eigenspace and get a different valid eigendecomposition...).

Edit: The following paragraph is wrong as stated. The (ordered) eigenvalues are not differentiable at points with repeated eigenvalues, but merely continuous (and also 1-Lipschitz continuous wrt. the operator 2-norm); see this MO answer. This is still a large improvement over the eigenvectors, which are not even locally uniquely defined in the presence of repeated eigenvalues.

In the context of differentiation (which many of the libraries support), the differential of {eig,eigh,svd}vals is always well-defined. On the other hand, the full factorisations are not differentiable (the elements of the differential tend to infinity) at points with repeated eigen / singular values. As such, the functions that just return the spectrum of a matrix can be understood as safer options than the full factorisations, and are certainly preferable when possible.
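A small numeric illustration of the difference (values chosen to create a repeated eigenvalue):

```python
import numpy as np

eps = 1e-10
a = np.eye(2)                                      # eigenvalue 1, repeated
b = a + eps * np.array([[0.0, 1.0], [1.0, 0.0]])   # tiny symmetric perturbation

print(np.linalg.eigvalsh(a))   # [1. 1.]
print(np.linalg.eigvalsh(b))   # ~[1 - eps, 1 + eps]: the spectrum barely moves
print(np.linalg.eigh(b)[1])    # eigenvectors ~ [1, -1]/sqrt(2) and [1, 1]/sqrt(2):
                               # nothing like the identity basis eigh(a) returns
```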

kgryte commented 3 years ago

Closing this issue, as this overview is now outdated, given subsequent modifications to the specification. Further discussion and refinement can be done in separate issues/PRs.