data-apis / array-api

RFC document, tooling and other content related to the array API standard
https://data-apis.github.io/array-api/latest/

Linear Algebra design overview #147

Closed kgryte closed 3 years ago

kgryte commented 3 years ago

The intent of this issue is to provide a bird's eye view of linear algebra APIs in order to extract a consistent set of design principles for current and future spec evolution.

Unary APIs:

Binary APIs:

Support stacks (batching)

The main idea is that, if an operation is explicitly defined in terms of matrices (i.e., 2D arrays), then its API should support stacks of matrices (i.e., batching), as sketched below.
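As a point of reference, a minimal NumPy sketch of this batching behavior (shapes chosen arbitrarily for illustration):

```python
import numpy as np

# A stack of 4 matrices, each of shape (3, 3); linalg functions apply
# to each matrix in the stack, broadcasting over the leading dimension.
stack = np.random.default_rng(0).standard_normal((4, 3, 3))

inv_stack = np.linalg.inv(stack)    # unary: inverts each 3x3 matrix
prod = stack @ inv_stack            # binary: matmul batches the same way
print(inv_stack.shape, prod.shape)  # (4, 3, 3) (4, 3, 3)
```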

Unary:

Binary:

No stack (batching) support

Binary:

Support tolerances

Supported dtypes

The main idea here is that we should avoid undefined or ambiguous behavior. For example, when the type promotion rules cannot capture an operation's behavior (e.g., an API accepts int64 input but must return float64 output), how would casting work? Since the type promotion rules only address same-kind promotion, cross-kind casting would be left up to the implementation and thus ill-defined. To ensure defined behavior, APIs which must return floating-point output should require floating-point input (see the sketch below).
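For instance, NumPy today accepts integer input to linalg.inv and silently casts across kinds, which is exactly the behavior the type promotion rules cannot express:

```python
import numpy as np

a = np.array([[2, 0],
              [0, 4]])            # int64 input
out = np.linalg.inv(a)
print(a.dtype, "->", out.dtype)   # int64 -> float64: a cross-kind cast
                                  # chosen by the implementation
```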

Numeric:

Floating:

Any:

Output values

Array:

Tuple:

Note: only svd is polymorphic in its output (via the compute_uv keyword), as illustrated below.
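In NumPy terms, the polymorphism looks like this (a small sketch):

```python
import numpy as np

a = np.random.default_rng(0).standard_normal((5, 3))

u, s, vh = np.linalg.svd(a)                  # tuple output: (U, S, Vh)
s_only = np.linalg.svd(a, compute_uv=False)  # array output: singular values only
```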

Reduced output dims

Conclusion: norm is unique here in allowing the output array rank to remain the same as that of the input array (see the sketch below).
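In NumPy this is exposed via the keepdims keyword (a minimal sketch):

```python
import numpy as np

a = np.random.default_rng(0).standard_normal((4, 3))

reduced = np.linalg.norm(a, axis=1)                   # shape (4,): rank reduced
same_rank = np.linalg.norm(a, axis=1, keepdims=True)  # shape (4, 1): rank preserved
```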

Broadcasting

Specialized behavior

shoyer commented 3 years ago

Is there a reason for omitting eigenvalue solvers? Those are quite fundamental operations for linear algebra.

More generally, I would be curious about any general reasoning behind what should and should not be included here. For example, matrix_rank in particular seems rather esoteric to me (and quite easy to compute from SVD if desired).
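For instance, a rank computation on top of svd might look like the following sketch (matrix_rank_from_svd is a hypothetical helper; the tolerance mirrors the common max(M, N) * eps default):

```python
import numpy as np

def matrix_rank_from_svd(a, rtol=None):
    # Count singular values above a relative tolerance.
    s = np.linalg.svd(a, compute_uv=False)
    if rtol is None:
        rtol = max(a.shape[-2:]) * np.finfo(s.dtype).eps
    tol = s.max(axis=-1, keepdims=True) * rtol
    return (s > tol).sum(axis=-1)
```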

shoyer commented 3 years ago

As a reference point, I tried searching for usage of these functions from NumPy in Google's internal codebase. Most are used hundreds of times. The notable exceptions are matrix_rank and matrix_power, which have only a handful of uses. I suggest these are probably not worth standardizing, unless we're also planning to cover the much broader scope of scipy.linalg.

kgryte commented 3 years ago

@shoyer eigenvalue APIs are currently blocked by lack of complex number support.

shoyer commented 3 years ago

> @shoyer eigenvalue APIs are currently blocked by lack of complex number support.

Ah, gotcha. In that case we'll need to omit SVD as well, unless we only care about the diagonal terms (compute_uv=False). (nevermind, this is wrong!)

kgryte commented 3 years ago

@shoyer As for matrix_rank and matrix_power, these are implemented across almost all considered array libraries. Based on usage data from select downstream libraries, matrix_rank, e.g., is more commonly used than cholesky and lstsq.

kgryte commented 3 years ago

Re: svd. This is true. There has been some discussion on splitting the svd API into svd and svdvals (see comment), which would mirror the eig and eigvals convention.
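As a rough sketch, the proposed svdvals can be emulated on top of today's svd (scipy.linalg already ships a function with this name):

```python
import numpy as np

def svdvals(x):
    # Proposed spectrum-only counterpart to svd, mirroring eig/eigvals;
    # emulated here via the existing compute_uv keyword.
    return np.linalg.svd(x, compute_uv=False)
```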

leofang commented 3 years ago

For real matrices, svd's u and v are real, and can be thought of as m- and n-dimensional rotation matrices.

kgryte commented 3 years ago

I should also mention that the above overview reflects the current list of both specified APIs and APIs which are currently up for review. Meaning, it captures the current state of linear algebra APIs as included in the standard; no curation, per se, just an attempt to unify how we design APIs now and moving forward.

shoyer commented 3 years ago

> @shoyer As for matrix_rank and matrix_power, these are implemented across almost all considered array libraries. Based on usage data from select downstream libraries, matrix_rank, e.g., is more commonly used than cholesky and lstsq.

To push back here a little bit -- I think the numbers from downstream libraries reported in https://github.com/data-apis/python-record-api/blob/master/data/typing/numpy.linalg.py are not necessarily indicative of usage for linear algebra functions, which tend to be rather specialized.

Google's internal usage is certainly not representative of all use cases, but it's probably a bit more reflective of what numerical programmers use on average, across hundreds of different projects in different application domains.

In particular:

All that said, I'm not really strongly opposed here. These functions are quite easy to implement in terms of other primitives already in the standard. I just don't think they're particularly worthy of inclusion.

shoyer commented 3 years ago

> @shoyer eigenvalue APIs are currently blocked by lack of complex number support.

We could potentially follow the example of SVD and only implement eigh/eigvalsh. On real-valued matrices, the result of these functions is also real-valued.
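A small NumPy illustration of why this works for real input (values chosen arbitrarily):

```python
import numpy as np

a = np.array([[2.0, 1.0],
              [1.0, 3.0]])       # real symmetric matrix

w = np.linalg.eigvalsh(a)        # eigenvalues only
w2, v = np.linalg.eigh(a)        # eigenvalues and eigenvectors
print(w.dtype, v.dtype)          # float64 float64: everything stays real
```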

rgommers commented 3 years ago

> All that said, I'm not really strongly opposed here. These functions are quite easy to implement in terms of other primitives already in the standard. I just don't think they're particularly worthy of inclusion.

I tend to agree that they're not the most important functions. We had a discussion some months ago about where to draw the line, and the conclusion was that several people had a preference for including everything that's in numpy.linalg, with the rationale that, while some functions may not be heavily used, if you need them you really need them, and they're often difficult to implement.

shoyer commented 3 years ago
> det: numeric (mults, sums)

I would suggest that np.linalg.det should be considered "floating" rather than "numeric", because it can't be implemented efficiently without a matrix factorization. For example, NumPy always casts the inputs to det into floating-point values.
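A quick check of that casting behavior (a minimal sketch):

```python
import numpy as np

a = np.array([[2, 0],
              [0, 3]])           # int64 input
d = np.linalg.det(a)
print(d, d.dtype)                # 6.0 float64: cast to floating point internally
```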

jakirkham commented 3 years ago

From the Dask perspective, things like BLAS operations (dot, matmul, etc.) should be fine and exist today. LAPACK operations can be a bit more hit-or-miss, both in terms of whether they can be implemented and whether that implementation would be performant.

oleksandr-pavlyk commented 3 years ago

Regarding the SVD discussion, svd and svdvals may not entirely cut it.

Sometimes only the U matrix, or only the Vt matrix, may be needed, e.g. when computing a PCA transform (see the sketch below).

LAPACK does allow computing only one of the U/Vt matrices: ?GESVD accepts its JOBU and JOBVT parameters independently, so it does not require that both be 'N'.
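For example, a PCA projection only needs Vt, yet the standard svd call computes U as well (pca_transform is a hypothetical helper for illustration):

```python
import numpy as np

def pca_transform(x, n_components):
    # Project data onto its first n_components principal axes.
    # Only Vt is used; U and S are computed and then discarded.
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_components].T
```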

kgryte commented 3 years ago

@shoyer Updated det to be restricted to floating-point data types. Thanks!

kgryte commented 3 years ago

@oleksandr-pavlyk I think for design consistency it makes sense to mirror eig and eigvals. While returning only one of U or Vt may be desired for certain use cases, such as PCA, if an array library wants to support that optimized use case, it is free to provide a separate API for doing so. However, such an API is not currently supported in any PyData array library; this is in contrast to svd and svdvals, where returning either none or both matrices is supported via the compute_uv keyword.

So, I think the order here would be to first standardize svd and svdvals. If array libraries all start supporting returning only one of U or Vt, then we'd work toward standardizing those optimized/specialized interfaces at that time.

lezcano commented 3 years ago

A note on the importance of eigvals / eigvalsh / svdvals:

These are functions that are always well-defined, as opposed to the full factorisations, for which some choices have to be made (if v is an eigenvector, so is -v; if there are repeated eigenvalues / singular values, we may rotate any basis of the eigenspace and get a different valid eigendecomposition...).

Edit: The following paragraph is wrong as stated. The (ordered) eigenvalues are not differentiable at points with repeated eigenvalues, but merely continuous (and also 1-Lipschitz continuous wrt. the operator 2-norm); see this MO answer. This is still a large improvement over the eigenvectors, which are not even locally uniquely defined in the presence of repeated eigenvalues.

In the context of differentiation (which many of the libraries support), the differential of {eig,eigh,svd}vals is always well-defined. On the other hand, the full factorisations are not differentiable (the elements of the differential tend to infinity) at points with repeated eigen / singular values. As such, the functions that just return the spectrum of a matrix can be understood as safer options than the full factorisations, and are certainly preferable when possible.
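A small numeric illustration of the difference (values chosen to create a repeated eigenvalue):

```python
import numpy as np

eps = 1e-10
a = np.eye(2)                                      # eigenvalue 1, repeated
b = a + eps * np.array([[0.0, 1.0], [1.0, 0.0]])   # tiny symmetric perturbation

print(np.linalg.eigvalsh(a))   # [1. 1.]
print(np.linalg.eigvalsh(b))   # ~[1 - eps, 1 + eps]: the spectrum barely moves
print(np.linalg.eigh(b)[1])    # eigenvectors ~ [1, -1]/sqrt(2) and [1, 1]/sqrt(2):
                               # nothing like the identity basis eigh(a) returns
```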

kgryte commented 3 years ago

Closing this issue, as this overview is now outdated, given subsequent modifications to the specification. Further discussion and refinement can be done in separate issues/PRs.