Closed: @lezcano closed this issue 3 years ago.
Thanks for the proposal @Lezcano. It sounds sensible, but it would be good to get some confirmation that this is the right thing to do. One way to go about that would be to check some of the `svd` usages in SciPy. Over 50% of those do not use `full_matrices` at all, which is likely because the authors just went with the default. If you pick, say, 3 or 5 examples from SciPy and show that they become more efficient when `full_matrices=False` is added, without introducing algorithmic/accuracy issues, then that'd be clear evidence that changing to the more efficient default makes sense.
That makes sense. I'll show a few examples here, going in the order in which they appear when you search for `linalg.svd` in the SciPy repo, together with one from the `svd` tests. I will omit those where `compute_uv=False` is used, since for those the `full_matrices` flag does nothing.
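To illustrate that last point, here is a minimal sketch using `numpy.linalg.svd`, whose `full_matrices`/`compute_uv` flags mirror SciPy's:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# With compute_uv=False only the singular values are returned,
# so full_matrices has no effect on the output at all.
s_a = np.linalg.svd(A, compute_uv=False, full_matrices=True)
s_b = np.linalg.svd(A, compute_uv=False, full_matrices=False)

assert np.allclose(s_a, s_b)
assert s_a.shape == (3,)  # min(m, n) singular values either way
```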
**1. test_svds.py, L19.** Called from a number of places in the file. It performs an SVD and then orders the singular values (and the singular vectors) increasingly or decreasingly. Every call within the file makes sure that the `k` passed is less than `min(m, n)` for a matrix of size `m x n`. In other words, `full_matrices=False` should have been used here.
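A rough sketch of that pattern (hypothetical code for illustration, not the actual SciPy test helper):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Reduced SVD: U is (6, 4), s has min(m, n) = 4 entries, Vh is (4, 4).
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Reorder singular values (and matching vectors) increasingly,
# as the test helper does; any leading k <= min(m, n) columns
# of the reduced factors are then available directly.
order = np.argsort(s)
U, s, Vh = U[:, order], s[order], Vh[order, :]

# The reordered reduced triple still reconstructs A exactly,
# so the full (6, 6) U was never needed.
assert np.allclose((U * s) @ Vh, A)
```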
**2. Trust region methods.** The SVD is used with `full_matrices=False`.
**3. Test multivariate normal.** The SVD is used without `full_matrices` to generate a random orthogonal matrix. This is not a good use of `svd`; it should be replaced by a `qr`. The reason is that one can prove that if a matrix has entries distributed as a standard normal, the `Q` of its `qr` decomposition is distributed according to the Haar measure on the orthogonal group. Also, `qr` is much faster than `svd`. All this is implemented carefully and efficiently in `stats.ortho_group`.
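That construction can be sketched as follows (a minimal sketch, not the `stats.ortho_group` implementation itself; the sign fix is the standard recipe needed to make `Q` exactly Haar-distributed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))

# QR of a Gaussian matrix gives a Haar-distributed Q, provided the
# column signs are fixed so that R has a positive diagonal (otherwise
# the distribution is biased by LAPACK's sign convention).
Q, R = np.linalg.qr(A)
signs = np.sign(np.diag(R))
Q = Q * signs  # flip columns so the equivalent R has diag > 0

# Q is orthogonal: Q^T Q = I.
assert np.allclose(Q.T @ Q, np.eye(n))
```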
**4. K-means clustering.** The SVD is used with `full_matrices=False`.
**5. test_basic.py.** It is used without parameters but on a square matrix. As such, the behaviour with `full_matrices=False` would be the same.
**6. rotation.py.** It is used without parameters but on a square matrix. As such, the behaviour with `full_matrices=False` would be the same.
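The square-matrix no-op is easy to check (illustrated here with `numpy.linalg.svd`):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))  # square input

U_full, s_full, Vh_full = np.linalg.svd(A, full_matrices=True)
U_thin, s_thin, Vh_thin = np.linalg.svd(A, full_matrices=False)

# For a square matrix both calls return identically shaped factors
# and the same singular values, so the flag changes nothing.
assert U_full.shape == U_thin.shape == (4, 4)
assert np.allclose(s_full, s_thin)
assert np.allclose((U_thin * s_thin) @ Vh_thin, A)
```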
**Summary:** These first examples show that in every case, using `full_matrices=False` is either beneficial, a no-op, or it exposes a problem in the code. The most problematic point is the third one, where one might find examples in which the user did intend to use `full_matrices=True` (for correct reasons or not), and changing the default would break that code. That being said, it is clear that the `full_matrices=False` default is a saner one from an API perspective.
Thanks @Lezcano! That's convincing to me, +1 for changing the default. And I guess it's time to open a SciPy PR :)
This is not accurate, in the sense that the mistake (the "reduced" default) belongs to `numpy.linalg.qr`. Both `scipy.linalg.qr` and `scipy.linalg.svd` return the full output by default, so there is no inconsistency there. This is in line with dense usage, since most of the use cases require the full output rather than the economy output.
I agree, when I've used svd I cannot think of a case where I did not set `full_matrices=False`.
@ilayn Can you elaborate a bit more on "most of the use cases require full output"? Which use cases do you have in mind?
Like I mentioned, almost all of them. From signal processing to control theory, identification, and more esoteric matrix decompositions, we always use full matrices. Other than low-rank approximations and least-squares problems, I don't encounter the economy mode of SVD, since the thin SVD is often comparable to QR for changing the basis. The compact SVD is also a low-rank approximation, but you can also use iterative methods with better performance for the matrix sizes where it would make any difference.
For reduced QR it's a mixed bag, since depending on the matrix shape either form can be the useful one; however, that doesn't grant a ticket for changing the default to thin QR. Julia dropped the economy-mode default around 0.7, I think, and MATLAB also doesn't do thin or reduced by default for SVD or QR. The same goes for Mathematica for SVD, though it returns thin for QR (I haven't checked in years). Thus the only two I know of that don't fit the rest are numpy qr and Mathematica qr, which are strange. I wasn't around then, so I don't know why it was selected.
I agree with @shoyer. I have never had the need to use `full_matrices=True`.
In the codebase of SciPy, I just found one use of `full_matrices=True` that was needed, while there were plenty of uses of `full_matrices=False`.
In the field that I know best, that of matrix analysis and geometry, the thin SVD and reduced QR are often used to reduce certain computations to smaller matrices. For example, when you compute geodesics on the manifold of matrices with orthonormal columns (the Stiefel manifold) `St(n, p)` with `n > p`, you can first do a thin QR decomposition, which reduces the computation of a matrix exponential from a matrix of size `2n x 2n` to one of size `2p x 2p` (see eqs. 2.45-2.47 in the paper). The same holds for the Grassmannian via a compact SVD (see Theorem 2.3). The same article shows how to implement a form of conjugate gradient descent, also via the thin SVD.
In a similar way, a reduced SVD may be used to compute the polar decomposition of a matrix. More generally, it allows for computing the projection of a matrix onto the Stiefel manifold, or low-rank approximations (as @ilayn mentioned) via Eckart–Young–Mirsky. While iterative methods based on Lanczos can be more efficient at times, this might not always be the case. For example, on GPUs, iterative methods are incredibly slow, but rather fast GPU implementations of the SVD exist. Given that a number of the frameworks that will hopefully follow the array API support GPUs, this is certainly something to keep in mind.
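As an illustration, the polar decomposition via the thin SVD can be sketched as follows (`polar` here is a hypothetical helper written for this comment; SciPy ships a proper implementation as `scipy.linalg.polar`):

```python
import numpy as np

def polar(A):
    """Polar decomposition A = U_p @ P via the thin SVD (sketch)."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    U_p = U @ Vh          # closest matrix with orthonormal columns
    P = (Vh.T * s) @ Vh   # symmetric positive semidefinite factor
    return U_p, P

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
U_p, P = polar(A)

# U_p has orthonormal columns and U_p @ P recovers A; the full
# (6, 6) left factor was never needed.
assert np.allclose(U_p @ P, A)
assert np.allclose(U_p.T @ U_p, np.eye(3))
```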
In short, I have encountered the thin SVD in many different applications as a building block for other algorithms, and in all the applications I have come across, it was the thin SVD that was used. That being said, my experience is certainly biased towards the fields I have studied. As such, it would be helpful if other people could share examples in which the full SVD is useful, as I do not know of any in my field, and there was just one in the SciPy codebase.
> I agree, when I've used svd I cannot think of a case where I did not set `full_matrices=False`.

> I agree with @shoyer. I have never had the need to use `full_matrices=False`.

Pardon my poor English, but aren't these two statements contradictory? 😂
Edited :)
Based on the consortium meeting minutes from 2021-07-29, it was decided to keep the current default of `full_matrices` for SVD, given the lack of consensus on whether the full output is more generally preferred. As this retains the status quo, users will have to opt in to get a reduced SVD. This is similar to other environments (e.g., MATLAB).
My 2 cents on this. If your input is very wide or tall, then `full_matrices=True` is inefficient from a memory point of view. If, further, backprop is required, only the `min(m, n)`-sized subspaces are used for the gradient computations. But since those subspaces are views into the potentially huge full bases, the full bases have to be stored for both the forward and the backward pass.
I personally find the reduced variant cleaner because we get orthonormal bases for the row and column subspaces without needing to narrow them down if, for example, we would like to get a projection onto the range or its complement.
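For example (a minimal sketch with `numpy.linalg.svd`): the reduced `U` already is an orthonormal basis of the range, so projections fall out directly, with no `(m, m)` factor to trim.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 5                      # very tall input
A = rng.standard_normal((m, n))

# Reduced U is (100, 5); the full variant would allocate (100, 100).
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# U's columns span range(A), so projections onto the range and its
# complement come straight from the reduced factor.
x = rng.standard_normal(m)
x_range = U @ (U.T @ x)            # projection onto range(A)
x_perp = x - x_range               # component in the complement

# The complement component is orthogonal to every column of A.
assert np.allclose(A.T @ x_perp, 0)
```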
Yeah, we don't even have `full_matrices=True` as an option in Dask currently; it is implemented as `full_matrices=False`. Though we do have an open issue about adding this (https://github.com/dask/dask/issues/3576).
**Problem**

The current default behaviour of `linalg.svd` is to compute the full decomposition, by setting `full_matrices=True` by default. On the other hand, the default behaviour of `linalg.qr` specifies `mode="reduced"` by default.

**Proposal**

Change the default to `full_matrices=False` in `linalg.svd`. In most cases the reduced SVD is good enough. Even more, it's better to do less work by default and make the more expensive operation opt-in.

cc @rgommers @mruberry
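As an aside, the inconsistency between the two current defaults is easy to see with `numpy.linalg` (a small illustration; the array API namespaces mirror these flag names):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))

# Today's defaults: qr is reduced, svd is full.
Q, R = np.linalg.qr(A)       # mode="reduced" by default
U, s, Vh = np.linalg.svd(A)  # full_matrices=True by default

assert Q.shape == (8, 3)     # economy-size Q
assert U.shape == (8, 8)     # full U, columns 3..7 unused by most callers

# The proposed default would match qr's economy behaviour:
U_thin, s_thin, Vh_thin = np.linalg.svd(A, full_matrices=False)
assert U_thin.shape == (8, 3)
```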