Reference-LAPACK / lapack

LAPACK development repository

About matrix scaling in DPOSVX #1061

Open jgpallero opened 2 weeks ago

jgpallero commented 2 weeks ago

In the DPOSVX subroutine the symmetric positive definite matrix A is scaled if (among other criteria) the ratio between the smallest and the largest element of the main diagonal satisfies min(A(i,i))/max(A(i,i)) < 0.1.

It is known (Wilkinson, Björck, Higham) that the Cholesky factorization can be computed without breakdown if 2*n^(3/2)*u*k(A) < 0.1, where n is the matrix dimension, u is the unit roundoff, and k(A) = max(sigma)/min(sigma) is the condition number of the matrix, with sigma the matrix singular values.

I understand that min(A(i,i)) and max(A(i,i)) are a cheap way of roughly estimating the extreme singular values of the matrix, and hence its condition number (in fact a lower bound on it). But what is the reason for omitting the 2*n^(3/2)*u factor in the criterion? For example, the matrix

A=[1   1
   1  11];

will be scaled according to the DPOSVX criterion, since 1/11 = 0.091 < 0.1, but using 2*n^(3/2)*u*k(A) < 0.1 we obtain 7.7e-15 < 0.1, which shows that the Cholesky decomposition is clearly safe.
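Both quantities can be checked with a small NumPy sketch (my own illustration; numpy.linalg.cond is used here only as a convenient, if expensive, way to obtain k(A)):

import numpy as np

A = np.array([[1.0,  1.0],
              [1.0, 11.0]])
n = A.shape[0]
u = np.finfo(np.float64).eps / 2              # unit roundoff for IEEE double precision

diag = np.diag(A)
print(diag.min() / diag.max())                # ~0.0909 < 0.1  -> DPOSVX would equilibrate
print(2 * n**1.5 * u * np.linalg.cond(A, 2))  # ~7.7e-15 << 0.1 -> Cholesky is safe unscaled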

On the other hand, scaling by a diagonal matrix as DPOSVX does is O(n^2), so there is no real penalty in the subroutine, since Cholesky is O(n^3). But I would like to know the reason for the min(A(i,i))/max(A(i,i)) < 0.1 criterion.

Thanks

langou commented 2 weeks ago

We want to solve Ax = b where A is symmetric positive definite and b is a right-hand side.

When equilibration is enabled, DPOSVX computes a diagonal matrix S such that S(i,i) = 1 / sqrt(A(i,i)) and then computes B = SAS. This process is called "equilibration". Then, instead of solving Ax = b, DPOSVX solves By = Sb, and finally it recovers the solution x with x = Sy.
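In NumPy terms the process looks roughly like the following sketch (my own illustration of the steps above, not the actual LAPACK code, and without any of DPOSVX's error checks or iterative refinement):

import numpy as np

def solve_spd_with_equilibration(A, b):
    s = 1.0 / np.sqrt(np.diag(A))      # S(i,i) = 1 / sqrt(A(i,i))
    B = A * np.outer(s, s)             # B = S A S; its diagonal entries are all 1
    L = np.linalg.cholesky(B)          # Cholesky factor of the equilibrated matrix
    y = np.linalg.solve(L.T, np.linalg.solve(L, s * b))  # solve B y = S b
    return s * y                       # x = S y

Since B has a unit diagonal, min(B(i,i))/max(B(i,i)) = 1, which is precisely what equilibration aims for.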

When equilibration is enabled, DPOSVX works with B = SAS instead of A because the condition number of B is smaller than the condition number of A. The main reason for equilibrating is to work with a better conditioned system of equations than the one given by A.

No equilibration is done if min(A(i,i))/max(A(i,i)) >= 0.1 because, as the comments in DPOEQU say, "if min(A(i,i))/max(A(i,i)) >= 0.1 and AMAX is neither too large nor too small, it is not worth scaling by D". In other words, the matrix A is deemed well equilibrated enough.
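Expressed as code, the decision described here amounts to something like the following sketch (using the ratio exactly as stated in this thread; the actual test in DPOSVX also involves AMAX, and solve_spd_with_equilibration is the illustrative helper from the sketch above):

diag = np.diag(A)
if diag.min() / diag.max() >= 0.1:
    pass                                       # A is considered well equilibrated: skip scaling
else:
    x = solve_spd_with_equilibration(A, b)     # equilibrate, solve, and unscale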

jgpallero commented 1 week ago

Thank you for your answer. I had already seen the DPOEQU help text, but my question was about why the value of 0.1 and why the min(A(i,i))/max(A(i,i)) criterion.

If the Cholesky decomposition is safe when 2*n^(3/2)*u*k(A) < 0.1, where n is the matrix dimension, u is the unit roundoff, and k(A) = max(sigma)/min(sigma) is the condition number of the matrix, with sigma the matrix singular values, why not use this criterion for the scaling decision? One reason is that k(A) is computationally expensive, and min(A(i,i))/max(A(i,i)) is an upper bound for 1/k(A), but in the criterion min(A(i,i))/max(A(i,i)) < 0.1 the values n and u do not appear.

langou commented 1 week ago

my question was about why the value of 0.1 and why the min(A(i,i))/max(A(i,i)) criterion.

The goal of equilibration is to have min(A(i,i))/max(A(i,i)) = 1. Equilibration "equilibrates" all diagonal entries to be 1. Here 1 is arbitrary. It could be whatever. The point is that the whole diagonal is the same number. So an "equilibrated matrix" is such that min(A(i,i))/max(A(i,i)) = 1.

If min(A(i,i))/max(A(i,i)) >= 0.1, then we decide that the matrix A is well equilibrated enough and we do not need to worry about equilibrating it. So we skip equilibration when min(A(i,i))/max(A(i,i)) >= 0.1.

langou commented 1 week ago

(*) A. van der Sluis (1969), "Condition numbers and equilibration of matrices", Numer. Math., 14(1):14–23, showed that the algorithm used to compute D leads to nearly optimal equilibration, in the sense that the D that puts 1 on the diagonal gives a nearly optimal condition number out of all diagonal scalings.

(*) Jim Demmel (1989), "On Floating Point Errors in Cholesky", LAPACK Working Note 14, https://www.netlib.org/lapack/lawnspdf/lawn14.pdf, shows that Cholesky does not really need equilibration. While the condition number is reduced (and dramatically so at times), the quality of the solution returned by Cholesky with or without equilibration is the same.

(*) There are two things that I do not really understand. (1) Demmel explains that equilibration is not needed for Cholesky, which begs the question: why do we use equilibration in Cholesky? One reason to use equilibration in DPOSVX might be that it improves the iterative refinement phase. I do not know. (2) Also, I am not sure why equilibration is not done with powers of 2 so as to perform exact computations; see the sketch after this list.

(*) Note that extensions of van der Sluis's work to the block case can be found in (1) Heike Faßbender, Miroslav Rozložník, Sanja Singer (2021), "Nearly optimal scaling in the SR decomposition", Linear Algebra and its Applications, pp. 295–319, https://doi.org/10.1016/j.laa.2020.11.011 and (2) Demmel (2023), "Nearly Optimal Block-Jacobi Preconditioning", SIAM J. Matrix Anal. Appl., 44(1), pp. 408–413, https://doi.org/10.1137/22M1504901.
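On point (2), a power-of-2 version of the scaling could look like the following sketch (my own illustration of the idea, not existing LAPACK code): each scale factor is rounded to the nearest power of two, so that multiplying by it is exact in IEEE arithmetic (barring overflow or underflow).

import numpy as np

def power_of_two_scale_factors(A):
    s = 1.0 / np.sqrt(np.diag(A))       # the usual equilibration factors
    return 2.0 ** np.rint(np.log2(s))   # round each factor to the nearest power of 2

# B = S A S formed with these factors no longer has an exactly unit diagonal,
# but applying the scaling introduces no rounding error.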

jgpallero commented 1 week ago

Julien, thank you very much for the references. I really appreciate them.

About the need for scaling in this problem, I always thought that the scaling was related to the prevention of Cholesky factorization breakdown. It is well known (Golub and Van Loan, Björck, etc.) that the Cholesky factorization can be computed without breakdown if 2*n^(3/2)*u*k(A) < 0.1, where n is the matrix dimension, u is the unit roundoff, and k(A) = max(sigma)/min(sigma) is the condition number of the matrix, with sigma the matrix singular values. So I thought that scaling was necessary only if 2*n^(3/2)*u*k(A) was near 0.1. But obviously computing the condition number of the matrix is more expensive than the Cholesky factorization itself.

According to Demmel, "In this paper we show that this scaling of H is unnecessary because the standard unscaled Cholesky algorithm satisfies bound (2). Therefore, nothing is gained by scaling." My question in this case is: if the condition number of a matrix makes it impossible to perform the Cholesky decomposition, then solving the system directly is impossible. But if I scale the matrix, its condition number is reduced, so I can perform Cholesky and solve the system, and then undo the scaling in the solution vector. The question is whether this solution is reliable.

In any case, my original question about the scaling in DPOSVX was related to the fact that the criterion is min(A(i,i))/max(A(i,i)) < 0.1, without considering the matrix dimension or the unit roundoff. Also, as scaling in this case is a cheap operation, there is hardly any penalty in applying it in every case.