Open christoph-conrads opened 1 year ago
Hi @christoph-conrads, would SRC/srscl.f do the job too? @langou
I did some digging and the issue with LARFGP are mentioned in https://www.netlib.org/lapack/bug_list.html#_strong_span_class_green_bug0037_span_strong_xlarfp_and_scaling which reads:
bug0037 :: xLARFP and scaling CORRECTED - see svn 754, 755 bug report sent by Pat Quillen on Wed 29 Jul 2009 to "lapack@cs.utk.edu". "xLARFP and scaling"
- see Pat Quillen’s email, and quillen dlarfp2.f,
- see Sven Hammarling’s email
- see Pat Quillen’s email, quillen dlarfp3.f, and quillen dlarf2.f.
- committed by Julie Langou (SVN revision 695, on Fri Jun 18 2010)
And then
bugXXXX Revert DLARFG to 3.1.0, move DLARFG to DLARFP and improved DLARFP CORRECTED - see svn (r706, r707, r708, r710, r712, r713) + (r755) Bug reported by Pat Quillen (MathWorks) The Householder reflector generation routine dlarfg (for double precision) has been changed in LAPACK-3.2 by dlarfp to generate nonnegative diagonal elements (on the diagonal of the R factor of a QR factorization). There is an issue in the dlarfp subroutine in LAPACK-3.2. The bug report is from Pat Quillen from MathWorks, have participated in the bug fix: Pat Quillen, Jim Demmel, Sven Hammarling, Mark Hoemmen, Guillaume Revy, William Kahan, Julie Langou and Julien Langou. Extensive work by Jim Demmel, Guillaume Revy, and William Kahan have revealed that the new Householder reflector that guarantees a positive diagonal of R in QR has two apparently unfixable drawbacks compared to the old reflector: 1) the computed reflector can be less orthogonal by a factor of about 4, (Jim, Guillaume and William are not sure why.) 2) this new routine is intrinsically more sensitive to over/underflow, this is now well understood, details are skipped here. None of these issues arise with the conventional Householder reflector. So the decision is to
- Fix as much as possible the xLARFP routines. This was corrected by Julien (see (r706, r707, r708, r710, r712, r713)) based on Jim's idea.
- Back out the changes in LAPACK to go back to using the old reflector (xlarfg) everywhere (e.g. in xGEQRF, xGEQR2, etc.).
- Keep the change that saves flops when u has lots of trailing zeros, since this is independently a good idea.
- Change the name from xLARFP to xLARFGP and introduce new QR routines (xGEQRFP and xGEQR2P) that uses xlarfp to guarantee a new positive diagonal of R, for who needs it.
- This requires to adapt the test suites
- Note: only the QR routines have an xGEQRFP. While it seems that quite a few users care for QR factorization with nonnegative diagonal R. Noone seemed to care for this feature on LQ, QL or RQ. So we have left out these routines.
- Steps 2 to 6 were done during r755 by Julie.
Thank you for providing this reference. Apparently the changes to xLARFGP are on target whereas the changes to xORBDB5 and xORBDB6 from #930 are just working around the problem.
2) this new routine is intrinsically more sensitive to over/underflow, this is now well understood, details are skipped here.
After reading the linked messages, reading §2 in LAWN 203 where xLARFGP is introduced, and reviewing my test data, I claim that the problem is caused by the current implementation handling only some of the possible under- and overflows.
For example, in the first message in this issue the expression ONE / ALPHA
is observed to overflow. The suggested fix SLASCL( CFROM=ALPHA, CTO=ONE, X )
does not work because ALPHA
is subnormal in my test case; the computed vector entries are off by factor two. The problem could be avoided by computing SLASCL( CFROM=XNORM, CTO=-ALPHA/XNORM, X )
and SLASCL( CFROM=ALPHA, CTO=ONE, X )
, respectively, depending on which branch is taken in line 192.
To resolve this issue, I will go through the xLARFGP code line by line and check if one can trigger over- or underflows. Luckily dimension-two inputs suffice to trigger these problems.
On the upside I previously claimed that SLASCL
can replace calls to SSCAL
only if INCX = 1
. This is not true:
LDA
to INCX
.Computations with Householder reflectors by xLARFGP are only conditionally backward stable. The problem is not the computation of the reflector itself but the entries large in modulus it can contain. Those large entries in turn cause the loss of significant digits when the reflector is applied (e.g., with xLARF). Consequently, the problems with xLARFGP are not necessarily evident by looking only at the xLARFGP output.
The text is structured as follows:
tl;Dr Householder reflectors computed by xLARFGP can contain values with very large modulus when x(1) > 0
and x(1) ≈ ‖x‖
. This can cause overflows during the reflector computation or later a loss of accuracy when the reflector is used by.
Given a vector x
, xLARFGP computes a vector v
subject to v(1) = 1
and a scalar τ such that v
is an elementary Householder reflector with I - τ v v^* x = β e₁
, where β = ‖x‖
. In contrast to xLARFG (note the missing P in the subroutine name) β is constrained to be nonnegative.
Consider the following matrix in single precision:
A = [10/11 ε⁴, 1]
[12/10 ε⁵, 0]
This example was derived from the failing test that led to the creation of this issue. The constants 10/11 and 12/10 are tuned values from the test data; their most important properties are
|A(2,1)| > ε |A(1,1)|
to avoid triggering a zero check in SLARFGP added in #930 (see slarfgp.f
).The QR decomposition computed by SGEQRF with SGEQR2 patched to use SLARFGP is as follows:
+1.8358945e-28 -1.0000000e+00
-1.3981013e+07 +1.4310353e-07
Note the very large value in modulus in the first column; this is the v(2)
entry of the first reflector. The Householder matrix assembled by SORGQR from this data looks as follows:
+1.0000000e+00 -1.4310353e-07
+1.4310353e-07 +1.0007324e+00
There are two things of note in this matrix that should be orthogonal:
v(2)
entry looks fine and its norm is within the acceptable bounds of rounding errors.The orthogonal matrix computed by SGEQRF, SORGQR, and SLARFGP is inaccurate.
The computation of Householder matrices is in general known to be backward stable. This section will discuss which assumptions made in the numerical linear algebra literature are not met by a Householder reflector computed with a nonnegative β. Specifically, I will refer to the proofs in "Accuracy and Stability of Numerical Algorithms", 2nd Edition, 2002, by Nicholas J. Higham (ASNA for short) in §19.3 "Error Analysis of Householder Computations".
There exist three points of divergence when ε must be nonnegative.
v
is assumed to be √2 (see the paragraph after Lemma 19.1) whereas the LAPACK implementation constructs a reflector with v(1) = 1
.v
is applied as (√τ v)(√τ v)^*
whereas SLARF computes y - τ (v (v^* y))
.The norm of the reflector is significant in Lemma 19.2; this lemma proves a small backward error when (I - τ v v^*) y
is computed for an arbitrary vector y
.
Let us first consider the case of xLARFG:
1 ≤ τ ≤ 2
. Thus, thus the computation of y - τ (v (v^* y))
by SLARF should not deliver results that differ significantly from the computation y - (√τ v)(√τ v)^* y
in ASNA.In summary, there is a close match between the assumptions in ASNA and the actual LAPACK implementation.
Next, we examine the behavior with xLARFPG when x(1) is positive (if x(1) is negative, then it behaves identically to xLARFG and is numerically backward stable):
v
is nearly unconstrained because x(1) - ‖x‖
can be very small. Thus the result of v(i) = x(i) / (x(1) - ‖x‖)
can be very large in modulus. This implies large intermediate values in computations and a backward error that is much larger than the one calculated in ASNA Lemma 19.2.x - τ (v (v^* x))
and the modulus τ is nearly unconstrained again.In summary, the proofs concerning the backward stability of Householder matrix-related calculations in ASNA cannot be used when reflectors are computed by xLARFGP and x(1) > 0
and x(1) ≈ ‖x‖
.
In general, I do not see how to fix xLARFGP. Possible work-arounds are:
The Python program below computes the QR decomposition of a 2x2 matrix and explicitly forms the Householder matrix Afterwards. The purpose of the program is to be able to play with the values in the first matrix column after replacing the call to SLARFG with a call to SLARFGP in SGEQR2.
Usage: python3 sgeqr-demo.py
or python3 sgeqr-demo.py /path/to/liblapack.so
.
#!/usr/bin/python3
import array
import ctypes
import ctypes.util
from ctypes import byref, c_char, c_float, c_int32, c_void_p, POINTER
import math
import sys
def main():
if len(sys.argv) not in [1, 2]:
return "usage: python3 %s [path to LAPACK library]" % (sys.argv[0], )
if len(sys.argv) >= 2:
lapack_library = sys.argv[1]
else:
lapack_library = ctypes.util.find_library("lapack")
print("name of LAPACK library:", lapack_library)
lapack = ctypes.CDLL(lapack_library, use_errno=True)
# an alias for the _F_ortran integer type
f_int = ctypes.c_int32
lapack.sgeqrf_.restype = None
lapack.sgeqrf_.argtypes = [
POINTER(f_int),
POINTER(f_int),
# A
POINTER(c_float),
POINTER(f_int),
POINTER(c_float),
POINTER(c_float),
POINTER(f_int),
POINTER(f_int),
]
lapack.sorgqr_.restype = None
lapack.sorgqr_.argtypes = [
POINTER(f_int),
POINTER(f_int),
POINTER(f_int),
POINTER(c_float),
POINTER(f_int),
POINTER(c_float),
POINTER(c_float),
POINTER(f_int),
POINTER(f_int),
]
eps = 2**-23
nan = float("NaN")
m = 2
n = 2
x0 = 10/11 * eps
x1 = 12/10 * eps * x0
assert abs(x1 / x0) > eps
a = array.array("f", [x0, x1, -1, 0])
lda = m
assert len(a) == m * n
tau = array.array("f", [nan] * min(m, n))
work = array.array("f", [nan] * 256)
info = f_int(0)
intref = lambda n: byref(f_int(n))
f32ref = lambda array: (c_float * len(array)).from_buffer(array)
lapack.sgeqrf_(
intref(m),
intref(n),
f32ref(a),
intref(lda),
f32ref(tau),
f32ref(work),
intref(len(work)),
byref(info),
)
assert info.value == 0
print("tau +%8.2e %+8.2e" % (tau[0], tau[1]))
print()
print("a %+13.7e %+13.7e" % (a[0], a[2]))
print("a %+13.7e %+13.7e" % (a[1], a[3]))
lapack.sorgqr_(
intref(m),
intref(n),
intref(n),
f32ref(a),
intref(lda),
f32ref(tau),
f32ref(work),
intref(len(work)),
byref(info),
)
assert info.value == 0
print()
print("q %+13.7e %+13.7e" % (a[0], a[2]))
print("q %+13.7e %+13.7e" % (a[1], a[3]))
print()
print("delta = %+8.2e\n" % ((a[3] - 1) / eps,))
if __name__ == "__main__":
sys.exit(main())
Thanks for all this work Christoph, and taking the time to explain.
Three follow-up question/comment's.
(1) Would another representation of the Householder reflector be better? So right now LAPACK decides that the Householder vector is such that the top element is always a 1, but other choices are possible. See for example, The Computation of Elementary Unit Matrix
by Rich Lehoucq, ACM Transactions on Mathematical Software, Volume 22, Issue 4, pp 393–400, https://doi.org/10.1145/235815.235817, where Rich Lehoucq explains four previous representations (1) Wilkinson's approach, which is also used in EISPACK, (2) LINPACK approach, (3) NAG approach, (4) LAPACK approach. Or maybe another approach would help. I agree that this does not seem to help, but since I am not sure I wanted to ask.
(2) I would be in favor of spending some time to implement Always use the safe sign choice and hiding a sign bit for τ somewhere in the data as suggested by Pat Quillen.
How does that work for complex numbers though? It seems that for complex numbers we need more than a bit. Shall we just store an extra vectors of scalars (all of modulus 1, so +/-1 in real and e^(i.theta) in complex) along τ then?
(3) To some extent these signs are already in the output of GEQRF (or LARFG) . . . they are on the diagonal of R. So we could return the signs on the diagonal of R, being understood that of course
the diagonal entries of R are real nonnegative. And we are only storing the signs on the diagonal of R to save space. This sounds like a bad idea. And I would more be in favor to store an extra array for these signs, and then have the diagonal of R real.
We can add this on the wish list.
(1) Frankly I do not want to look at alternative representations now and finish #406 before involving myself in something else. Besides the current representation has very nice properties (the first reflector element is known to be one which allows for compact QR/QL/... representations, the reflector norm is close to one, and 1 ≤ τ ≤ 2).
(2) The issue when computing a reflector Hx = β e₁
is not computing a real β but the choice of the sign. (In fact, LAPACK computed real β until version 3.1.1 according to LAWN 203.) If it was not for the fixed computation of I - τ v v^*
, the issue would be trivial to fix, e.g., by
x(1)
with -1 by means of a matrix multiplication from the left side,x
with -1 by means of a matrix multiplication from the right side,τ v v^* - I
.~edit Now I see what you mean. τ is complex in the complex case. I need to think about this. /edit~
In the complex case, 1 ≤ Re(τ) ≤ 2 and only the sign bit of the real part of τ would need to be modified.
[The preceding paragraph mentions matrix multiplications from the left or the right. The idea is that x
is the first column of a matrix whose columns are about to be orthonormalized.]
In the current code, τ never has its sign bit set and one could exploit this unused bit of information to request the computation of |τ| v v^* - I
in the SLARF* subroutines. This would work as follows:
x(1) > 0
, then set the sign bit of τ.I - |τ| v v^*
or |τ| v v^* - I
, respectively.Note that this suggestion requires a float representation where +0 and -0 can be distinguished. Checking for τ < 0
is insufficient (think of x = [-1, 0]
).
The advantages of this approach are as follows:
xLARFG
.Disadvantages:
(3) The most important call site for xLARFGP are the functions for the CS decomposition and they do not compute QR decompositions.
- If the xLARF* implementations were not updated to handle τ with a sign bit set, then they will compute garbage. [...]
Two scenarios come to mind how this might happen:
The change would only affect users of xLARFGP though.
The stability proofs for the computations of QR decomposition by means of Householder reflections do not apply when the sign choice by slarfgP is used. In this post, I demonstrate the results with specially a crafted matrix A
and a unit-norm vector x
such that computations involving the orthogonal factor of the QR decomposition diverge much more rapidly during the computations of matrix-vector products Qᵏ x
in finite-precision arithmetic if slarfgP is employed instead of slarfg with its safe sign choice, where Q
is the orthogonal factor of a QR decomposition based on Householder reflections of A
. The effect is present for both the factored representation and the explicitly formed representation of the orthogonal factor.
Q
will denote the orthogonal factor of a QR decomposition in its factorized form whereas P
will denote the same factor after a call to xORGQR. The index +
denotes matrices computed by slarfgP. The Euclidean norm is used throughout this post.
Reminder: in real arithmetic, it holds that
‖x‖ = ‖Qᵏ x‖
for any integer k
,x = Q² x
.Based on my previous posts, inputs x
and A
were devised such that Q₊ x
is as inaccurate as possible. More concretely, each column a
that is used for the computation of a Householder reflector must have the following form:
‖a‖ ≪ 1
‖a(2:)‖ > ε ‖a(1)‖
‖a(2:)‖ ≪ ‖a(1)‖
.With a lot of trial-and-error, the following 3x3 matrix A
and vector x
was crafted (NB: crafted, not constructed. A lot of trial-and-error was involved.):
ε 1 0
8/7·ε^2 ε 0
ε^3 28/31·ε^2 0
Vector x:
0
-1/7
102 / 101
When computing ‖Q^k x‖
, the norm grows in every iteration when slarfgP is used (see table below) and the norm grows quickly beyond what is considered an acceptable rounding error. No such effect can be observed with slarfg and for this reason, there is no column for this method in the table below. Obviously, if Q₊ᵏ x
grows strictly monotonously in norm, then x = Q₊²ᵏ x
cannot hold. Furthermore, the growth in norm is still present though less pronounced after explicitly forming the orthogonal factor P₊
.
It holds that |‖Q₊²ᵏ x‖ - 1| ≤ ‖x - Q₊²ᵏ x‖
implying that the result of repeated multiplications is quickly becoming inaccurate.
k |
‖Q₊ᵏ x‖ - 1 |
‖P₊ᵏ x‖ - 1 |
---|---|---|
1 | 2ε | 2ε |
2 | 5ε | 4ε |
3 | 8ε | 6ε |
4 | 11ε | 8ε |
8 | 24ε | 16ε |
16 | 48ε | 33ε |
32 | 93ε | 66ε |
64 | 183ε | 132ε |
In the next experiment ‖x - Qᵏ x‖
, ‖x - Qᵏ x‖
and ‖x - 1/l P₊ᵏ x‖
are computed for even k
, where l = ‖Q₊ᵏ x‖
or l = ‖P₊ᵏ x‖
, respectively. The previous experiment showed that when slarfgP is employed, l
will be different from one. This paragraph demonstrates that in addition to a growth in norm of the multiplication result, the computed result is increasingly inaccurate when slarfgP is used (even after normalization). The effect is again less pronounced when the orthogonal factor P₊
is explicitly formed.
The table below does not contain a column for results involving P
(the explicitly formed orthogonal factor computed by slarfg) because the results with P
and Q
are indistinguishable at the accuracy level of the table.
k |
‖x - Qᵏ x‖ |
‖x - 1/l Q₊ᵏ x‖ |
‖x - 1/l P₊ᵏ x‖ |
---|---|---|---|
2 | 0.3ε | 1.0ε | 0.8ε |
4 | 0.6ε | 2.0ε | 1.4ε |
8 | 1.3ε | 3.6ε | 2.8ε |
16 | 2.6ε | 7.3ε | 5.4ε |
32 | 5.1ε | 15.1ε | 10.9ε |
64 | 10.2ε | 30.6ε | 21.6ε |
diff --git a/SRC/sgeqr2.f b/SRC/sgeqr2.f
index 3a78733b7..928fa90b1 100644
--- a/SRC/sgeqr2.f
+++ b/SRC/sgeqr2.f
@@ -150,7 +150,7 @@
REAL AII
* ..
* .. External Subroutines ..
- EXTERNAL SLARF, SLARFG, XERBLA
+ EXTERNAL SLARF, SLARFG, SLARFGP, XERBLA
* ..
* .. Intrinsic Functions ..
INTRINSIC MAX, MIN
@@ -178,7 +178,7 @@
*
* Generate elementary reflector H(i) to annihilate A(i+1:m,i)
*
- CALL SLARFG( M-I+1, A( I, I ), A( MIN( I+1, M ), I ), 1,
+ CALL SLARFGP( M-I+1, A( I, I ), A( MIN( I+1, M ), I ), 1,
$ TAU( I ) )
IF( I.LT.N ) THEN
*
Run the code with python3 /path/to/lapack/library <k>
.
#!/usr/bin/python3
import array
import copy
import ctypes
import ctypes.util
from ctypes import byref, c_char, c_double, c_float, c_int32, c_void_p, POINTER
from ctypes import c_size_t
import math
import sys
def print_matrix(m: int, n: int, a):
lda = m
for i in range(m):
for j in range(n):
index = j * lda + i
print("%+15.9e" % (a[index], ), end=" " if j + 1 < n else "")
print()
def main():
if len(sys.argv) not in [1, 2, 3]:
return "usage: python3 %s [path to LAPACK library] [k]" % (
sys.argv[0], )
if len(sys.argv) >= 2:
lapack_library = sys.argv[1]
else:
lapack_library = "/tmp/tmp.yVyeY13Yna/lib/liblapack.so.3"
k = int(sys.argv[2]) if len(sys.argv) >= 3 else 4
if k < 0:
return "the iteration count k must be positive, got %d" % (k,)
print("name of LAPACK library:", lapack_library)
print("iteration count:", k)
print()
lapack = ctypes.CDLL(lapack_library, use_errno=True)
# an alias for the _F_ortran integer type
f_int = ctypes.c_int32
lapack.sgemv_.restype = None
lapack.sgemv_.argtypes = [
POINTER(c_char),
POINTER(f_int),
POINTER(f_int),
# alpha
POINTER(c_float),
# A
POINTER(c_float),
POINTER(f_int),
# x
POINTER(c_float),
POINTER(f_int),
# beta
POINTER(c_float),
# y
POINTER(c_float),
POINTER(f_int),
POINTER(c_size_t),
]
lapack.sgeqrf_.restype = None
lapack.sgeqrf_.argtypes = [
POINTER(f_int),
POINTER(f_int),
# A
POINTER(c_float),
POINTER(f_int),
POINTER(c_float),
POINTER(c_float),
POINTER(f_int),
POINTER(f_int),
]
lapack.snrm2_.restype = ctypes.c_float
lapack.snrm2_.argtypes = [POINTER(f_int), POINTER(c_float), POINTER(f_int)]
lapack.sorgqr_.restype = None
lapack.sorgqr_.argtypes = [
POINTER(f_int),
POINTER(f_int),
POINTER(f_int),
POINTER(c_float),
POINTER(f_int),
POINTER(c_float),
POINTER(c_float),
POINTER(f_int),
POINTER(f_int),
]
lapack.sormqr_.restype = None
lapack.sormqr_.argtypes = [
POINTER(c_char),
POINTER(c_char),
POINTER(f_int),
POINTER(f_int),
POINTER(f_int),
# A
POINTER(c_float),
POINTER(f_int),
# tau
POINTER(c_float),
# C
POINTER(c_float),
POINTER(f_int),
# work
POINTER(c_float),
POINTER(f_int),
POINTER(f_int),
POINTER(c_size_t),
POINTER(c_size_t),
]
lapack.dgeqrf_.restype = None
lapack.dgeqrf_.argtypes = [
POINTER(f_int),
POINTER(f_int),
# A
POINTER(c_double),
POINTER(f_int),
POINTER(c_double),
POINTER(c_double),
POINTER(f_int),
POINTER(f_int),
]
lapack.dorgqr_.restype = None
lapack.dorgqr_.argtypes = [
POINTER(f_int),
POINTER(f_int),
POINTER(f_int),
POINTER(c_double),
POINTER(f_int),
POINTER(c_double),
POINTER(c_double),
POINTER(f_int),
POINTER(f_int),
]
eps32 = 2**-23
nan = float("NaN")
m = 3
n = m
lda = m
a = array.array("f", [0] * (lda * n))
a[0 * lda + 0] = eps32
a[0 * lda + 1] = 8 / 7 * eps32 * a[0]
a[0 * lda + 2] = eps32**2 * a[0]
a_10 = 1
a_11 = eps32 * a_10
a_12 = 28 / 31 * eps32 * a_11
a[1 * lda + 0] = a_10
a[1 * lda + 1] = a_11
a[1 * lda + 2] = a_12
assert len(a) == lda * n
print("A")
print_matrix(lda, n, a)
print()
tau = array.array("f", [nan] * min(m, n))
work = array.array("f", [nan] * 256)
info = f_int(0)
intref = lambda n: byref(f_int(n))
f32ref = lambda array: (c_float * len(array)).from_buffer(array)
lapack.sgeqrf_(
intref(m),
intref(n),
f32ref(a),
intref(lda),
f32ref(tau),
f32ref(work),
intref(len(work)),
byref(info),
)
assert info.value == 0
print("TAU")
for (i, t) in enumerate(tau):
print("%2d %+15.9e" % (i, t))
print()
print("A after xGEQR")
print_matrix(lda, n, a)
def multiply(a, lda, x):
y = array.array("f", [nan] * n)
alpha = c_float(1)
beta = c_float(0)
lapack.sgemv_(
b'N',
intref(m),
intref(n),
byref(c_float(1)),
f32ref(a),
intref(lda),
f32ref(x),
intref(1),
byref(beta),
f32ref(y),
intref(1),
byref(c_size_t(1)),
)
return y
def norm(xs):
return lapack.snrm2_(intref(len(xs)), f32ref(xs), intref(1))
def normalize(xs):
ys = copy.deepcopy(xs)
norm_x = norm(xs)
for i in range(len(xs)):
ys[i] = xs[i] / norm_x
return ys
def compute_delta(xs):
assert len(xs) == len(x_0)
return array.array("f", [xs[i] - x_0[i] for i in range(len(x_0))])
x_0 = normalize(array.array("f", [0, -1 / 7, 102 / 101]))
x = copy.deepcopy(x_0)
assert len(x) == n
for _ in range(k):
lapack.sormqr_(
b'L',
b'N',
intref(m),
intref(1),
intref(m),
f32ref(a),
intref(lda),
f32ref(tau),
f32ref(x),
intref(n),
f32ref(work),
intref(len(work)),
byref(info),
byref(c_size_t(1)),
byref(c_size_t(1)),
)
assert info.value == 0
x_normed = normalize(x)
p = copy.deepcopy(a)
ldp = lda
lapack.sorgqr_(
intref(m),
intref(n),
intref(n),
f32ref(p),
intref(ldp),
f32ref(tau),
f32ref(work),
intref(len(work)),
byref(info),
)
x_gemv = copy.deepcopy(x_0)
for _ in range(k):
x_gemv = multiply(p, ldp, x_gemv)
x_normed_v = normalize(x_gemv)
print()
print(" x", " ".join(["%+15.9e" % (x, ) for x in x_0]))
print(" Q^%d x" % (k, ), " ".join(["%+15.9e" % (x, ) for x in x]))
print("1/l Q^%d x" % (k, ),
" ".join(["%+15.9e" % (x, ) for x in x_normed]))
print(" P^%d x" % (k, ), " ".join(["%+15.9e" % (x, ) for x in x]))
print("1/l P^%d x" % (k, ),
" ".join(["%+15.9e" % (x, ) for x in x_normed_v]))
print()
print("norm(Q^%d x) = 1 + %.2f eps" % (k, (norm(x) - 1) / eps32))
print("norm(P^%d x) = 1 + %.2f eps" % (k, (norm(x_gemv) - 1) / eps32))
print()
if k % 2 == 0:
delta = compute_delta(x)
delta_normed = compute_delta(x_normed)
delta_gemv = compute_delta(x_gemv)
delta_normed_v = compute_delta(x_normed_v)
print("norm(x - Q^%d x) = %9.3e eps" % (k, norm(delta) / eps32))
print("norm(x - 1/l Q^%d x) = %9.3e eps" %
(k, norm(delta_normed) / eps32))
print("norm(x - P^%d x) = %9.3e eps" %
(k, norm(delta_gemv) / eps32))
print("norm(x - 1/l P^%d x) = %9.3e eps" %
(k, norm(delta_normed_v) / eps32))
assert info.value == 0
if __name__ == "__main__":
sys.exit(main())
Description
xLARFGP computes a Householder reflector. Given an input vector
x
of dimensionm
, xLARFGP computesH = I - β vv^*
such thatH x = ‖x‖ e1
andv(1) = 1
,where
β
is a scalar andv
is a vector.934 attempted to fix an overflow when 0 ≪ ‖x(2:m)‖ ≪ x(1) ≪ 1 by settings
x(2:m)
to zero under certain conditions. This does not resolve the issue. For example, letm = 2
andx = [ε^4, 2 ε^5]
, then an overflow takes place in the following code when1 / ALPHA
is computed:https://github.com/Reference-LAPACK/lapack/blob/ae9c818530c6a63cf82f8829d53e8db6b43c40a7/SRC/slarfgp.f#L224
From the point of view of mathematics, this line should simply compute
(See, e.g., Matrix Computations, 4th edition, Algorithm 5.1.1). Obviously the divisions cannot overflow and now the problem source becomes evident: the computation of the reciprocal of
ALPHA
. The use of, e.g., SLASCL in the case ofINCX = 1
resolves the problem.Minimal example demonstrating the problem (compile with
cc -Wextra -Wall -std=c99 -pedantic -llapack -lm
):Checklist