Closed alanedelman closed 11 years ago
I think the problem is OpenBLAS. If I use the version from vecLib I get the correct answer. I also did a test in Fortran where OpenBLAS returns NaN.
cc @xianyi
odd that openblas would mess up the largest singular value and not the rest
@andreasnoackjensen Could you post your Fortran code in a gist? It would probably help @xianyi narrow it down.
I am confused. I have tried to test this on Ubuntu as well and maybe you should ask the LAPACK author about this. Please have a look at https://gist.github.com/andreasnoackjensen/4976070. Maybe the common error factor is a development version of LAPACK, but I don't know how to get the exact snapshot dates for the LAPACK code.
Actually I checked with Ming Gu who told me that Ren Cang Li was the author.
I am enclosing what he said was the correct code, and he said there was a bug fix at some point. Can we see if this version works better?
It's possible that I'm one of the first customers of this code :-)
SUBROUTINE DLASD4( N, I, D, Z, DELTA, RHO, SIGMA, WORK, INFO )
*
* -- LAPACK auxiliary routine (version 3.4.0) --
* -- LAPACK is a software package provided by Univ. of Tennessee, --
* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
* November 2011
*
* .. Scalar Arguments ..
INTEGER I, INFO, N
DOUBLE PRECISION RHO, SIGMA
* ..
* .. Array Arguments ..
DOUBLE PRECISION D( * ), DELTA( * ), WORK( * ), Z( * )
* ..
*
* =====================================================================
*
* .. Parameters ..
INTEGER MAXIT
PARAMETER ( MAXIT = 60 )
DOUBLE PRECISION ZERO, ONE, TWO, THREE, FOUR, EIGHT, TEN
PARAMETER ( ZERO = 0.0D+0, ONE = 1.0D+0, TWO = 2.0D+0,
$ THREE = 3.0D+0, FOUR = 4.0D+0, EIGHT = 8.0D+0,
$ TEN = 10.0D+0 )
* ..
* .. Local Scalars ..
LOGICAL ORGATI, SWTCH, SWTCH3, GEOMAVG
INTEGER II, IIM1, IIP1, IP1, ITER, J, NITER
DOUBLE PRECISION A, B, C, DELSQ, DELSQ2, SQ2, DPHI, DPSI, DTIIM,
$ DTIIP, DTIPSQ, DTISQ, DTNSQ, DTNSQ1, DW, EPS,
$ ERRETM, ETA, PHI, PREW, PSI, RHOINV, SGLB,
$ SGUB, TAU, TAU2, TEMP, TEMP1, TEMP2, W
* ..
* .. Local Arrays ..
DOUBLE PRECISION DD( 3 ), ZZ( 3 )
* ..
* .. External Subroutines ..
EXTERNAL DLAED6, DLASD5
* ..
* .. External Functions ..
DOUBLE PRECISION DLAMCH
EXTERNAL DLAMCH
* ..
* .. Intrinsic Functions ..
INTRINSIC ABS, MAX, MIN, SQRT
* ..
* .. Executable Statements ..
*
* Since this routine is called in an inner loop, we do no argument
* checking.
*
* Quick return for N=1 and 2.
*
INFO = 0
IF( N.EQ.1 ) THEN
*
* Presumably, I=1 upon entry
*
SIGMA = SQRT( D( 1 )*D( 1 )+RHO*Z( 1 )*Z( 1 ) )
DELTA( 1 ) = ONE
WORK( 1 ) = ONE
RETURN
END IF
IF( N.EQ.2 ) THEN
CALL DLASD5( I, D, Z, DELTA, RHO, SIGMA, WORK )
RETURN
END IF
*
* Compute machine epsilon
*
EPS = DLAMCH( 'Epsilon' )
RHOINV = ONE / RHO
*
* The case I = N
*
IF( I.EQ.N ) THEN
*
* Initialize some basic variables
*
II = N - 1
NITER = 1
*
* Calculate initial guess
*
TEMP = RHO / TWO
*
* If ||Z||_2 is not one, then TEMP should be set to
* RHO * ||Z||_2^2 / TWO
*
TEMP1 = TEMP / ( D( N )+SQRT( D( N )*D( N )+TEMP ) )
DO 10 J = 1, N
WORK( J ) = D( J ) + D( N ) + TEMP1
DELTA( J ) = ( D( J )-D( N ) ) - TEMP1
10 CONTINUE
*
PSI = ZERO
DO 20 J = 1, N - 2
PSI = PSI + Z( J )*Z( J ) / ( DELTA( J )*WORK( J ) )
20 CONTINUE
*
C = RHOINV + PSI
W = C + Z( II )*Z( II ) / ( DELTA( II )*WORK( II ) ) +
$ Z( N )*Z( N ) / ( DELTA( N )*WORK( N ) )
*
IF( W.LE.ZERO ) THEN
TEMP1 = SQRT( D( N )*D( N )+RHO )
TEMP = Z( N-1 )*Z( N-1 ) / ( ( D( N-1 )+TEMP1 )*
$ ( D( N )-D( N-1 )+RHO / ( D( N )+TEMP1 ) ) ) +
$ Z( N )*Z( N ) / RHO
*
* The following TAU2 is to approximate
* SIGMA_n^2 - D( N )*D( N )
*
IF( C.LE.TEMP ) THEN
TAU = RHO
ELSE
DELSQ = ( D( N )-D( N-1 ) )*( D( N )+D( N-1 ) )
A = -C*DELSQ + Z( N-1 )*Z( N-1 ) + Z( N )*Z( N )
B = Z( N )*Z( N )*DELSQ
IF( A.LT.ZERO ) THEN
TAU2 = TWO*B / ( SQRT( A*A+FOUR*B*C )-A )
ELSE
TAU2 = ( A+SQRT( A*A+FOUR*B*C ) ) / ( TWO*C )
END IF
END IF
*
* It can be proved that
* D(N)^2+RHO/2 <= SIGMA_n^2 < D(N)^2+TAU2 <= D(N)^2+RHO
*
ELSE
DELSQ = ( D( N )-D( N-1 ) )*( D( N )+D( N-1 ) )
A = -C*DELSQ + Z( N-1 )*Z( N-1 ) + Z( N )*Z( N )
B = Z( N )*Z( N )*DELSQ
*
* The following TAU2 is to approximate
* SIGMA_n^2 - D( N )*D( N )
*
IF( A.LT.ZERO ) THEN
TAU2 = TWO*B / ( SQRT( A*A+FOUR*B*C )-A )
ELSE
TAU2 = ( A+SQRT( A*A+FOUR*B*C ) ) / ( TWO*C )
END IF
*
* It can be proved that
* D(N)^2 < D(N)^2+TAU2 < SIGMA(N)^2 < D(N)^2+RHO/2
*
END IF
*
* The following TAU is to approximate SIGMA_n - D( N )
*
TAU = TAU2 / ( D( N )+SQRT( D( N )*D( N )+TAU2 ) )
*
SIGMA = D( N ) + TAU
DO 30 J = 1, N
DELTA( J ) = ( D( J )-D( N ) ) - TAU
WORK( J ) = D( J ) + D( N ) + TAU
30 CONTINUE
*
* Evaluate PSI and the derivative DPSI
*
DPSI = ZERO
PSI = ZERO
ERRETM = ZERO
DO 40 J = 1, II
TEMP = Z( J ) / ( DELTA( J )*WORK( J ) )
PSI = PSI + Z( J )*TEMP
DPSI = DPSI + TEMP*TEMP
ERRETM = ERRETM + PSI
40 CONTINUE
ERRETM = ABS( ERRETM )
*
* Evaluate PHI and the derivative DPHI
*
TEMP = Z( N ) / ( DELTA( N )*WORK( N ) )
PHI = Z( N )*TEMP
DPHI = TEMP*TEMP
ERRETM = EIGHT*( -PHI-PSI ) + ERRETM - PHI + RHOINV
* $ + ABS( TAU2 )*( DPSI+DPHI )
*
W = RHOINV + PHI + PSI
*
* Test for convergence
*
IF( ABS( W ).LE.EPS*ERRETM ) THEN
GO TO 240
END IF
*
* Calculate the new step
*
NITER = NITER + 1
DTNSQ1 = WORK( N-1 )*DELTA( N-1 )
DTNSQ = WORK( N )*DELTA( N )
C = W - DTNSQ1*DPSI - DTNSQ*DPHI
A = ( DTNSQ+DTNSQ1 )*W - DTNSQ*DTNSQ1*( DPSI+DPHI )
B = DTNSQ*DTNSQ1*W
IF( C.LT.ZERO )
$ C = ABS( C )
IF( C.EQ.ZERO ) THEN
ETA = RHO - SIGMA*SIGMA
ELSE IF( A.GE.ZERO ) THEN
ETA = ( A+SQRT( ABS( A*A-FOUR*B*C ) ) ) / ( TWO*C )
ELSE
ETA = TWO*B / ( A-SQRT( ABS( A*A-FOUR*B*C ) ) )
END IF
*
* Note, eta should be positive if w is negative, and
* eta should be negative otherwise. However,
* if for some reason caused by roundoff, eta*w > 0,
* we simply use one Newton step instead. This way
* will guarantee eta*w < 0.
*
IF( W*ETA.GT.ZERO )
$ ETA = -W / ( DPSI+DPHI )
TEMP = ETA - DTNSQ
IF( TEMP.GT.RHO )
$ ETA = RHO + DTNSQ
*
ETA = ETA / ( SIGMA+SQRT( ETA+SIGMA*SIGMA ) )
TAU = TAU + ETA
SIGMA = SIGMA + ETA
*
DO 50 J = 1, N
DELTA( J ) = DELTA( J ) - ETA
WORK( J ) = WORK( J ) + ETA
50 CONTINUE
*
* Evaluate PSI and the derivative DPSI
*
DPSI = ZERO
PSI = ZERO
ERRETM = ZERO
DO 60 J = 1, II
TEMP = Z( J ) / ( WORK( J )*DELTA( J ) )
PSI = PSI + Z( J )*TEMP
DPSI = DPSI + TEMP*TEMP
ERRETM = ERRETM + PSI
60 CONTINUE
ERRETM = ABS( ERRETM )
*
* Evaluate PHI and the derivative DPHI
*
TAU2 = WORK( N )*DELTA( N )
TEMP = Z( N ) / TAU2
PHI = Z( N )*TEMP
DPHI = TEMP*TEMP
ERRETM = EIGHT*( -PHI-PSI ) + ERRETM - PHI + RHOINV
* $ + ABS( TAU2 )*( DPSI+DPHI )
*
W = RHOINV + PHI + PSI
*
* Main loop to update the values of the array DELTA
*
ITER = NITER + 1
*
DO 90 NITER = ITER, MAXIT
*
* Test for convergence
*
IF( ABS( W ).LE.EPS*ERRETM ) THEN
GO TO 240
END IF
*
* Calculate the new step
*
DTNSQ1 = WORK( N-1 )*DELTA( N-1 )
DTNSQ = WORK( N )*DELTA( N )
C = W - DTNSQ1*DPSI - DTNSQ*DPHI
A = ( DTNSQ+DTNSQ1 )*W - DTNSQ1*DTNSQ*( DPSI+DPHI )
B = DTNSQ1*DTNSQ*W
IF( A.GE.ZERO ) THEN
ETA = ( A+SQRT( ABS( A*A-FOUR*B*C ) ) ) / ( TWO*C )
ELSE
ETA = TWO*B / ( A-SQRT( ABS( A*A-FOUR*B*C ) ) )
END IF
*
* Note, eta should be positive if w is negative, and
* eta should be negative otherwise. However,
* if for some reason caused by roundoff, eta*w > 0,
* we simply use one Newton step instead. This way
* will guarantee eta*w < 0.
*
IF( W*ETA.GT.ZERO )
$ ETA = -W / ( DPSI+DPHI )
TEMP = ETA - DTNSQ
IF( TEMP.LE.ZERO )
$ ETA = ETA / TWO
*
ETA = ETA / ( SIGMA+SQRT( ETA+SIGMA*SIGMA ) )
TAU = TAU + ETA
SIGMA = SIGMA + ETA
*
DO 70 J = 1, N
DELTA( J ) = DELTA( J ) - ETA
WORK( J ) = WORK( J ) + ETA
70 CONTINUE
*
* Evaluate PSI and the derivative DPSI
*
DPSI = ZERO
PSI = ZERO
ERRETM = ZERO
DO 80 J = 1, II
TEMP = Z( J ) / ( WORK( J )*DELTA( J ) )
PSI = PSI + Z( J )*TEMP
DPSI = DPSI + TEMP*TEMP
ERRETM = ERRETM + PSI
80 CONTINUE
ERRETM = ABS( ERRETM )
*
* Evaluate PHI and the derivative DPHI
*
TAU2 = WORK( N )*DELTA( N )
TEMP = Z( N ) / TAU2
PHI = Z( N )*TEMP
DPHI = TEMP*TEMP
ERRETM = EIGHT*( -PHI-PSI ) + ERRETM - PHI + RHOINV
* $ + ABS( TAU2 )*( DPSI+DPHI )
*
W = RHOINV + PHI + PSI
90 CONTINUE
*
* Return with INFO = 1, NITER = MAXIT and not converged
*
INFO = 1
GO TO 240
*
* End for the case I = N
*
ELSE
*
* The case for I < N
*
NITER = 1
IP1 = I + 1
*
* Calculate initial guess
*
DELSQ = ( D( IP1 )-D( I ) )*( D( IP1 )+D( I ) )
DELSQ2 = DELSQ / TWO
SQ2=SQRT( ( D( I )*D( I )+D( IP1 )*D( IP1 ) ) / TWO )
TEMP = DELSQ2 / ( D( I )+SQ2 )
DO 100 J = 1, N
WORK( J ) = D( J ) + D( I ) + TEMP
DELTA( J ) = ( D( J )-D( I ) ) - TEMP
100 CONTINUE
*
PSI = ZERO
DO 110 J = 1, I - 1
PSI = PSI + Z( J )*Z( J ) / ( WORK( J )*DELTA( J ) )
110 CONTINUE
*
PHI = ZERO
DO 120 J = N, I + 2, -1
PHI = PHI + Z( J )*Z( J ) / ( WORK( J )*DELTA( J ) )
120 CONTINUE
C = RHOINV + PSI + PHI
W = C + Z( I )*Z( I ) / ( WORK( I )*DELTA( I ) ) +
$ Z( IP1 )*Z( IP1 ) / ( WORK( IP1 )*DELTA( IP1 ) )
*
GEOMAVG = .FALSE.
IF( W.GT.ZERO ) THEN
*
* d(i)^2 < the ith sigma^2 < (d(i)^2+d(i+1)^2)/2
*
* We choose d(i) as origin.
*
ORGATI = .TRUE.
II = I
SGLB = ZERO
SGUB = DELSQ2 / ( D( I )+SQ2 )
A = C*DELSQ + Z( I )*Z( I ) + Z( IP1 )*Z( IP1 )
B = Z( I )*Z( I )*DELSQ
IF( A.GT.ZERO ) THEN
TAU2 = TWO*B / ( A+SQRT( ABS( A*A-FOUR*B*C ) ) )
ELSE
TAU2 = ( A-SQRT( ABS( A*A-FOUR*B*C ) ) ) / ( TWO*C )
END IF
*
* TAU2 now is an estimation of SIGMA^2 - D( I )^2. The
* following, however, is the corresponding estimation of
* SIGMA - D( I ).
*
TAU = TAU2 / ( D( I )+SQRT( D( I )*D( I )+TAU2 ) )
TEMP = SQRT(EPS)
IF( (D(I).LE.TEMP*D(IP1)).AND.(ABS(Z(I)).LE.TEMP)
$ .AND.(D(I).GT.ZERO) ) THEN
TAU = MIN( TEN*D(I), SGUB )
GEOMAVG = .TRUE.
END IF
ELSE
*
* (d(i)^2+d(i+1)^2)/2 <= the ith sigma^2 < d(i+1)^2/2
*
* We choose d(i+1) as origin.
*
ORGATI = .FALSE.
II = IP1
SGLB = -DELSQ2 / ( D( II )+SQ2 )
SGUB = ZERO
A = C*DELSQ - Z( I )*Z( I ) - Z( IP1 )*Z( IP1 )
B = Z( IP1 )*Z( IP1 )*DELSQ
IF( A.LT.ZERO ) THEN
TAU2 = TWO*B / ( A-SQRT( ABS( A*A+FOUR*B*C ) ) )
ELSE
TAU2 = -( A+SQRT( ABS( A*A+FOUR*B*C ) ) ) / ( TWO*C )
END IF
*
* TAU2 now is an estimation of SIGMA^2 - D( IP1 )^2. The
* following, however, is the corresponding estimation of
* SIGMA - D( IP1 ).
*
TAU = TAU2 / ( D( IP1 )+SQRT( ABS( D( IP1 )*D( IP1 )+
$ TAU2 ) ) )
END IF
*
SIGMA = D( II ) + TAU
DO 130 J = 1, N
WORK( J ) = D( J ) + D( II ) + TAU
DELTA( J ) = ( D( J )-D( II ) ) - TAU
130 CONTINUE
IIM1 = II - 1
IIP1 = II + 1
*
* Evaluate PSI and the derivative DPSI
*
DPSI = ZERO
PSI = ZERO
ERRETM = ZERO
DO 150 J = 1, IIM1
TEMP = Z( J ) / ( WORK( J )*DELTA( J ) )
PSI = PSI + Z( J )*TEMP
DPSI = DPSI + TEMP*TEMP
ERRETM = ERRETM + PSI
150 CONTINUE
ERRETM = ABS( ERRETM )
*
* Evaluate PHI and the derivative DPHI
*
DPHI = ZERO
PHI = ZERO
DO 160 J = N, IIP1, -1
TEMP = Z( J ) / ( WORK( J )*DELTA( J ) )
PHI = PHI + Z( J )*TEMP
DPHI = DPHI + TEMP*TEMP
ERRETM = ERRETM + PHI
160 CONTINUE
*
W = RHOINV + PHI + PSI
*
* W is the value of the secular function with
* its ii-th element removed.
*
SWTCH3 = .FALSE.
IF( ORGATI ) THEN
IF( W.LT.ZERO )
$ SWTCH3 = .TRUE.
ELSE
IF( W.GT.ZERO )
$ SWTCH3 = .TRUE.
END IF
IF( II.EQ.1 .OR. II.EQ.N )
$ SWTCH3 = .FALSE.
*
TEMP = Z( II ) / ( WORK( II )*DELTA( II ) )
DW = DPSI + DPHI + TEMP*TEMP
TEMP = Z( II )*TEMP
W = W + TEMP
ERRETM = EIGHT*( PHI-PSI ) + ERRETM + TWO*RHOINV
$ + THREE*ABS( TEMP )
* $ + ABS( TAU2 )*DW
*
* Test for convergence
*
IF( ABS( W ).LE.EPS*ERRETM ) THEN
GO TO 240
END IF
*
IF( W.LE.ZERO ) THEN
SGLB = MAX( SGLB, TAU )
ELSE
SGUB = MIN( SGUB, TAU )
END IF
*
* Calculate the new step
*
NITER = NITER + 1
IF( .NOT.SWTCH3 ) THEN
DTIPSQ = WORK( IP1 )*DELTA( IP1 )
DTISQ = WORK( I )*DELTA( I )
IF( ORGATI ) THEN
C = W - DTIPSQ*DW + DELSQ*( Z( I ) / DTISQ )**2
ELSE
C = W - DTISQ*DW - DELSQ*( Z( IP1 ) / DTIPSQ )**2
END IF
A = ( DTIPSQ+DTISQ )*W - DTIPSQ*DTISQ*DW
B = DTIPSQ*DTISQ*W
IF( C.EQ.ZERO ) THEN
IF( A.EQ.ZERO ) THEN
IF( ORGATI ) THEN
A = Z( I )*Z( I ) + DTIPSQ*DTIPSQ*( DPSI+DPHI )
ELSE
A = Z( IP1 )*Z( IP1 ) + DTISQ*DTISQ*( DPSI+DPHI )
END IF
END IF
ETA = B / A
ELSE IF( A.LE.ZERO ) THEN
ETA = ( A-SQRT( ABS( A*A-FOUR*B*C ) ) ) / ( TWO*C )
ELSE
ETA = TWO*B / ( A+SQRT( ABS( A*A-FOUR*B*C ) ) )
END IF
ELSE
*
* Interpolation using THREE most relevant poles
*
DTIIM = WORK( IIM1 )*DELTA( IIM1 )
DTIIP = WORK( IIP1 )*DELTA( IIP1 )
TEMP = RHOINV + PSI + PHI
IF( ORGATI ) THEN
TEMP1 = Z( IIM1 ) / DTIIM
TEMP1 = TEMP1*TEMP1
C = ( TEMP - DTIIP*( DPSI+DPHI ) ) -
$ ( D( IIM1 )-D( IIP1 ) )*( D( IIM1 )+D( IIP1 ) )*TEMP1
ZZ( 1 ) = Z( IIM1 )*Z( IIM1 )
IF( DPSI.LT.TEMP1 ) THEN
ZZ( 3 ) = DTIIP*DTIIP*DPHI
ELSE
ZZ( 3 ) = DTIIP*DTIIP*( ( DPSI-TEMP1 )+DPHI )
END IF
ELSE
TEMP1 = Z( IIP1 ) / DTIIP
TEMP1 = TEMP1*TEMP1
C = ( TEMP - DTIIM*( DPSI+DPHI ) ) -
$ ( D( IIP1 )-D( IIM1 ) )*( D( IIM1 )+D( IIP1 ) )*TEMP1
IF( DPHI.LT.TEMP1 ) THEN
ZZ( 1 ) = DTIIM*DTIIM*DPSI
ELSE
ZZ( 1 ) = DTIIM*DTIIM*( DPSI+( DPHI-TEMP1 ) )
END IF
ZZ( 3 ) = Z( IIP1 )*Z( IIP1 )
END IF
ZZ( 2 ) = Z( II )*Z( II )
DD( 1 ) = DTIIM
DD( 2 ) = DELTA( II )*WORK( II )
DD( 3 ) = DTIIP
CALL DLAED6( NITER, ORGATI, C, DD, ZZ, W, ETA, INFO )
*
IF( INFO.NE.0 ) THEN
*
* If INFO is not 0, i.e., DLAED6 failed, switch back
* to 2 pole interpolation.
*
SWTCH3 = .FALSE.
INFO = 0
DTIPSQ = WORK( IP1 )*DELTA( IP1 )
DTISQ = WORK( I )*DELTA( I )
IF( ORGATI ) THEN
C = W - DTIPSQ*DW + DELSQ*( Z( I ) / DTISQ )**2
ELSE
C = W - DTISQ*DW - DELSQ*( Z( IP1 ) / DTIPSQ )**2
END IF
A = ( DTIPSQ+DTISQ )*W - DTIPSQ*DTISQ*DW
B = DTIPSQ*DTISQ*W
IF( C.EQ.ZERO ) THEN
IF( A.EQ.ZERO ) THEN
IF( ORGATI ) THEN
A = Z( I )*Z( I ) + DTIPSQ*DTIPSQ*( DPSI+DPHI )
ELSE
A = Z( IP1 )*Z( IP1 ) + DTISQ*DTISQ*( DPSI+DPHI)
END IF
END IF
ETA = B / A
ELSE IF( A.LE.ZERO ) THEN
ETA = ( A-SQRT( ABS( A*A-FOUR*B*C ) ) ) / ( TWO*C )
ELSE
ETA = TWO*B / ( A+SQRT( ABS( A*A-FOUR*B*C ) ) )
END IF
END IF
END IF
*
* Note, eta should be positive if w is negative, and
* eta should be negative otherwise. However,
* if for some reason caused by roundoff, eta*w > 0,
* we simply use one Newton step instead. This way
* will guarantee eta*w < 0.
*
IF( W*ETA.GE.ZERO )
$ ETA = -W / DW
*
ETA = ETA / ( SIGMA+SQRT( SIGMA*SIGMA+ETA ) )
TEMP = TAU + ETA
IF( TEMP.GT.SGUB .OR. TEMP.LT.SGLB ) THEN
IF( W.LT.ZERO ) THEN
ETA = ( SGUB-TAU ) / TWO
ELSE
ETA = ( SGLB-TAU ) / TWO
END IF
IF( GEOMAVG ) THEN
IF( W .LT. ZERO ) THEN
IF( TAU .GT. ZERO ) THEN
ETA = SQRT(SGUB*TAU)-TAU
END IF
ELSE
IF( SGLB .GT. ZERO ) THEN
ETA = SQRT(SGLB*TAU)-TAU
END IF
END IF
END IF
END IF
*
PREW = W
*
TAU = TAU + ETA
SIGMA = SIGMA + ETA
*
DO 170 J = 1, N
WORK( J ) = WORK( J ) + ETA
DELTA( J ) = DELTA( J ) - ETA
170 CONTINUE
*
* Evaluate PSI and the derivative DPSI
*
DPSI = ZERO
PSI = ZERO
ERRETM = ZERO
DO 180 J = 1, IIM1
TEMP = Z( J ) / ( WORK( J )*DELTA( J ) )
PSI = PSI + Z( J )*TEMP
DPSI = DPSI + TEMP*TEMP
ERRETM = ERRETM + PSI
180 CONTINUE
ERRETM = ABS( ERRETM )
*
* Evaluate PHI and the derivative DPHI
*
DPHI = ZERO
PHI = ZERO
DO 190 J = N, IIP1, -1
TEMP = Z( J ) / ( WORK( J )*DELTA( J ) )
PHI = PHI + Z( J )*TEMP
DPHI = DPHI + TEMP*TEMP
ERRETM = ERRETM + PHI
190 CONTINUE
*
TAU2 = WORK( II )*DELTA( II )
TEMP = Z( II ) / TAU2
DW = DPSI + DPHI + TEMP*TEMP
TEMP = Z( II )*TEMP
W = RHOINV + PHI + PSI + TEMP
ERRETM = EIGHT*( PHI-PSI ) + ERRETM + TWO*RHOINV
$ + THREE*ABS( TEMP )
* $ + ABS( TAU2 )*DW
*
SWTCH = .FALSE.
IF( ORGATI ) THEN
IF( -W.GT.ABS( PREW ) / TEN )
$ SWTCH = .TRUE.
ELSE
IF( W.GT.ABS( PREW ) / TEN )
$ SWTCH = .TRUE.
END IF
*
* Main loop to update the values of the array DELTA and WORK
*
ITER = NITER + 1
*
DO 230 NITER = ITER, MAXIT
*
* Test for convergence
*
IF( ABS( W ).LE.EPS*ERRETM ) THEN
* $ .OR. (SGUB-SGLB).LE.EIGHT*ABS(SGUB+SGLB) ) THEN
GO TO 240
END IF
*
IF( W.LE.ZERO ) THEN
SGLB = MAX( SGLB, TAU )
ELSE
SGUB = MIN( SGUB, TAU )
END IF
*
* Calculate the new step
*
IF( .NOT.SWTCH3 ) THEN
DTIPSQ = WORK( IP1 )*DELTA( IP1 )
DTISQ = WORK( I )*DELTA( I )
IF( .NOT.SWTCH ) THEN
IF( ORGATI ) THEN
C = W - DTIPSQ*DW + DELSQ*( Z( I ) / DTISQ )**2
ELSE
C = W - DTISQ*DW - DELSQ*( Z( IP1 ) / DTIPSQ )**2
END IF
ELSE
TEMP = Z( II ) / ( WORK( II )*DELTA( II ) )
IF( ORGATI ) THEN
DPSI = DPSI + TEMP*TEMP
ELSE
DPHI = DPHI + TEMP*TEMP
END IF
C = W - DTISQ*DPSI - DTIPSQ*DPHI
END IF
A = ( DTIPSQ+DTISQ )*W - DTIPSQ*DTISQ*DW
B = DTIPSQ*DTISQ*W
IF( C.EQ.ZERO ) THEN
IF( A.EQ.ZERO ) THEN
IF( .NOT.SWTCH ) THEN
IF( ORGATI ) THEN
A = Z( I )*Z( I ) + DTIPSQ*DTIPSQ*
$ ( DPSI+DPHI )
ELSE
A = Z( IP1 )*Z( IP1 ) +
$ DTISQ*DTISQ*( DPSI+DPHI )
END IF
ELSE
A = DTISQ*DTISQ*DPSI + DTIPSQ*DTIPSQ*DPHI
END IF
END IF
ETA = B / A
ELSE IF( A.LE.ZERO ) THEN
ETA = ( A-SQRT( ABS( A*A-FOUR*B*C ) ) ) / ( TWO*C )
ELSE
ETA = TWO*B / ( A+SQRT( ABS( A*A-FOUR*B*C ) ) )
END IF
ELSE
*
* Interpolation using THREE most relevant poles
*
DTIIM = WORK( IIM1 )*DELTA( IIM1 )
DTIIP = WORK( IIP1 )*DELTA( IIP1 )
TEMP = RHOINV + PSI + PHI
IF( SWTCH ) THEN
C = TEMP - DTIIM*DPSI - DTIIP*DPHI
ZZ( 1 ) = DTIIM*DTIIM*DPSI
ZZ( 3 ) = DTIIP*DTIIP*DPHI
ELSE
IF( ORGATI ) THEN
TEMP1 = Z( IIM1 ) / DTIIM
TEMP1 = TEMP1*TEMP1
TEMP2 = ( D( IIM1 )-D( IIP1 ) )*
$ ( D( IIM1 )+D( IIP1 ) )*TEMP1
C = TEMP - DTIIP*( DPSI+DPHI ) - TEMP2
ZZ( 1 ) = Z( IIM1 )*Z( IIM1 )
IF( DPSI.LT.TEMP1 ) THEN
ZZ( 3 ) = DTIIP*DTIIP*DPHI
ELSE
ZZ( 3 ) = DTIIP*DTIIP*( ( DPSI-TEMP1 )+DPHI )
END IF
ELSE
TEMP1 = Z( IIP1 ) / DTIIP
TEMP1 = TEMP1*TEMP1
TEMP2 = ( D( IIP1 )-D( IIM1 ) )*
$ ( D( IIM1 )+D( IIP1 ) )*TEMP1
C = TEMP - DTIIM*( DPSI+DPHI ) - TEMP2
IF( DPHI.LT.TEMP1 ) THEN
ZZ( 1 ) = DTIIM*DTIIM*DPSI
ELSE
ZZ( 1 ) = DTIIM*DTIIM*( DPSI+( DPHI-TEMP1 ) )
END IF
ZZ( 3 ) = Z( IIP1 )*Z( IIP1 )
END IF
END IF
DD( 1 ) = DTIIM
DD( 2 ) = DELTA( II )*WORK( II )
DD( 3 ) = DTIIP
CALL DLAED6( NITER, ORGATI, C, DD, ZZ, W, ETA, INFO )
*
IF( INFO.NE.0 ) THEN
*
* If INFO is not 0, i.e., DLAED6 failed, switch
* back to two pole interpolation
*
SWTCH3 = .FALSE.
INFO = 0
DTIPSQ = WORK( IP1 )*DELTA( IP1 )
DTISQ = WORK( I )*DELTA( I )
IF( .NOT.SWTCH ) THEN
IF( ORGATI ) THEN
C = W - DTIPSQ*DW + DELSQ*( Z( I )/DTISQ )**2
ELSE
C = W - DTISQ*DW - DELSQ*( Z( IP1 )/DTIPSQ )**2
END IF
ELSE
TEMP = Z( II ) / ( WORK( II )*DELTA( II ) )
IF( ORGATI ) THEN
DPSI = DPSI + TEMP*TEMP
ELSE
DPHI = DPHI + TEMP*TEMP
END IF
C = W - DTISQ*DPSI - DTIPSQ*DPHI
END IF
A = ( DTIPSQ+DTISQ )*W - DTIPSQ*DTISQ*DW
B = DTIPSQ*DTISQ*W
IF( C.EQ.ZERO ) THEN
IF( A.EQ.ZERO ) THEN
IF( .NOT.SWTCH ) THEN
IF( ORGATI ) THEN
A = Z( I )*Z( I ) + DTIPSQ*DTIPSQ*
$ ( DPSI+DPHI )
ELSE
A = Z( IP1 )*Z( IP1 ) +
$ DTISQ*DTISQ*( DPSI+DPHI )
END IF
ELSE
A = DTISQ*DTISQ*DPSI + DTIPSQ*DTIPSQ*DPHI
END IF
END IF
ETA = B / A
ELSE IF( A.LE.ZERO ) THEN
ETA = ( A-SQRT( ABS( A*A-FOUR*B*C ) ) ) / ( TWO*C )
ELSE
ETA = TWO*B / ( A+SQRT( ABS( A*A-FOUR*B*C ) ) )
END IF
END IF
END IF
*
* Note, eta should be positive if w is negative, and
* eta should be negative otherwise. However,
* if for some reason caused by roundoff, eta*w > 0,
* we simply use one Newton step instead. This way
* will guarantee eta*w < 0.
*
IF( W*ETA.GE.ZERO )
$ ETA = -W / DW
*
ETA = ETA / ( SIGMA+SQRT( SIGMA*SIGMA+ETA ) )
TEMP=TAU+ETA
IF( TEMP.GT.SGUB .OR. TEMP.LT.SGLB ) THEN
IF( W.LT.ZERO ) THEN
ETA = ( SGUB-TAU ) / TWO
ELSE
ETA = ( SGLB-TAU ) / TWO
END IF
IF( GEOMAVG ) THEN
IF( W .LT. ZERO ) THEN
IF( TAU .GT. ZERO ) THEN
ETA = SQRT(SGUB*TAU)-TAU
END IF
ELSE
IF( SGLB .GT. ZERO ) THEN
ETA = SQRT(SGLB*TAU)-TAU
END IF
END IF
END IF
END IF
*
PREW = W
*
TAU = TAU + ETA
SIGMA = SIGMA + ETA
*
DO 200 J = 1, N
WORK( J ) = WORK( J ) + ETA
DELTA( J ) = DELTA( J ) - ETA
200 CONTINUE
*
* Evaluate PSI and the derivative DPSI
*
DPSI = ZERO
PSI = ZERO
ERRETM = ZERO
DO 210 J = 1, IIM1
TEMP = Z( J ) / ( WORK( J )*DELTA( J ) )
PSI = PSI + Z( J )*TEMP
DPSI = DPSI + TEMP*TEMP
ERRETM = ERRETM + PSI
210 CONTINUE
ERRETM = ABS( ERRETM )
*
* Evaluate PHI and the derivative DPHI
*
DPHI = ZERO
PHI = ZERO
DO 220 J = N, IIP1, -1
TEMP = Z( J ) / ( WORK( J )*DELTA( J ) )
PHI = PHI + Z( J )*TEMP
DPHI = DPHI + TEMP*TEMP
ERRETM = ERRETM + PHI
220 CONTINUE
*
TAU2 = WORK( II )*DELTA( II )
TEMP = Z( II ) / TAU2
DW = DPSI + DPHI + TEMP*TEMP
TEMP = Z( II )*TEMP
W = RHOINV + PHI + PSI + TEMP
ERRETM = EIGHT*( PHI-PSI ) + ERRETM + TWO*RHOINV
$ + THREE*ABS( TEMP )
* $ + ABS( TAU2 )*DW
*
IF( W*PREW.GT.ZERO .AND. ABS( W ).GT.ABS( PREW ) / TEN )
$ SWTCH = .NOT.SWTCH
*
230 CONTINUE
*
* Return with INFO = 1, NITER = MAXIT and not converged
*
INFO = 1
*
END IF
*
240 CONTINUE
RETURN
*
* End of DLASD4
*
END
Your version from 3.4.0 is identical to the version in 3.4.2 (the latest and the one in Ubuntu 13.04) except for MAXIT. However, the version of dlasd4.f in 3.4.1 (which is the version in Ubuntu 12.10) is significantly different from the two others. The versions in 3.4.0 and 3.4.2 give the wrong answer when linked dynamically but not statically. The version in 3.4.1 is correct when it returns info=0
but for the same input it sometimes return info=+-BIGNUM
.
What do you make of all this?
Hi @andreasnoackjensen ,
Do you mean you can get the correct result with the static library?
Xianyi
Hi @xianyi
Yes exactly. I have updated the gist with an example where I link statically and dynamically to a fresh build of OpenBLAS-develop.
This is rather bizarre!
I have tried the code again on Ubuntu. Last time was on Mac. On Ubuntu I can also get the wrong result with static linking. It is just kind of random and can be triggered by different things. In particular I can change the result by commenting out the write statements as shown in the gist. Hence I guess the subroutine in LAPACK 3.4.0 and 3.4.2 has some kind of error, but not the subroutine in 3.4.1.
@alanedelman Could you point the authors to this issue page?
@alanedelman Correction. I assumed that the version you posted was from LAPACK 3.4.0 since that is what the header says. However, that is not the case. Your version is the same as dlasd4.f in LAPACK 3.4.2 except that in yours MAXIT=60
and in LAPACK 3.4.2 MAXIT=600
, but I think both has a bug. The old version pre 3.4.2 might have different bug.
Update: Sorry for spamming. The old bug might be related the following which is from julia with LAPACK 3.4.1 and reference BLAS
julia> v=sort(randn(10).^2);z=randn(10);
julia> svdmax(v,z)
ERROR: LapackException(-1099511627776)
in lasd4! at /home/andreasnoackjensen/Desktop/dlasd4.jl:31
in svdmax at /home/andreasnoackjensen/Desktop/dlasd4.jl:37
julia> svdmax(v,z)
4.571419253335635
This is now submitted as a LAPACK bug so I think this issue can be closed.
We can leave it open until Julia updates to a fixed version.
I have added an upstream tag to this issue. We should leave it open until upstream fixes it, and we use it.
Could you also provide pointers to the communication with the LAPACK team for the record here?
The bug is registered here
http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=13&t=4250
@andreasnoackjensen Should we just replace the dlasd4
in 3.4.2 with the one in 3.4.1 as suggested in the upstream thread?
I think we should also find the matrices for which the svd is incorrect due to this bug and add it to our testsuite. Since we are going to release 0.2 at some point, we should certainly make sure that our svd is not broken.
I am tagging this as 0.2.
Also see discussion in xianyi/openblas#225.
I am not sure about this one. While I have experienced weird errors while calling svdvals
either on general matrices or bidiagonals as in #3396 I cannot produce an example that makes svdvals
consistently failing. Also, maybe the changes in dlasd4
were made for a reason that we are not aware of. It would be good if we got an official answer on the bug report. Does Alan or anyone else know about when next release of LAPACK is out?
@alanedelman Can you reach out to someone in the LAPACK team about dlasd4
?
This is clearly not a 0.2 issue. Updated issue to have no milestone.
There is a patch for this now. See http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=13&t=4250. The development version of LAPACK also has a fix and both solutions seem to solve the problem on my machine.
Perhaps we should apply this patch after untarring openblas.
@staticfloat We will need to apply this patch for your openblas 2.8 debs too.
Waiting to verify that this problem is actually fixed with this patch, before closing.
Hi @ViralBShah ,
Could you create a pull request to OpenBLAS, too? Thank you
Xianyi
Will do.
Can this be closed now that this is patched?
@ViralBShah bump. I'm moving this to 0.3, since the remaining issue (having you make a pull request to OpenBLAS), doesn't seem to be necessary for 0.2 release
I've gone ahead and patched the openblas packages on my PPA with this, and everything seems to be working.
On Wed, Aug 21, 2013 at 12:06 AM, Jameson Nash notifications@github.comwrote:
@ViralBShah https://github.com/ViralBShah bump. I'm moving this to 0.3, since the remaining issue (having you make a pull request to OpenBLAS), doesn't seem to be necessary for 0.2 release
— Reply to this email directly or view it on GitHubhttps://github.com/JuliaLang/julia/issues/2340#issuecomment-22994272 .
Openblas develop branch includes this patch now.
I'm now progressing from #2267 writing my fourth lapack call, and I need to compute the singular values of a broken arrow matrix efficiently
If d and z are vectors of the same size, I want svd( [diagm(d) z]) efficiently
I'm plugging in the little known dlasd4! and I get all the singular values correctly except the largest one, which just returns d[n]
Sorry, if I'm just not getting pointers right or if I need to copy data, or somehow my workspace is too small, but can anyone help?
The problem is likely to be pilot error on my part -- my fault as all these pointers mystify me. But I haven't ruled out julia or lapack yet. If it is lapack (which I doubt) the author of this code is in town tomorrow (Monday 2/18) and i can consult with him.
Bug example
The answer should be .858... but instead pulling the memory from v[end] On occasion it does give the right answer.