xUNCSD2BY1 does not handle submatrices with empty rows properly

christoph-conrads commented 3 years ago

Description

The 2x1 CS decomposition decomposes a matrix Q = [Q1; Q2], where Q^* Q = I. There are two possibilities for the CS decomposition UΣV = Q1 in the special case where Q2 has no rows:

U = Q1, Σ = I, V = I,
U = I, Σ = I, V = Q1.

That is, the only thing to do by the algorithm is copying the input matrix into U or V and set the other matrix to the identity matrix.

If

one of the matrices Q1 or Q2 has no rows and
the other matrix is square,

then SORCSD2BY1 performs out-of-bounds array accesses. I assume the same behavior occurs also for DORCSD2BY1, CUNCSD2BY1, and ZUNCSD2BY1. My expectation is that xORCSD2BY1 does not perform out-of-bounds accesses and returns one of the two possible decompositions shown above.

      PROGRAM testCSD2BY1
      IMPLICIT NONE

      INTEGER            INFO, LDA, LDU1, LDU2, LDV, M, N, P, LWORK, I
      PARAMETER        ( LDA = 1, LDU1 = 5, LDU2 = 5, LDV = 5,
     $                   M = 0, P = 1, N = 1, LWORK = 1024)

      INTEGER            IWORK(M + N + P)
      REAL               THETA( N ), A( LDA, N ),
     $                   U1( LDU1, M ), U2( LDU2, P ), V( LDV, N ),
     $                   WORK( LWORK )

      EXTERNAL           SGGQRCS

      A = 0.0
      DO I = 1, MIN( P, N )
         A( I, I ) = 1.0E0
      END DO

      CALL SORCSD2BY1( 'Y', 'Y', 'Y', M + P, M, N, 
     $                 A( 1, 1 ), LDA, A( M + 1, 1 ), LDA,
     $                 THETA,
     $                 U1, LDU1, U2, LDU2, V, LDV,
     $                 WORK, LWORK, IWORK, INFO )
      END PROGRAM testCSD2BY1

Output:

$ make
gfortran  -frecursive -fcheck=all -Wextra -Wall -pedantic bug-csd2by1.f -llapack -L/tmp/build.lapack/lib
$ env LD_LIBRARY_PATH=/tmp/build.lapack/lib ./a.out
At line 318 of file /home/christoph/lapack/SRC/sorbdb2.f
Fortran runtime error: Index '2' of dimension 1 of array 'x21' above upper bound of 1

Error termination. Backtrace:
#0  0x7fb2529d0c3b in sorbdb2_
    at /home/christoph/lapack/SRC/sorbdb2.f:318
#1  0x7fb2529dd2ce in sorcsd2by1_
    at /home/christoph/lapack/SRC/sorcsd2by1.f:549
#2  0x4009cd in ???
#3  0x400a0b in ???
#4  0x7fb251655bf6 in ???
#5  0x400789 in ???
#6  0xffffffffffffffff in ???

Checklist

[x] I've included a minimal example to reproduce the issue
[x] I'd be willing to make a PR to solve this issue

christoph-conrads commented 3 years ago

Fix

For simultaneous bidiagonalization of the input matrices, xORCSD2BY1 calls xORBDB2 and xORBDB3 and the out-of-bounds array accesses occur within these bidiagonalization subroutines. According the xORBDB{2,3} documentation, the input causing the misbehavior is allowed. Consequently, a fix must modify xORBDB{2,3}.

I see two possibilities for fixing this issue:

xORCSD2BY1 checks for the special case where one submatrix has no rows and the other submatrix is square, xORBDB{2,3} are updated (documentation and code) to disallow such input;
xORBDB{2,3} are fixed to handle any valid input.

I am in favor of the latter approach because to the best of my knowledge, all LAPACK routines can handle input of dimension zero (whether this is along one axis or several axes). Hence making xORBDB{2,3} reject certain inputs of dimension zero would violate the principle of least astonishment.

christoph-conrads commented 3 years ago

I see two possibilities for fixing this issue:
* xORCSD2BY1 checks for the special case where one submatrix has no rows and the other submatrix is square, xORBDB{2,3} are updated (documentation and code) to disallow such input;
* xORBDB{2,3} are fixed to handle any valid input.
I am in favor of the latter approach because to the best of my knowledge, all LAPACK routines can handle input of dimension zero (whether this is along one axis or several axes). Hence making xORBDB{2,3} reject certain inputs of dimension zero would violate the principle of least astonishment.

Consider the constraints on the input dimensions, see sorbdb2.f starting at line 64:

m, n, p integers
m ≥ 0 is the number of rows of [X1; X2]
q is the number of columns of [X1; X2] subject to 0 ≤ q ≤ m
p is the number of rows of X1 subject to 0 ≤ p ≤ min(m - p, q, m - q).

([X1; X2] is Matlab-like notation and means the rows of X1 on top of the rows of X2 and X1, X2 having the same number of columns.)

Have a look at the loop with the out-of-bounds access: https://github.com/Reference-LAPACK/lapack/blob/2dafa3d2756a7825c23a8c8456781561e36668ae/SRC/sorbdb2.f#L317-L322

If M = Q and I = Q in the last iteration, then X21(I+1,I) = X21(M + 1, 1) in the call to SLARFGP:

          CALL SLARFGP( 1, X21(M, M), X21(M+1, M), 1, TAUP2(M) )

SLARFGP then compute in line 144 the Euclidean norm of the vector of dimension zero starting at X21(M + 1, M). snrm2 correctly infers the norm to be zero (line 125) without referencing X21(M + 1, 1). Based on the norm, slarfgp will then continue to compute the elementary reflector without referencing X21(M + 1, 1) in lines 144 to 164:

https://github.com/Reference-LAPACK/lapack/blob/2dafa3d2756a7825c23a8c8456781561e36668ae/SRC/slarfgp.f#L144-L164

@weslleyspereira Like I wrote #406, this is another -fcheck=all false positive.

edit link to my comment on May 6, 2021

weslleyspereira commented 3 years ago

Hi @christoph-conrads. Thanks for the complete description! This is very helpful!

I understood this is a false positive, and that SLARFGP doesn't use X21(I+1,I) when N=1. Thanks! So, in practice, nothing unexpected happens when we turn off the flag. (all the tests pass when we turn off -fcheck=all). Equally, SLARF will not use X21(I,I+1), since tau=0. However, I am in favor of continue using some flags, like -fcheck=all, so as to avoid true out-of-bounds (and other) errors.

I propose the following change. Replace:

https://github.com/Reference-LAPACK/lapack/blob/2dafa3d2756a7825c23a8c8456781561e36668ae/SRC/sorbdb2.f#L317-L322

by:

      DO I = P + 1, MIN( Q, M - P - 1 )
         CALL SLARFGP( M-P-I+1, X21(I,I), X21(I+1,I), 1, TAUP2(I) )
         X21(I,I) = ONE
         CALL SLARF( 'L', M-P-I+1, Q-I, X21(I,I), 1, TAUP2(I),
     $               X21(I,I+1), LDX21, WORK(ILARF) )
      END DO
*     
      DO I = M - P, Q
         TAUP2(I) = ZERO
         X21(I,I) = ONE
      END DO

With this change, all tests pass with -fcheck=all.

I can submit a PR with the fix if everything is ok.

christoph-conrads commented 3 years ago

I propose the following change. [snip]

      DO I = P + 1, MIN( Q, M - P - 1 )
         CALL SLARFGP( M-P-I+1, X21(I,I), X21(I+1,I), 1, TAUP2(I) )
         X21(I,I) = ONE
         CALL SLARF( 'L', M-P-I+1, Q-I, X21(I,I), 1, TAUP2(I),
     $               X21(I,I+1), LDX21, WORK(ILARF) )
      END DO
*     
      DO I = M - P, Q
         TAUP2(I) = ZERO
         X21(I,I) = ONE
      END DO

The do-loop at the end assumes that X21(I, I) is nonnegative for otherwise must TAUP2(I, I) = 2, cf. SRC/slarfgp.f, line 146+. I do not see why this should be the case.
The same fix must be applied to SORBDB3, i.e., for the case where X11 is square and X21 has no rows.

christoph-conrads commented 3 years ago

The loop starting in SRC/sorbdb2.f, line 279, will produce out-of-bounds accesses because of the references X11(I+1, I), X11(I, I+1) (if P = Q), and X11(I+1, I+1).
Case 3) applies to SORBDB2 with M - P = Q.

https://github.com/Reference-LAPACK/lapack/blob/2dafa3d2756a7825c23a8c8456781561e36668ae/SRC/sorbdb2.f#L279-L313

weslleyspereira commented 3 years ago

Thanks, @christoph-conrads!

I agree with (1). I will prepare a PR including your correction for negative X21(I, I). I will also look at SORBDB2 and SORBDB3, thanks.

Reference-LAPACK / lapack

xUNCSD2BY1 does not handle submatrices with empty rows properly #549

Fix