derekbeaton / GSVD

19 stars 6 forks source link

error in tolerance_svd when nu = 0 or nv = 0 #34

Open LukeMoraglia opened 1 year ago

LukeMoraglia commented 1 year ago

Hey Derek,

When calling tolerance_svd() with nu = 0 or nv = 0, there are errors since svd_res$u and svd_res$v don't exist.

I'm not immediately sure the cleanest way to check for this and adjust the function. I'm happy to help if you want me to take a stab at it, but maybe you have a good idea of how to fix it.

Thanks, Luke

small example

library(GSVD)
set.seed(42)
X <- matrix(sample.int(20, 20*3, replace = TRUE), 20, 3)
tolerance_svd(X, nu = 0, nv = 0)
LukeMoraglia commented 8 months ago

Hey Derek,

Following up on our conversation, I want to provide a possible solution to this issue where tolerance_svd() behaves like svd() when nu = 0 and nv = 0; that is, it only returns a list containing d. I've placed my solution in a new branch on my copy of the repo: https://github.com/LukeMoraglia/GSVD/blob/nu%3Dnv%3D0/R/tolerance_svd.R. The example code above now runs without error.

The hardest part about editing this function was accounting for the vector_signs variable, since it was written to depend only on svd_res$v. You could see a situation where nu = 3 say and nv = 0. In this case, I figured a sensible default is to not change the signs of svd_res$u, but I could see an argument for another route.

This brought to mind some other situations. What happens if nu and nv are different values? In the current version of tolerance_svd(): strange or bad behavior.

For example, using the same X matrix above:

svd(X, nu = 1, nv = 1)
$d
[1] 77.27411 27.56863 23.76304

$u
             [,1]
 [1,] -0.30191125
 [2,] -0.22574600
 [3,] -0.06502495
 [4,] -0.15257340
 [5,] -0.19606927
 [6,] -0.17495810
 [7,] -0.22379188
 [8,] -0.24306248
 [9,] -0.23485492
[10,] -0.14637988
[11,] -0.10471669
[12,] -0.14954445
[13,] -0.41178448
[14,] -0.28882945
[15,] -0.18612680
[16,] -0.14529716
[17,] -0.15857294
[18,] -0.22468851
[19,] -0.14734905
[20,] -0.36900861

$v
           [,1]
[1,] -0.6289919
[2,] -0.5502005
[3,] -0.5492255
tolerance_svd(X, nu = 1, nv = 1)
$d
[1] 77.27411 27.56863 23.76304

$u
            [,1]
 [1,] 0.30191125
 [2,] 0.22574600
 [3,] 0.06502495
 [4,] 0.15257340
 [5,] 0.19606927
 [6,] 0.17495810
 [7,] 0.22379188
 [8,] 0.24306248
 [9,] 0.23485492
[10,] 0.14637988
[11,] 0.10471669
[12,] 0.14954445
[13,] 0.41178448
[14,] 0.28882945
[15,] 0.18612680
[16,] 0.14529716
[17,] 0.15857294
[18,] 0.22468851
[19,] 0.14734905
[20,] 0.36900861

$v
          [,1]
[1,] 0.6289919
[2,] 0.5502005
[3,] 0.5492255

In the above, the signs of the first singular vectors get flipped by vector_signs, and this is fine. But in this situation:

svd(X, nu = 1, nv = 3)
$d
[1] 77.27411 27.56863 23.76304

$u
             [,1]
 [1,] -0.30191125
 [2,] -0.22574600
 [3,] -0.06502495
 [4,] -0.15257340
 [5,] -0.19606927
 [6,] -0.17495810
 [7,] -0.22379188
 [8,] -0.24306248
 [9,] -0.23485492
[10,] -0.14637988
[11,] -0.10471669
[12,] -0.14954445
[13,] -0.41178448
[14,] -0.28882945
[15,] -0.18612680
[16,] -0.14529716
[17,] -0.15857294
[18,] -0.22468851
[19,] -0.14734905
[20,] -0.36900861

$v
           [,1]       [,2]       [,3]
[1,] -0.6289919 -0.7165419 -0.3015573
[2,] -0.5502005  0.1362610  0.8238400
[3,] -0.5492255  0.6841057 -0.4799487
tolerance_svd(X, nu = 1, nv = 3) # current version
$d
[1] 77.27411 27.56863 23.76304

$u
             [,1]
 [1,]  0.30191125
 [2,] -0.22574600
 [3,] -0.06502495
 [4,]  0.15257340
 [5,] -0.19606927
 [6,] -0.17495810
 [7,]  0.22379188
 [8,] -0.24306248
 [9,] -0.23485492
[10,]  0.14637988
[11,] -0.10471669
[12,] -0.14954445
[13,]  0.41178448
[14,] -0.28882945
[15,] -0.18612680
[16,]  0.14529716
[17,] -0.15857294
[18,] -0.22468851
[19,]  0.14734905
[20,] -0.36900861

$v
          [,1]       [,2]       [,3]
[1,] 0.6289919 -0.7165419 -0.3015573
[2,] 0.5502005  0.1362610  0.8238400
[3,] 0.5492255  0.6841057 -0.4799487

Warning message:
In t(svd_res$u) * vector_signs :
  longer object length is not a multiple of shorter object length

In the above case, vector_signs messes up u. I've added some logic that checks that vector_signs is the same length as ncol(svd_res$u), and if not we modify vector_signs to match. See here: https://github.com/LukeMoraglia/GSVD/blob/eea863721d7cbaec19fb936e3d89bf1173048d47/R/tolerance_svd.R#L103-L109

tolerance_svd(X, nu = 1, nv = 3) # new version
$d
[1] 77.27411 27.56863 23.76304

$u
            [,1]
 [1,] 0.30191125
 [2,] 0.22574600
 [3,] 0.06502495
 [4,] 0.15257340
 [5,] 0.19606927
 [6,] 0.17495810
 [7,] 0.22379188
 [8,] 0.24306248
 [9,] 0.23485492
[10,] 0.14637988
[11,] 0.10471669
[12,] 0.14954445
[13,] 0.41178448
[14,] 0.28882945
[15,] 0.18612680
[16,] 0.14529716
[17,] 0.15857294
[18,] 0.22468851
[19,] 0.14734905
[20,] 0.36900861

$v
          [,1]       [,2]       [,3]
[1,] 0.6289919 -0.7165419 -0.3015573
[2,] 0.5502005  0.1362610  0.8238400
[3,] 0.5492255  0.6841057 -0.4799487