LTLA / BiocSingular

Clone of the Bioconductor repository for the BiocSingular package.
https://bioconductor.org/packages/devel/bioc/html/BiocSingular.html

runSVD with RandomParam() returns inverted values #10

Closed pablo-rodr-bio2 closed 3 years ago

pablo-rodr-bio2 commented 3 years ago

I was trying to use runSVD() with RandomParam() on a very large dataset stored in an HDF5Array. Before that, I ran some tests to see how the values differ from those of base::svd(), but it turns out that every time I use RandomParam(), the values in the first column of $u and $v come back with their signs inverted. I don't know if this is intended.

> library(BiocSingular)
> set.seed(123)
> m <- matrix(sample.int(10, 25, T), 10, 10)
> m
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    3    5    9    5    3    3    5    9    5     3
 [2,]    3    3    3    4    8    3    3    3    4     8
 [3,]   10    9    4    6   10   10    9    4    6    10
 [4,]    2    9    1    9    7    2    9    1    9     7
 [5,]    6    9    7   10   10    6    9    7   10    10
 [6,]    5    3    3    5    9    5    3    3    5     9
 [7,]    4    8    3    3    3    4    8    3    3     3
 [8,]    6   10   10    9    4    6   10   10    9     4
 [9,]    9    7    2    9    1    9    7    2    9     1
[10,]   10   10    6    9    7   10   10    6    9     7
> gSetIdx <- 1:2

> x1 <- svd(m[gSetIdx, ])
> x2 <- runSVD(m[gSetIdx, ], k=2)
> x3 <- runSVD(m[gSetIdx, ], k=2, BSPARAM=RandomParam())

These are the results I get:

> x1
$d
[1] 21.227029  7.836661

$u
           [,1]       [,2]
[1,] -0.7796929 -0.6261621
[2,] -0.6261621  0.7796929

$v
            [,1]         [,2]
 [1,] -0.1986884  0.058774061
 [2,] -0.2721507 -0.101029225
 [3,] -0.4190753 -0.420635796
 [4,] -0.3016490 -0.001536228
 [5,] -0.3461801  0.556239043
 [6,] -0.1986884  0.058774061
 [7,] -0.2721507 -0.101029225
 [8,] -0.4190753 -0.420635796
 [9,] -0.3016490 -0.001536228
[10,] -0.3461801  0.556239043

> x2
$d
[1] 21.227029  7.836661

$u
           [,1]       [,2]
[1,] -0.7796929 -0.6261621
[2,] -0.6261621  0.7796929

$v
            [,1]         [,2]
 [1,] -0.1986884  0.058774061
 [2,] -0.2721507 -0.101029225
 [3,] -0.4190753 -0.420635796
 [4,] -0.3016490 -0.001536228
 [5,] -0.3461801  0.556239043
 [6,] -0.1986884  0.058774061
 [7,] -0.2721507 -0.101029225
 [8,] -0.4190753 -0.420635796
 [9,] -0.3016490 -0.001536228
[10,] -0.3461801  0.556239043

> x3
$d
[1] 21.227029  7.836661

$u
          [,1]       [,2]
[1,] 0.7796929 -0.6261621
[2,] 0.6261621  0.7796929

$v
           [,1]         [,2]
 [1,] 0.1986884  0.058774061
 [2,] 0.2721507 -0.101029225
 [3,] 0.4190753 -0.420635796
 [4,] 0.3016490 -0.001536228
 [5,] 0.3461801  0.556239043
 [6,] 0.1986884  0.058774061
 [7,] 0.2721507 -0.101029225
 [8,] 0.4190753 -0.420635796
 [9,] 0.3016490 -0.001536228
[10,] 0.3461801  0.556239043

The signs of x3$u[,1] and x3$v[,1] are inverted.

> sessionInfo()
R Under development (unstable) (2020-10-29 r79387)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /home/bort/R-devel/lib/libRblas.so
LAPACK: /home/bort/R-devel/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocSingular_1.7.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5           rsvd_1.0.3           lattice_0.20-41     
 [4] matrixStats_0.57.0   IRanges_2.25.6       grid_4.1.0          
 [7] stats4_4.1.0         irlba_2.3.3          S4Vectors_0.29.6    
[10] Matrix_1.3-0         BiocParallel_1.25.2  beachmat_2.7.5      
[13] DelayedArray_0.17.7  MatrixGenerics_1.3.0 parallel_4.1.0      
[16] compiler_4.1.0       BiocGenerics_0.37.0

LTLA commented 3 years ago

This is not a problem; the sign of each singular vector is not identifiable. Flipping the sign of a column of $u together with the matching column of $v leaves the decomposition unchanged, so if you reconstruct the matrix from either decomposition, you get the same result because the negatives cancel out:

library(BiocSingular)
set.seed(123)
m <- matrix(sample.int(10, 25, T), 10, 10)

x2 <- runSVD(m[1:2, ], k=2)
x3 <- runSVD(m[1:2, ], k=2, BSPARAM=RandomParam())

# Both these things give me the same result:
x2$u %*% diag(x2$d) %*% t(x2$v)
x3$u %*% diag(x3$d) %*% t(x3$v)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    3    5    9    5    3    3    5    9    5     3
## [2,]    3    3    3    4    8    3    3    3    4     8
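
If the factors themselves need to be compared across algorithms (rather than the reconstruction), one option is to fix a sign convention first. The following is only a minimal sketch using base::svd(), so it runs without BiocSingular; align_signs() is a hypothetical helper and not part of any package, and the convention it applies (make the largest-magnitude entry of each column of $u positive) is an arbitrary choice made for illustration. The sign flip is simulated by hand rather than produced by RandomParam().

```r
# Hypothetical helper (not part of BiocSingular): flip each component so that
# the largest-magnitude entry of every column of $u is positive.
align_signs <- function(s) {
    idx <- apply(abs(s$u), 2, which.max)              # row of the largest |u| per column
    sgn <- sign(s$u[cbind(idx, seq_len(ncol(s$u)))])  # sign of that entry
    sgn[sgn == 0] <- 1                                # leave all-zero columns alone
    s$u <- sweep(s$u, 2, sgn, `*`)                    # flip u and v together so that
    s$v <- sweep(s$v, 2, sgn, `*`)                    # u %*% diag(d) %*% t(v) is unchanged
    s
}

set.seed(123)
m <- matrix(sample.int(10, 25, TRUE), 10, 10)
a <- svd(m[1:2, ])

# Simulate the reported behaviour by negating the first component of u and v.
b <- a
b$u[, 1] <- -b$u[, 1]
b$v[, 1] <- -b$v[, 1]

# The reconstruction is unaffected by the flip...
stopifnot(isTRUE(all.equal(a$u %*% diag(a$d) %*% t(a$v),
                           b$u %*% diag(b$d) %*% t(b$v))))

# ...and once both results follow the same convention, the factors agree too.
a2 <- align_signs(a)
b2 <- align_signs(b)
stopifnot(isTRUE(all.equal(a2$u, b2$u)), isTRUE(all.equal(a2$v, b2$v)))
```

The same helper could in principle be applied to the lists returned by runSVD(), assuming they carry $d, $u and $v as in the output shown above. With approximate algorithms the factors would still only be expected to match up to numerical tolerance.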

pablo-rodr-bio2 commented 3 years ago

Oh, I see. Sorry for the issue then, closing it.