bwlewis / irlba

Fast truncated singular value decompositions
127 stars 17 forks source link

irlba with warm start returns incorrect decomposition #58

Open GabrielHoffman opened 3 years ago

GabrielHoffman commented 3 years ago

I have an interative algorithm that: 1) performs SVD on a matrix, 2) slightly modifies the original matrix, and 3) performs SVD on the new matrix, 4) iterates until 1,2,3 until convergence

Since the modification of the matrix at each iteration is pretty small, I expect the changes to the U,D, V matrices of the SVD to be pretty small. Therefore, I want to use the warm start feature of irlba where I can use the previous SVD as a starting point so that irlba will converge faster. However, when using the warm start approach, it seems that irlba just returns the warm start matrix and gives an incorrect decomposition.

Here is an example:

library(irlba)
#> Loading required package: Matrix

# simulate a simple dataset
set.seed(1)
n = 100
p = 1000
X = matrix(rnorm(n*p), n,p)

# decomposition of original matrix
decomp.init = irlba(X)

# decomposing of related matrix
###########################

# irlba
decmp.irlba = irlba(X * 10)

# svd
decomp.svd = svd(X * 10)

# the singular values are the same by both methods
head(decmp.irlba$d)
#> [1] 414.4095 407.4580 405.0538 401.7232 398.5208
head(decomp.svd$d)
#> [1] 414.4095 407.4580 405.0538 401.7232 398.5208 396.5865

# Since I expect the decompostion of the second matrix (i.e. X * 10)
# to be very similar to to first (i.e. X), I want to use
# the decomposition of the first matrix as a warm start for the decomposition
# of the second

# irlba using warm start
decomp.restart = irlba(X * 10, v=decomp.init)

# However, this does not compute the correct singular values
head(decomp.restart$d)
#> [1] 41.44095 40.74580 40.50538 40.17232 39.85208

# In fact it just returns the warm start value without doing any calculations
head(decomp.init$d)
#> [1] 41.44095 40.74580 40.50538 40.17232 39.85208

The cause seems to be this line.

sessionInfo() R version 4.0.3 (2020-10-10) Platform: x86_64-apple-darwin19.6.0 (64-bit) Running under: macOS Catalina 10.15.7 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] irlba_2.3.3 Matrix_1.3-2 RUnit_0.4.32 Rfast_2.0.1 [5] RcppZiggurat_0.1.6 Rcpp_1.0.6 decorrelate_0.0.2 loaded via a namespace (and not attached): [1] jquerylib_0.1.3 bslib_0.2.4 pillar_1.5.0 compiler_4.0.3 [5] highr_0.8 tools_4.0.3 digest_0.6.27 jsonlite_1.7.2 [9] evaluate_0.14 lifecycle_1.0.0 tibble_3.1.0 lattice_0.20-41 [13] pkgconfig_2.0.3 rlang_0.4.10 reprex_1.0.0 cli_2.3.1 [17] rstudioapi_0.13 yaml_2.2.1 parallel_4.0.3 xfun_0.21 [21] knitr_1.31 sass_0.3.1 fs_1.5.0 vctrs_0.3.6 [25] grid_4.0.3 glue_1.4.2 R6_2.5.0 processx_3.4.5 [29] fansi_0.4.2 rmarkdown_2.7 callr_3.5.1 clipr_0.7.1 [33] magrittr_2.0.1 ps_1.6.0 ellipsis_0.3.1 htmltools_0.5.1.1 [37] assertthat_0.2.1 utf8_1.1.4 crayon_1.4.1
bwlewis commented 3 years ago

Thanks for this report. I've had some offline discussion with Jim Baglama about improving the restarting code more generally as well.

In the short term though I'll try to fix this asap.