Open LTLA opened 5 years ago
OK, sorry for the longish response that follows... To answer the question:
The current implementation uses harmonic Ritz values, a kind of inverse estimate really--to look for the largest singular values of the pseudoinverse of A. Performance with clusters of small singular values can be poor.
Lothar Reichel and I and others have discussed this over the winter, and I'm working on a roadmap with some proposed changes and some new algorithms too for a 3.0 release. I hope to have that ready by spring and some prototypes ready for discussion before the http://bugs.unica.it/ETNA25/ conference.
Checking out the uwot package code from https://github.com/jlmelville/uwot/issues/16, I see that uwot is using irlba for a partial symmetric eigenvalue problem, a task at which irlba is not nearly as well-equipped to solve as RSpectra (also used by uwot). Irlba is for singular values, RSpectra for eigenvalues (indeed RSpectra should not really be used for singular values either, see for instance an old write up about this here: https://bwlewis.github.io/irlba/comparison.html). However RSpectra can run into trouble sometimes, so maybe those bug reports are hitting those edge cases.
I can't say I recommend irlba for the partial eigenvalue problem. There IS however, a variation of the algorithm that is designed for the partial symmetric eigenvalue problem, and it will probably work great for this application. See https://www.math.uri.edu/~jbaglama/papers/paper1.pdf for an old paper on this.
SInce that method shares a lot of the core algorithmic details of irlba, it should be relatively straight forward to implement that, replacing the dumb partial_eigen() function in irlba with something much better.
Perhaps I should expedite adding that new algorithm as an early goal for 3.0? I'm willing to work on it.
Thank you for the detailed response, @bwlewis. I would certainly be interested in testing an improved partial_eigen
routine in uwot.
What would it take to make
smallest=TRUE
non-experimental? Is this something that can be fixed by enough programming effort or are there more fundamental theoretical limitations with using irlba for this?This is motivated by jlmelville/uwot#16; tagging in @jlmelville.