The primary reason is that a `KroneckerLazyVariable` can represent the Kronecker product of other `LazyVariable`s, which we might not want to evaluate or invert explicitly. This is true, for example, when doing KISS-GP with Kronecker structure over multiple dimensions (https://github.com/cornellius-gp/gpytorch/blob/master/examples/kissgp_kronecker_product_regression.ipynb).
It's true that if a `KroneckerLazyVariable` were over only `NonLazyVariable`s, we could probably avoid CG and Lanczos.
Do you have an example use case where exploiting properties of the Kronecker product other than for MVMs would be useful? We briefly discussed it and are open to the idea; we just can't think of what we'd gain, or how we'd keep the structure of the sub-lazy variables intact.
It may also just generally be hard to justify using inversion algorithms other than CG at this point, even when they are available. CG typically gives fairly good solves in a very small number of iterations, making it a compelling choice even when an exact inverse is possible. For example, even for exact GPs, k iterations of CG require O(kn^2) time, compared to the O(n^3) time needed to compute a Cholesky decomposition.
Even without preconditioning, k can often be very small; it depends more on the conditioning of the matrix and the clustering of its eigenvalues than on the size of the matrix.
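To make the cost comparison concrete, here is a minimal sketch in plain PyTorch (not gpytorch's actual CG code): each CG iteration costs one O(n^2) MVM, versus a single O(n^3) factorization for Cholesky.

```python
import torch

def cg_solve(matmul, b, max_iter=20, tol=1e-6):
    """Plain conjugate gradients; `matmul` computes K @ v for an SPD matrix K."""
    x = torch.zeros_like(b)
    r = b - matmul(x)                      # initial residual
    p = r.clone()
    rs = r @ r
    for _ in range(max_iter):
        Kp = matmul(p)
        alpha = rs / (p @ Kp)
        x = x + alpha * p
        r = r - alpha * Kp
        rs_new = r @ r
        if rs_new.sqrt() < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

n = 500
A = torch.randn(n, n)
K = A @ A.T + n * torch.eye(n)             # well-conditioned SPD test matrix
b = torch.randn(n)

x_cg = cg_solve(lambda v: K @ v, b)        # O(k n^2): k MVMs
L = torch.linalg.cholesky(K)               # O(n^3)
x_chol = torch.cholesky_solve(b.unsqueeze(-1), L).squeeze(-1)
print(torch.norm(x_cg - x_chol))           # small after ~20 iterations
```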
I think the primary gain over MVM is speed. I tried a 10-dimensional grid with 10 points in each dimension, and the root decomposition was very slow. I suspect that although MVMs are fast with Kronecker structure, the memory cost is still large on a multidimensional grid, which slows the computation down considerably. With a Kronecker root decomposition applied factor-wise, this should finish almost instantly.
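For what it's worth, here's the property I'm relying on, as a quick sketch in plain torch (assuming `torch.kron` is available): a Cholesky root of a Kronecker product is just the Kronecker product of the factors' roots, so it costs O(sum n_i^3) rather than O((prod n_i)^3).

```python
import torch

K1 = torch.tensor([[2.0, 0.5], [0.5, 1.0]])
K2 = torch.tensor([[3.0, 1.0], [1.0, 2.0]])

L1 = torch.linalg.cholesky(K1)             # root of each small factor
L2 = torch.linalg.cholesky(K2)

L = torch.kron(L1, L2)                     # root assembled factor-wise
print(torch.allclose(L @ L.T, torch.kron(K1, K2)))  # True: (L1⊗L2)(L1⊗L2)^T = K1⊗K2
```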
Maybe I misunderstood the code — why is there a problem when the `lazy_vars` are `LazyVariable`s? For a `KroneckerProductLazyVariable`, an `InverseLazyVariable` or `SquareLazyVariable` should be implementable by applying the operation to each factor. At least CG and Lanczos should be applicable factor-wise, right?
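As a quick check of the factor-wise property I have in mind (just a sketch, not a proposed gpytorch API): the inverse of a Kronecker product is the Kronecker product of the inverses, so inversion never needs to touch the full product.

```python
import torch

A = torch.tensor([[2.0, 0.5], [0.5, 1.0]])
B = torch.tensor([[3.0, 1.0], [1.0, 2.0]])

inv_of_kron = torch.linalg.inv(torch.kron(A, B))                    # invert the full product
kron_of_inv = torch.kron(torch.linalg.inv(A), torch.linalg.inv(B))  # invert each factor
print(torch.allclose(inv_of_kron, kron_of_inv, atol=1e-5))          # True
```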
Do you have an example of how a new `KroneckerLazyVariable` would look? It would basically need to define a new `inv_quad_log_det`.
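Concretely, for dense factors the two quantities it would return are available in closed form. Here's a rough sketch — `kron_inv_quad_log_det` is a hypothetical helper for the two-factor case, not gpytorch's interface — using log|A ⊗ B| = m log|A| + n log|B| and the (row-major) identity (A ⊗ B) vec(W) = vec(A W B^T):

```python
import torch

def kron_inv_quad_log_det(A, B, b):
    """Return b^T (A⊗B)^{-1} b and log|A⊗B| for SPD factors A (n x n), B (m x m)."""
    n, m = A.shape[0], B.shape[0]
    log_det = m * torch.logdet(A) + n * torch.logdet(B)  # det(A⊗B) = det(A)^m det(B)^n
    W = b.reshape(n, m)
    # (A⊗B)^{-1} b reshapes to A^{-1} W B^{-1} (B symmetric), via two small solves
    Z = torch.linalg.solve(A, torch.linalg.solve(B, W.T).T)
    return b @ Z.reshape(-1), log_det

# sanity check against the dense computation
n, m = 3, 4
A_ = torch.randn(n, n); A = A_ @ A_.T + n * torch.eye(n)
B_ = torch.randn(m, m); B = B_ @ B_.T + m * torch.eye(m)
b = torch.randn(n * m)
K = torch.kron(A, B)
iq, ld = kron_inv_quad_log_det(A, B, b)
print(torch.allclose(iq, b @ torch.linalg.solve(K, b), atol=1e-4),
      torch.allclose(ld, torch.logdet(K), atol=1e-4))
```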
The main cost with Kronecker structure is the grid's exponential dependence on the dimension d: at 10 grid points per dimension in 10 dimensions, that's 10^10 inducing points. At the moment, the two implemented ways around this are the multiplicative grid interpolation kernel, which exploits product structure (e.g. in the RBF kernel), and deep kernel learning, although we may implement SGPR as a lazy variable down the road.
It's not immediately obvious to me personally how we can get around the exponential scaling in a Kronecker lazy variable; however, if you believe you have a method, it could be extremely interesting.
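To spell out the scaling issue: a Kronecker MVM never forms the (prod n_i)^2 matrix, but the vector it acts on still has prod n_i entries — already 10^10 floats for 10 points per dimension in 10 dimensions. Here's a sketch of the standard factor-at-a-time MVM (the helper name `kron_mvm` is ours for illustration):

```python
import torch

def kron_mvm(factors, v):
    """Compute kron(factors[0], ..., factors[-1]) @ v one factor at a time.

    Costs O(N * sum(n_i)) for N = prod(n_i), but v itself already has N entries.
    """
    x = v
    N = x.numel()
    for F in factors:
        n = F.shape[0]
        x = F @ x.reshape(n, N // n)       # apply this factor along its own mode
        x = x.T.reshape(-1)                # rotate the processed mode to the back
    return x

# sanity check against the dense Kronecker product
A, B, C = torch.randn(3, 3), torch.randn(4, 4), torch.randn(2, 2)
v = torch.randn(3 * 4 * 2)
dense = torch.kron(torch.kron(A, B), C) @ v
print(torch.allclose(kron_mvm([A, B, C], v), dense, atol=1e-5))
```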
Ah, I had missed the multiplicative grid interpolation kernel before. Thank you.
For a `KroneckerLazyVariable`, where you can easily get the exact inverse and root, why do you still go for conjugate gradients and Lanczos? These seem slower to me, because the true size of the matrix is really big, and with only a small number of iterations the result might still be far from a good solution.
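For example (a sketch with plain torch, for the dense two-factor case): per-factor eigendecompositions already give both an exact root and an exact inverse of the full product — the only cubic-cost factorizations are the small O(n_i^3) ones on the factors.

```python
import torch

A_ = torch.randn(4, 4); A = A_ @ A_.T + 4 * torch.eye(4)   # SPD factors
B_ = torch.randn(3, 3); B = B_ @ B_.T + 3 * torch.eye(3)

evalsA, QA = torch.linalg.eigh(A)          # small per-factor eigendecompositions
evalsB, QB = torch.linalg.eigh(B)

Q = torch.kron(QA, QB)                     # eigenvectors of A⊗B
lam = torch.kron(evalsA, evalsB)           # eigenvalues of A⊗B

K = torch.kron(A, B)
root = Q * lam.sqrt()                      # exact root: K = root @ root.T
inv = (Q / lam) @ Q.T                      # exact inverse: K^{-1}
print(torch.allclose(root @ root.T, K, atol=1e-4),
      torch.allclose(inv @ K, torch.eye(12), atol=1e-4))
```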