InvalidArgumentError: Input matrix is not invertible.

JSP21 commented 5 years ago

Hi all, I'm facing the foll. issue while executing Deep Gaussian Process SVI for a two-layer model.

I have tried adding jitter, centered the input data, tried various hyperparameter specifications, upgrading gpflow version, but couldn't resolve the error.

Any pointers, please! Thank you!

InvalidArgumentError (see above for traceback): Input matrix is not invertible. [[Node: gradients/DGP-2c82c62a-25/conditional/base_conditional/Cholesky_grad/MatrixTriangularSolve = MatrixTriangularSolve[T=DT_FLOAT, adjoint=false, lower=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](DGP-2c82c62a-25/conditional/base_conditional/Cholesky, gradients/DGP-2c82c62a-25/conditional/base_conditional/Cholesky_grad/eye/MatrixDiag)]]

The full error trace is as follows:

File "/home/jaya/jayashree/cdgp_experiments/wconv_rbf.py", line 112, in m_dgp2 = make_dgp(2) File "/home/jaya/jayashree/cdgp_experiments/wconv_rbf.py", line 103, in make_dgp num_outputs=num_classes) File "/home/jaya/.local/lib/python3.5/site-packages/gpflow/core/compilable.py", line 90, in init self.build() File "/home/jaya/.local/lib/python3.5/site-packages/gpflow/core/node.py", line 156, in build self._build() File "/home/jaya/.local/lib/python3.5/site-packages/gpflow/models/model.py", line 81, in _build likelihood = self._build_likelihood() File "/home/jaya/.local/lib/python3.5/site-packages/gpflow/decors.py", line 67, in tensor_mode_wrapper result = method(obj, *args, kwargs) File "/home/jaya/jayashree/cdgp_experiments/dgp.py", line 106, in _build_likelihood L = tf.reduce_sum(self.E_log_p_Y(self.X, self.Y)) File "/home/jaya/jayashree/cdgp_experiments/dgp.py", line 95, in E_log_p_Y Fmean, Fvar = self._build_predict(X, full_cov=False, S=self.num_samples) File "/home/jaya/.local/lib/python3.5/site-packages/gpflow/decors.py", line 67, in tensor_mode_wrapper result = method(obj, *args, *kwargs) File "/home/jaya/jayashree/cdgp_experiments/dgp.py", line 87, in _build_predict Fs, Fmeans, Fvars = self.propagate(X, full_cov=full_cov, S=S) File "/home/jaya/.local/lib/python3.5/site-packages/gpflow/decors.py", line 67, in tensor_mode_wrapper result = method(obj, args, kwargs) File "/home/jaya/jayashree/cdgp_experiments/dgp.py", line 76, in propagate F, Fmean, Fvar = layer.sample_from_conditional(F, z=z, full_cov=full_cov) File "/home/jaya/jayashree/cdgp_experiments/layers.py", line 111, in sample_from_conditional mean, var = self.conditional(X, full_cov=full_cov) File "/home/jaya/jayashree/cdgp_experiments/layers.py", line 96, in conditional mean, var = single_sample_conditional(X_flat) File "/home/jaya/jayashree/cdgp_experiments/layers.py", line 84, in single_sample_conditional full_cov=full_cov, white=True)

InvalidArgumentError (see above for traceback): Input matrix is not invertible. [[Node: gradients/DGP-2c82c62a-25/conditional/base_conditional/Cholesky_grad/MatrixTriangularSolve = MatrixTriangularSolve[T=DT_FLOAT, adjoint=false, lower=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](DGP-2c82c62a-25/conditional/base_conditional/Cholesky, gradients/DGP-2c82c62a-25/conditional/base_conditional/Cholesky_grad/eye/MatrixDiag)]]

hughsalimbeni commented 5 years ago

The most likely thing is grossly misspecified hyperparameters. What's the data you're using? I generally rescale the data to unit standard deviation to avoid having to set them by hand. Another issue could be a nan in the data. Also, are you using float64 or float32?

JSP21 commented 5 years ago

I have rescaled the data to unit SD and the data type is float32. And I have verified that there are no NaNs in the data.

hughsalimbeni commented 5 years ago

Could you try with tf.float64 (in gpflowrc). Sometimes that is a cause of instability.

hughsalimbeni commented 5 years ago

(and I'm assuming jitter is 1e-6)

JSP21 commented 5 years ago

Thank you so much. It works!

JSP21 commented 5 years ago

Also, please could you kindly tell the reasons why the variational parameters q_mu and q_sqrt could possibly turn to nan on increasing the layers?

hughsalimbeni commented 5 years ago

When using the natural gradient optimizer the actual gradient step takes place in the natural parameters, which are unconstrained. That is, not all values for the natural parameters are valid (because of positive definiteness). Sometimes gradients steps are too large and the gradient step moves to values that are invalid, resulting in a nan update to q_sqrt. It is actually possible to take natural gradient steps in other parameterization, but in practice it doesn't seem to work so well. See this paper for details.

JSP21 commented 5 years ago

Thank you. It makes sense.

Also, I am also facing a scenario wherein variational parameters are getting updated during the learning process, but not the kernel parameters. I wrote a new kernel by initialising the kernel parameters using gpflow.params.Parameter( ). Any pointers please to make the kernel parameters get updated while optimising?

hughsalimbeni commented 5 years ago

If you're optimizing hyperparameters then you need an additional optimizer. I tend to alternate between nat grad steps and adam steps. See, for example: https://github.com/hughsalimbeni/DGPs_with_IWVI/blob/3f6fab39586f9e45dbc26c6dec91394f9b052e9e/experiments/build_models.py#L293

JSP21 commented 5 years ago

Thank you. And, what is the intuition behind going for two optimizers? Will adam alone not suffice for learning both?

hughsalimbeni commented 5 years ago

Yes, and that is indeed what I used to do. This paper https://arxiv.org/abs/1905.03350 looks at this issue in more detail.

UCL-SML / Doubly-Stochastic-DGP

InvalidArgumentError: Input matrix is not invertible. #39