chjackson / flexsurv

The flexsurv R package for flexible parametric survival and multi-state modelling
http://chjackson.github.io/flexsurv/

[WIP] replace solve( with chol2inv(chol for hessian -> covariance matrix #84

Closed kkmann closed 3 years ago

kkmann commented 3 years ago

Most of this is automatic whitespace removal; the magic happens in lines 500 and 608, where the symmetry of the Hessian is exploited to invert it via a Cholesky decomposition. This should scale much better than solve().
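The idea can be sketched on a toy example (this is illustrative, not the actual flexsurv code): for a symmetric positive-definite Hessian, chol2inv(chol(H)) yields the same inverse as solve(H), but via a Cholesky factorization that exploits symmetry.

```r
# Build a symmetric positive-definite "Hessian" for illustration.
set.seed(1)
k <- 5
A <- matrix(rnorm(k * k), k, k)
H <- crossprod(A) + diag(k)

cov_solve <- solve(H)           # generic LU-based inverse
cov_chol  <- chol2inv(chol(H))  # Cholesky-based inverse, exploits symmetry

all.equal(cov_solve, cov_chol)  # agree up to numerical tolerance
```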

kkmann commented 3 years ago

I think for large-scale problems the main bottleneck is the computation of the Hessian itself (before inversion); can that be sped up?

edit: maybe using numDeriv::hessian is quicker, or at least allows specifying looser tolerances to get a Hessian more quickly?

edit: I can confirm, it's the Hessian computation in optim that takes so long - are there quicker options?
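A hedged sketch of the numDeriv idea on a toy objective (the objective and parameter values here are made up): numDeriv::hessian accepts method.args, and lowering the number of Richardson iterations r (default 4) trades accuracy for fewer function evaluations.

```r
library(numDeriv)

# Toy negative log-likelihood with minimum at c(1, 2, 3); analytic Hessian is 2*I.
negloglik <- function(par) sum((par - 1:3)^2)

fit <- optim(c(0, 0, 0), negloglik, method = "BFGS")

# Fewer Richardson iterations (r = 2 instead of the default 4) means
# fewer evaluations of negloglik, at some cost in accuracy.
H_fast  <- hessian(negloglik, fit$par, method.args = list(r = 2))
H_exact <- diag(2, 3)

max(abs(H_fast - H_exact))  # small for this smooth quadratic objective
```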

chjackson commented 3 years ago

That's nice, thank you. This will benefit all of my packages that do MLE!

kkmann commented 3 years ago

@chjackson I am afraid it might introduce new problems, though. chol() complains when the Hessian is not (close to) positive definite. We should probably throw a warning there, since that indicates convergence to a non-minimum. I am not sure how to fix this ideally, but Matrix::nearPD could be a consideration (i.e. use solve() on the non-PD Hessian, check the smallest eigenvalue, issue a warning if it is smaller than a defined tolerance, and return Matrix::nearPD of the inverse as the covariance matrix).
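The fallback described above could look roughly like this (an assumed workflow sketched for discussion, not merged code; safe_cov is a hypothetical name):

```r
library(Matrix)

# Fast Cholesky path when the Hessian is safely PD; otherwise warn and
# project the generic inverse onto the nearest PD matrix via nearPD().
safe_cov <- function(H, tol = 1e-8) {
  ev <- eigen(H, symmetric = TRUE, only.values = TRUE)$values
  if (min(ev) > tol) {
    return(chol2inv(chol(H)))  # H is positive definite: cheap symmetric inverse
  }
  warning("Hessian is not positive definite; optimiser may not have converged")
  V <- solve(H)                           # generic inverse of the non-PD Hessian
  as.matrix(nearPD((V + t(V)) / 2)$mat)   # nearest PD matrix as covariance
}
```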

The key problem, however, is the efficient computation of the Hessian. This only really becomes relevant with flexsurvmix, since the direct solution method optimizes all parameters at the same time (thus requiring a large Hessian). Computing the Hessian is at least O(k^2) in the number of parameters k. I guess the likelihoods are written as R functions, potentially with loops, so evaluating a likelihood with many observations becomes really expensive. Combined with the O(k^2) factor, this leads to the optimization itself still being feasible, but the Hessian computation actually taking longer than the entire optimization...
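The scaling argument above can be made concrete by counting likelihood evaluations (toy objective, hypothetical helper name): a finite-difference Hessian such as stats::optimHess needs a number of function calls that grows quadratically with the number of parameters k.

```r
# Count how many times a toy "likelihood" is evaluated while
# stats::optimHess builds a finite-difference Hessian of size k x k.
count_hessian_evals <- function(k) {
  n_evals <- 0
  negloglik <- function(par) {
    n_evals <<- n_evals + 1
    sum(par^2)
  }
  stats::optimHess(rep(1, k), negloglik)
  n_evals
}

sapply(c(2, 4, 8), count_hessian_evals)  # grows roughly quadratically in k
```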