There is probably wasteful re-computation of gradients in the current NUTS implementation. During successive leapfrog (or other numerical integration) steps, we can partially re-use the gradient from the previous step. More precisely, this is where a cached gradient could be re-used: https://github.com/beast-dev/beast-mcmc/blob/hmc_develop/src/dr/inference/operators/hmc/NoUTurnOperator.java#L130
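To illustrate the kind of caching meant here (a minimal generic sketch, not the BEAST code): in leapfrog, the gradient evaluated at the end of one step is exactly the gradient needed at the start of the next, so an integrator that accepts the current gradient as an argument and returns the new one never evaluates the same gradient twice. The names below (`leapfrog`, `simulate_trajectory`, `grad_fn`) are illustrative and assume `grad_fn` returns the gradient of the log target density.

```python
import numpy as np

def leapfrog(position, momentum, grad, step_size, grad_fn):
    """One leapfrog step. Takes the gradient of the log target density at the
    current position and returns the gradient at the new position, so the
    caller can feed it into the next step instead of recomputing it."""
    momentum = momentum + 0.5 * step_size * grad   # first half-step on momentum
    position = position + step_size * momentum     # full step on position
    grad = grad_fn(position)                       # the only gradient evaluation per step
    momentum = momentum + 0.5 * step_size * grad   # second half-step on momentum
    return position, momentum, grad

def simulate_trajectory(position, momentum, step_size, n_steps, grad_fn):
    grad = grad_fn(position)  # evaluated once up front
    for _ in range(n_steps):
        # the gradient returned by the previous step is reused here, so each
        # step costs exactly one new gradient evaluation
        position, momentum, grad = leapfrog(position, momentum, grad, step_size, grad_fn)
    return position, momentum, grad

# Example: standard normal target, whose log-density gradient is -position.
q, p, _ = simulate_trajectory(np.zeros(2), np.ones(2), 0.1, 10, lambda q: -q)
```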
When using a split integrator, caching the gradient only saves the cost of the outermost gradient evaluation. So when the "inner" updates are more expensive than the outermost one, not caching the gradient incurs little performance penalty. That said, it's probably worth getting this right at some point.
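For concreteness, here is a sketch of a generic split (Strang-type) integrator, not any particular BEAST class; all names are illustrative. The expensive "outer" gradient enters only through the two half-kicks, so caching it saves exactly one `outer_grad_fn` call per step, while the cost of the `n_inner` inner updates is unaffected.

```python
import numpy as np

def split_leapfrog(position, momentum, outer_grad, step_size, n_inner,
                   outer_grad_fn, inner_update):
    """One step of a split integrator: the expensive "outer" gradient is
    applied as two half-kicks around a sequence of cheaper "inner" updates.
    Caching saves the single outer_grad_fn call per step; the inner updates
    must run regardless."""
    momentum = momentum + 0.5 * step_size * outer_grad   # outer half-kick (cached gradient)
    for _ in range(n_inner):
        # inner sub-steps, e.g. an exactly solvable or cheap part of the target
        position, momentum = inner_update(position, momentum, step_size / n_inner)
    outer_grad = outer_grad_fn(position)                 # the one cacheable evaluation
    momentum = momentum + 0.5 * step_size * outer_grad   # outer half-kick
    return position, momentum, outer_grad

# Example inner update: exact dynamics under a standard normal "inner" potential.
def gaussian_inner_update(q, p, dt):
    return q * np.cos(dt) + p * np.sin(dt), p * np.cos(dt) - q * np.sin(dt)
```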
One solution is to embed a flag `gradientKnown` within `gradientProvider`, as @xji3 has suggested. I'd imagine this is somewhat error prone, though, and there is a simpler and more explicit solution for NUTS. I have personally handled this in my Python implementation as follows (sketched below):

1. make the gradient (for the outer-most update) an argument to the integrator, so that I can supply either the cached or a newly evaluated one as needed: https://github.com/suchard-group/bayes-variable-selection/blob/hmc/bayesbridge/reg_coef_sampler/hamiltonian_monte_carlo/dynamics.py#L42
2. manually cache the gradient at the "front" and "rear" ends of the NUTS trajectory tree: https://github.com/aki-nishimura/bayes-bridge/blob/master/bayesbridge/reg_coef_sampler/hamiltonian_monte_carlo/nuts.py#L206
3. pass the cached gradient when doubling the trees: https://github.com/suchard-group/bayes-variable-selection/blob/hmc/bayesbridge/reg_coef_sampler/hamiltonian_monte_carlo/nuts.py#L231
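To show how these three pieces fit together, here is a minimal sketch of the gradient bookkeeping only; it is not the bayes-bridge implementation, and the names `TreeEnd` and `build_tree_sketch` are hypothetical. It assumes the `leapfrog` helper from the first sketch above, which takes a cached gradient and returns the new one; U-turn checks, proposal sampling, and divergence handling are all omitted.

```python
from collections import namedtuple

# Each end of a NUTS trajectory tree carries the gradient already evaluated
# there, so a subsequent doubling can start from the cached value.
TreeEnd = namedtuple("TreeEnd", ["position", "momentum", "grad"])

def build_tree_sketch(rear, front, direction, depth, step_size, grad_fn):
    """Skeleton of NUTS tree doubling showing only the gradient caching.
    Assumes the leapfrog() helper from the earlier sketch."""
    if depth == 0:
        # Base case: extend the trajectory by one leapfrog step from the end
        # being grown, reusing that end's cached gradient. A negative step
        # size integrates the trajectory backward in time.
        end = front if direction == 1 else rear
        q, p, g = leapfrog(end.position, end.momentum, end.grad,
                           direction * step_size, grad_fn)
        new_end = TreeEnd(q, p, g)
        return (rear, new_end) if direction == 1 else (new_end, front)
    # Recursive case: double twice in the same direction; the cached end
    # gradients flow through, so no gradient is recomputed at a tree end.
    rear, front = build_tree_sketch(rear, front, direction, depth - 1,
                                    step_size, grad_fn)
    return build_tree_sketch(rear, front, direction, depth - 1,
                             step_size, grad_fn)
```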