There is probably wasteful re-computation of gradients in the current NUTS implementation. During successive leapfrog (or other numerical integration) steps, we can partially re-use the gradient from the previous step. More precisely, this is where a cached gradient could be re-used: https://github.com/beast-dev/beast-mcmc/blob/hmc_develop/src/dr/inference/operators/hmc/NoUTurnOperator.java#L130
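To illustrate the kind of caching meant here (a minimal generic sketch, not the BEAST code): in leapfrog, the gradient evaluated at the end of one step is exactly the gradient needed at the start of the next, so an integrator that accepts the current gradient as an argument and returns the new one never evaluates the same gradient twice. The names below (`leapfrog`, `simulate_trajectory`, `grad_fn`) are illustrative and assume `grad_fn` returns the gradient of the log target density.

```python
import numpy as np

def leapfrog(position, momentum, grad, step_size, grad_fn):
    """One leapfrog step. Takes the gradient of the log target density at the
    current position and returns the gradient at the new position, so the
    caller can feed it into the next step instead of recomputing it."""
    momentum = momentum + 0.5 * step_size * grad   # first half-step on momentum
    position = position + step_size * momentum     # full step on position
    grad = grad_fn(position)                       # the only gradient evaluation per step
    momentum = momentum + 0.5 * step_size * grad   # second half-step on momentum
    return position, momentum, grad

def simulate_trajectory(position, momentum, step_size, n_steps, grad_fn):
    grad = grad_fn(position)  # evaluated once up front
    for _ in range(n_steps):
        # the gradient returned by the previous step is reused here, so each
        # step costs exactly one new gradient evaluation
        position, momentum, grad = leapfrog(position, momentum, grad, step_size, grad_fn)
    return position, momentum, grad

# Example: standard normal target, whose log-density gradient is -position.
q, p, _ = simulate_trajectory(np.zeros(2), np.ones(2), 0.1, 10, lambda q: -q)
```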
When using a split integrator, caching the gradient only saves the cost of the outermost gradient evaluation. So when the "inner" updates are more expensive than the outermost one, not caching the gradient incurs little performance penalty. That said, it's probably worth getting this right at some point.
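For concreteness, here is a sketch of a generic split (Strang-type) integrator, not any particular BEAST class; all names are illustrative. The expensive "outer" gradient enters only through the two half-kicks, so caching it saves exactly one `outer_grad_fn` call per step, while the cost of the `n_inner` inner updates is unaffected.

```python
import numpy as np

def split_leapfrog(position, momentum, outer_grad, step_size, n_inner,
                   outer_grad_fn, inner_update):
    """One step of a split integrator: the expensive "outer" gradient is
    applied as two half-kicks around a sequence of cheaper "inner" updates.
    Caching saves the single outer_grad_fn call per step; the inner updates
    must run regardless."""
    momentum = momentum + 0.5 * step_size * outer_grad   # outer half-kick (cached gradient)
    for _ in range(n_inner):
        # inner sub-steps, e.g. an exactly solvable or cheap part of the target
        position, momentum = inner_update(position, momentum, step_size / n_inner)
    outer_grad = outer_grad_fn(position)                 # the one cacheable evaluation
    momentum = momentum + 0.5 * step_size * outer_grad   # outer half-kick
    return position, momentum, outer_grad

# Example inner update: exact dynamics under a standard normal "inner" potential.
def gaussian_inner_update(q, p, dt):
    return q * np.cos(dt) + p * np.sin(dt), p * np.cos(dt) - q * np.sin(dt)
```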
One solution is to embed a flag `gradientKnown` within `gradientProvider`, as @xji3 has suggested. I'd imagine this is somewhat error prone, though, and there is a simpler and more explicit solution for NUTS. I have personally handled this in my Python implementation as follows (sketched below):

1. make the gradient (for the outer-most update) an argument to the integrator, so that I can supply either the cached or a newly evaluated one as needed: https://github.com/suchard-group/bayes-variable-selection/blob/hmc/bayesbridge/reg_coef_sampler/hamiltonian_monte_carlo/dynamics.py#L42
2. manually cache the gradient at the "front" and "rear" ends of the NUTS trajectory tree: https://github.com/aki-nishimura/bayes-bridge/blob/master/bayesbridge/reg_coef_sampler/hamiltonian_monte_carlo/nuts.py#L206
3. pass the cached gradient when doubling the trees: https://github.com/suchard-group/bayes-variable-selection/blob/hmc/bayesbridge/reg_coef_sampler/hamiltonian_monte_carlo/nuts.py#L231
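To show how these three pieces fit together, here is a minimal sketch of the gradient bookkeeping only; it is not the bayes-bridge implementation, and the names `TreeEnd` and `build_tree_sketch` are hypothetical. It assumes the `leapfrog` helper from the first sketch above, which takes a cached gradient and returns the new one; U-turn checks, proposal sampling, and divergence handling are all omitted.

```python
from collections import namedtuple

# Each end of a NUTS trajectory tree carries the gradient already evaluated
# there, so a subsequent doubling can start from the cached value.
TreeEnd = namedtuple("TreeEnd", ["position", "momentum", "grad"])

def build_tree_sketch(rear, front, direction, depth, step_size, grad_fn):
    """Skeleton of NUTS tree doubling showing only the gradient caching.
    Assumes the leapfrog() helper from the earlier sketch."""
    if depth == 0:
        # Base case: extend the trajectory by one leapfrog step from the end
        # being grown, reusing that end's cached gradient. A negative step
        # size integrates the trajectory backward in time.
        end = front if direction == 1 else rear
        q, p, g = leapfrog(end.position, end.momentum, end.grad,
                           direction * step_size, grad_fn)
        new_end = TreeEnd(q, p, g)
        return (rear, new_end) if direction == 1 else (new_end, front)
    # Recursive case: double twice in the same direction; the cached end
    # gradients flow through, so no gradient is recomputed at a tree end.
    rear, front = build_tree_sketch(rear, front, direction, depth - 1,
                                    step_size, grad_fn)
    return build_tree_sketch(rear, front, direction, depth - 1,
                             step_size, grad_fn)
```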