Closed: ClaudMor closed this issue 2 years ago
I would not expect the initialization to have an impact on the computation time - the same number of calculations have to be performed in both cases. However, I would assume that the step size and the number of accepted steps could/should be affected by a different initialization, which could result in e.g. better mixing. The higher ESS values in the run with a custom initialization support this intuition.
BTW in some sense the improved mixing leads to a relative speed-up - to achieve the same effective sample size with a random initialization you would need a longer chain which of course would take more time to sample.
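To make the mixing comparison concrete, one might compare per-parameter effective sample sizes of the two chains with MCMCChains. This is only a sketch: `chain_random` and `chain_custom` are hypothetical `Chains` objects standing in for the runs with random and custom initialization, and the `ess` summary function is assumed from MCMCChains' API.

```julia
using MCMCChains  # summary statistics for Turing chains

# chain_random: chain sampled from a random initialization
# chain_custom: chain sampled with init_params set to reasonable values
# `ess` returns a summary table with per-parameter effective sample sizes.
ess_random = ess(chain_random)
ess_custom = ess(chain_custom)

# For the same chain length, a higher ESS means better mixing, i.e. fewer
# samples (and less wall-clock time) are needed for the same accuracy.
```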
Thank you very much @devmotion .
I know it's a bit off topic, but would you have any advice on the approximate minimum number of model parameters at which one should prefer ADVI over NUTS?
Here, a value of 12 is cited, but I'm not sure whether it refers to the number of parameters or to something else.
I don't have any good heuristics here. As far as I can tell, the number 12 in the documentation is completely arbitrary and refers to the number of samples in the Markov chain. The main point there seems to be that the MCMC methods provide exactness guarantees if the number of samples goes to infinity but the number of samples required for a good approximation of the posterior (e.g. in the sense that the estimation of the expectation of some functional is reasonably close to the true value) might be prohibitively large. Therefore sometimes one might prefer an approximate method such as VI over MCMC methods.
Hello,
This question is related to this issue, but since it's slightly different I opened another one. If this is the wrong place, please let me know.
Thanks to your PR, `init_params` now seems to work. The problem is that if I set `init_params` to reasonable values, NUTS does not seem to speed up (I'm not even sure whether it should). As an example, I will use a model found in the tutorials.
Let's first find reasonable parameter values by running NUTS once:
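The original code block is not shown in this thread; the following is only an illustrative sketch of such a setup. The model, priors, and sampler settings are placeholders, not the exact tutorial model.

```julia
using Turing  # re-exports the Distributions used below

# A simple illustrative model (not the exact tutorial model):
@model function demo(x)
    μ ~ Normal(0, 1)
    σ ~ truncated(Normal(0, 1); lower=0)
    x .~ Normal(μ, σ)
end

model = demo(randn(100))

# A short preliminary run to find reasonable parameter values:
pre_chain = sample(model, NUTS(), 1_000)
```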
Next, we sample one chain of 10,000:
It took 29.285691 seconds (226.37 M allocations: 17.156 GiB, 8.58% gc time).
You can plot the chain:
If I initialize the same sampling with the parameter values found in this run, I get:
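Again, the original code is not shown; a sketch of passing the earlier values back in might look like this, assuming `pre_chain` holds the chain from the first run and `μ` and `σ` are the model's parameters (the `init_params` keyword is the one discussed in this issue, but the value extraction here is only illustrative):

```julia
using Statistics: mean

# Use the posterior means from the preliminary run as starting values,
# passed to `sample` via the `init_params` keyword:
init = [mean(pre_chain[:μ]), mean(pre_chain[:σ])]
chain = sample(model, NUTS(), 10_000; init_params=init)
```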
And it took 31.805427 seconds (268.14 M allocations: 20.313 GiB, 8.99% gc time).
I observe the same behaviour with more complex underlying DifferentialEquations.jl models (which require more time to calibrate), but the same Turing.jl model.
Should I expect parameter initialization to actually speed up the process?
Side question: what is the number of model parameters that should induce one to use ADVI instead of NUTS?
Thanks very much for your attention.