yebai closed this issue 5 years ago.
Notebook with a simple illustration of the adaptation issue: https://github.com/xukai92/TuringDemo/blob/master/look-into-adapt.ipynb
Helper file for generating LDA data: https://github.com/xukai92/TuringDemo/blob/master/video/lda-gen-data.jl
https://github.com/stan-dev/stan/blob/develop/src/stan/mcmc/var_adaptation.hpp shows how Stan uses the Phase II windows: the variance estimate is computed on the fly, but the preconditioning matrix is only replaced when a new interval is set up.
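A minimal sketch of that pattern (Python for illustration; the class and method names are hypothetical, not Stan's): draws are accumulated on the fly, but the sampler-facing preconditioner is only replaced at window boundaries.

```python
class WindowedAdaptation:
    """Sketch of Stan-style Phase II adaptation: the preconditioning
    matrix used by the sampler only changes at window boundaries."""

    def __init__(self, window_size=25):
        self.window_size = window_size
        self.buffer = []     # draws collected in the current window
        self.precond = None  # what the sampler actually uses between updates

    def learn(self, draw):
        self.buffer.append(draw)
        if len(self.buffer) == self.window_size:
            n = len(self.buffer)
            dim = len(self.buffer[0])
            means = [sum(d[i] for d in self.buffer) / n for i in range(dim)]
            # Diagonal sample variance estimated from this window only.
            self.precond = [
                sum((d[i] - means[i]) ** 2 for d in self.buffer) / (n - 1)
                for i in range(dim)
            ]
            self.buffer.clear()  # start a new interval
        return self.precond
```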
@xukai92 Let's try to isolate the step-size and covariance adaptation issues and solve them one at a time. I suggest we first disable step-size adaptation (e.g. use a small enough fixed step size) and make sure covariance adaptation is robust by using the Welford trick (see https://github.com/yebai/Turing.jl/issues/289#issuecomment-368570037); then we come back to see why step-size adaptation is fragile.
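For reference, the Welford trick mentioned here is the standard one-pass, numerically stable mean/variance update; a minimal 1-D sketch in Python (illustrative only, not Turing's code):

```python
def welford_variance(xs):
    """One-pass running mean and sample variance (Welford's algorithm).
    Avoids the catastrophic cancellation of the naive
    sum(x**2) - n * mean**2 formula."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n           # update running mean
        m2 += delta * (x - mean)    # accumulate squared deviations
    return mean, m2 / (n - 1)
```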
I see. That makes sense.
Note: there is a related issue in DiffEqBayes.jl (https://github.com/JuliaDiffEq/DiffEqBayes.jl/issues/30).
@xukai92 Any update on this issue?
https://github.com/TuringLang/Turing.jl/issues/324#issuecomment-370272466 works on my local machine.
@yebai I tried the notebook with the master branch again on my local machine and a remote Linux machine. It seems that the adaptation is working now as long as the initialization is fine, i.e. the sampling only keeps throwing numerical errors if the initialization is bad. Do you mind trying it again on your machine to see if you agree?
A downstream DiffEqBayes.jl test that relies on Turing.jl and previously suffered from these adaptation issues also passes with the current master (related issue: https://github.com/JuliaDiffEq/DiffEqBayes.jl/issues/30, related PR: https://github.com/JuliaDiffEq/DiffEqBayes.jl/pull/48).
Could you describe what changed to fix it? We thought the issue was related to parameters going negative when they were supposed to be non-negative, but didn't have a nice way to do domain transforms (are these going to be added in Turing?)
@xukai92 thanks, I will do another test and come back with my findings.
P.S. Can you clarify, or give an example of, what you mean by bad initializations?
@yebai What I observed is: 1) if a numerical error occurs during sampling, our adaptation can recover from it (i.e. the numerical error disappears after a few iterations, which was not the case before); 2) if there is a numerical error at the very beginning, the sampling keeps throwing numerical errors.
My current understanding is: basically `theta` is initialized in a place where, after `invlink`, the model throws a numerical error when evaluating the log-joint or its gradient. If this happens, we cannot "rewind" `theta` to a place which is still numerically OK, because the initial state itself is numerically flawed. I haven't looked into it intensively, but I guess we might need some mechanism to resample the initial state if this is really what causes the problem.
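A hypothetical sketch of such a resampling mechanism (Python for illustration; `logjoint` and `sample_prior` stand in for model-specific code, and none of these names are Turing's API):

```python
import math

def find_valid_init(logjoint, sample_prior, max_tries=100):
    """Keep redrawing the initial state until the log-joint is finite
    (in practice one would also check the gradient)."""
    for _ in range(max_tries):
        theta = sample_prior()
        if math.isfinite(logjoint(theta)):
            return theta
    raise RuntimeError("no numerically valid initial state found")
```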
@ChrisRackauckas Turing.jl has always had domain transforms. I didn't really change Turing.jl's functionality in that PR; I refactored the core code to ensure no unexpected side effects, which I now believe were the reason in-sampling numerical errors were not handled correctly (in either rejection or adaptation). As I posted in the comment above, there is still an initialization issue, which is especially critical when the domain is very constrained. I think this is still a problem for the model in https://github.com/JuliaDiffEq/DiffEqBayes.jl/issues/30, as I see a later Travis job failing on DiffEqBayes.
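To illustrate the kind of domain transform being discussed (a hypothetical Python sketch, not Turing's actual API): a positive-constrained parameter can be sampled in unconstrained space via log/exp, with the log-Jacobian of the inverse transform added to the density.

```python
import math

def link(sigma):
    """Constrained (0, inf) -> unconstrained R."""
    return math.log(sigma)

def invlink(y):
    """Unconstrained R -> constrained (0, inf)."""
    return math.exp(y)

def logpdf_unconstrained(logpdf_constrained, y):
    """Density in unconstrained space: the constrained log-density plus
    the log-Jacobian of invlink, log|d sigma / d y| = y."""
    return logpdf_constrained(invlink(y)) + y
```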
DynamicHMC.jl has a good adaptation design: https://github.com/tpapp/DynamicHMC.jl/blob/master/src/sampler.jl
`DynamicHMC` is a very well designed and tested NUTS implementation, together with adaptation of the preconditioning matrix. We can try to plug `DynamicHMC` into Turing and compare its results against our NUTS sampler. We can also try to refactor our NUTS sampler following `DynamicHMC`'s design. Ideally, the sampler code should be testable and benchmarkable without depending on other parts of Turing.
This also echoes the discussion in #456.
cc @willtebbutt @wesselb @mohamed82008
This is an umbrella issue for adaptation issues of the NUTS algorithm.
Before these three