TuringLang / Turing.jl

Bayesian inference with probabilistic programming.
https://turinglang.org
MIT License

Robust adaption for NUTS #324

Closed · yebai closed this issue 5 years ago

yebai commented 6 years ago

This is an umbrella issue for adaptation issues in the NUTS algorithm.

Before these three

xukai92 commented 6 years ago

Notebook for a simple illustration of the adaptation issue: https://github.com/xukai92/TuringDemo/blob/master/look-into-adapt.ipynb
Helper file for generating LDA data: https://github.com/xukai92/TuringDemo/blob/master/video/lda-gen-data.jl

xukai92 commented 6 years ago

https://github.com/stan-dev/stan/blob/develop/src/stan/mcmc/var_adaptation.hpp shows how Stan uses the Phase II window: the preconditioning matrix estimated on the fly is only applied to the sampler when a new adaptation interval is set up.
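For reference, a minimal sketch of that behaviour (not Stan's or Turing's actual code, and with Stan's window doubling and regularization omitted): the estimator accumulates draws on the fly, but the sampler's preconditioning matrix is only replaced when the adaptation window closes.

```julia
# Sketch of Stan-style windowed variance adaptation: accumulate a running
# estimate every iteration, but only swap in the new (diagonal) preconditioning
# matrix when the current adaptation window closes.
mutable struct WindowedVarAdapter
    window_size::Int
    counter::Int
    est_mean::Vector{Float64}   # running mean of the draws in this window
    est_m2::Vector{Float64}     # running sum of squared deviations
    precond::Vector{Float64}    # diagonal preconditioner actually used by the sampler
end

WindowedVarAdapter(d::Int; window_size = 25) =
    WindowedVarAdapter(window_size, 0, zeros(d), zeros(d), ones(d))

function adapt!(a::WindowedVarAdapter, θ::Vector{Float64})
    a.counter += 1
    Δ = θ .- a.est_mean
    a.est_mean .+= Δ ./ a.counter
    a.est_m2 .+= Δ .* (θ .- a.est_mean)
    if a.counter == a.window_size
        # Only here does the sampler's preconditioner change; mid-window the
        # on-the-fly estimate is never used.
        a.precond .= a.est_m2 ./ (a.counter - 1)
        a.counter = 0
        fill!(a.est_mean, 0.0)
        fill!(a.est_m2, 0.0)
    end
    return a.precond
end
```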

yebai commented 6 years ago

> Notebook for a simple illustration of the adaptation issue: https://github.com/xukai92/TuringDemo/blob/master/look-into-adapt.ipynb Helper file for generating LDA data: https://github.com/xukai92/TuringDemo/blob/master/video/lda-gen-data.jl

@xukai92 Let's try to isolate the step-size and covariance adaptation issues and solve them one at a time. I suggest we first disable step-size adaptation (e.g. use a small enough fixed step size) and make sure covariance adaptation is robust by using the Welford trick (see https://github.com/yebai/Turing.jl/issues/289#issuecomment-368570037); then we come back to see why step-size adaptation is fragile.
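For context, the "Welford trick" is a numerically stable one-pass update of the mean and covariance, so the preconditioning matrix can be estimated without storing all draws. A minimal sketch (names are illustrative, not Turing internals):

```julia
# One-pass (Welford-style) estimate of the sample covariance of the draws.
mutable struct WelfordCov
    n::Int
    mean::Vector{Float64}
    M::Matrix{Float64}   # running sum of outer products of deviations
end

WelfordCov(d::Int) = WelfordCov(0, zeros(d), zeros(d, d))

function update!(w::WelfordCov, θ::Vector{Float64})
    w.n += 1
    Δ = θ .- w.mean
    w.mean .+= Δ ./ w.n
    # The second factor uses the *updated* mean; this is what makes the
    # single-pass update numerically stable.
    w.M .+= Δ * (θ .- w.mean)'
    return w
end

covariance(w::WelfordCov) = w.M ./ (w.n - 1)
```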

xukai92 commented 6 years ago

I see. That makes sense.

xukai92 commented 6 years ago

Note: there is a related issue in DiffEqBayes.jl (https://github.com/JuliaDiffEq/DiffEqBayes.jl/issues/30).

yebai commented 6 years ago

@xukai92 Any update on this issue?

xukai92 commented 6 years ago

https://github.com/TuringLang/Turing.jl/issues/324#issuecomment-370272466 works on my local machine.

xukai92 commented 6 years ago

@yebai I tried the notebook with the master branch again on my local and on a remote Linux machine. It seems that the adaptation is working now as long as the initialization is fine, i.e. the sampling only keeps throwing numerical errors if the initialization is bad. Do you mind trying it again on your machine to see if you agree?

xukai92 commented 6 years ago

The downstream package DiffEqBayes.jl has a test relying on Turing.jl that previously suffered from the adaptation issues; it also passes with the current master (related issue: https://github.com/JuliaDiffEq/DiffEqBayes.jl/issues/30, related PR: https://github.com/JuliaDiffEq/DiffEqBayes.jl/pull/48).

ChrisRackauckas commented 6 years ago

Could you describe what changed to fix it? We thought the issue was related to parameters going negative when they were supposed to be non-negative, but didn't have a nice way to do domain transforms (are these going to be added in Turing?)

yebai commented 6 years ago

> @yebai I tried the notebook with the master branch again on my local and on a remote Linux machine. It seems that the adaptation is working now as long as the initialization is fine, i.e. the sampling only keeps throwing numerical errors if the initialization is bad. Do you mind trying it again on your machine to see if you agree?

@xukai92 thanks, I will do another test and come back with my findings.

P.S. Can you clarify / give an example of what you mean by bad initializations?

xukai92 commented 6 years ago

@yebai What I observed is: 1) if numerical errors occur during sampling, our adaptation can fix them (i.e. the numerical errors disappear after a few iterations, which was not the case before); 2) if a numerical error occurs at the very beginning, the sampling keeps throwing numerical errors.

My current understanding is: theta is initialized at a point where, after invlink, the model throws a numerical error when evaluating the log-joint or its gradient. If this happens, we cannot "rewind" theta to a place that is still numerically OK, because the initial state itself is numerically flawed. I haven't looked into it intensively, but I guess we might need some mechanism to resample the initial state if this is really what causes the problem.
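A minimal sketch of that mechanism, with `rand_init` and `logjoint` as hypothetical stand-ins (not actual Turing functions):

```julia
# Hypothetical fix: keep redrawing the initial state until the log-joint
# evaluates to a finite value, so NUTS never starts from a numerically
# flawed point.
function find_good_initial(rand_init, logjoint; max_tries = 100)
    for _ in 1:max_tries
        θ = rand_init()                  # draw θ in the unconstrained space
        isfinite(logjoint(θ)) && return θ
    end
    error("no numerically valid initial state found after $max_tries tries")
end
```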

xukai92 commented 6 years ago

@ChrisRackauckas Turing.jl has always had domain transforms. I didn't really change Turing.jl's functionality in that PR; I refactored the core code to ensure no unexpected side effects happen, which I now believe was the reason why in-sampling numerical errors were not handled correctly (either in rejection or in adaptation). As I posted in the comment above, there is still an issue with initialization, which is especially critical when the domain is very constrained. I think this is still a problem for the model in https://github.com/JuliaDiffEq/DiffEqBayes.jl/issues/30, as I see a later Travis job failing on DiffEqBayes.
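To illustrate what such a domain transform does for a positive-constrained parameter (function names here are for illustration only, not Turing's internal API):

```julia
# Sample in unconstrained space ϕ = log(σ), map back with the inverse link
# σ = exp(ϕ), and correct the density with the log-abs-det Jacobian.
link(σ)    = log(σ)    # constrained -> unconstrained
invlink(ϕ) = exp(ϕ)    # unconstrained -> constrained

# Log-density of ϕ given a log-density `logpdf_σ` on the constrained scale;
# the extra `+ ϕ` is log|dσ/dϕ| = log(exp(ϕ)).
logpdf_ϕ(logpdf_σ, ϕ) = logpdf_σ(invlink(ϕ)) + ϕ
```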

xukai92 commented 5 years ago

DynamicHMC.jl has a good adaptation design: https://github.com/tpapp/DynamicHMC.jl/blob/master/src/sampler.jl

yebai commented 5 years ago

DynamicHMC is a very well designed and tested NUTS implementation, together with adaptation of the preconditioning matrix. We can try to plug DynamicHMC into Turing and compare its results against our NUTS sampler. We can also try to refactor our NUTS sampler following DynamicHMC's design. Ideally, the sampler code should be testable and benchmarkable without depending on other parts of Turing.
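A rough sketch of what that decoupling could look like, with `nuts_sample` as a hypothetical entry point (not an existing function): the sampler consumes only a log-density and its gradient, so it can be tested on standalone targets.

```julia
# A standalone test target (standard normal) that a decoupled NUTS
# implementation could be benchmarked against, independent of Turing models.
logdensity(θ) = -0.5 * sum(abs2, θ)
gradient(θ)   = -θ

# chain = nuts_sample(logdensity, gradient; dim = 2, n_samples = 1_000)  # hypothetical API
```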

This also echoes the discussion in #456.

cc @willtebbutt @wesselb @mohamed82008

xukai92 commented 5 years ago

NUTS bug fixed in https://github.com/TuringLang/Turing.jl/pull/597/commits/d0dafa90fee506f0eb3bf281d02793ebddaf16bd