StatisticalRethinkingJulia / MCMCBenchmarks.jl

Comparing performance and results of MCMC options using Julia

Maximum iterations for LBA #14

Closed itsdfish closed 5 years ago

itsdfish commented 5 years ago

I am intermittently encountering the following error with DynamicHMC with the LBA:

Reached maximum number of iterations while bisecting interval for ϵ

Perhaps one solution is to increase the iterations, but that may not be the best approach. Do you have any ideas?

goedman commented 5 years ago

Is it an error or a warning? I'll try it a few times.

itsdfish commented 5 years ago

It appears to be an error because it halts the entire program. I'm not entirely sure how DynamicHMC works, but it seems to choose the number of adaptation iterations and will throw an error if it does not find a suitable epsilon after Nsamples.
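
One possible workaround, since the failure is intermittent, is to catch the error and retry with a fresh start. A rough sketch (the sampleDHMC signature is assumed here, not the exact code in the repo):

function sampleDHMC_retry(data, Nsamples; maxattempts = 5)
    # retry when the initial stepsize bisection fails;
    # sampleDHMC is assumed from this repo, the loop itself is plain Julia
    for attempt in 1:maxattempts
        try
            return sampleDHMC(data, Nsamples)   # assumed signature
        catch e
            @warn "DynamicHMC failed on attempt $attempt" exception = e
        end
    end
    error("sampleDHMC failed after $maxattempts attempts")
end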

goedman commented 5 years ago

Hmm, I've copied what I'm seeing below, exactly as shown in the REPL. Unfortunately, when this happens, results is empty (undefined, actually).

I think the trace below shows a dHMC error followed by an aHMC error. aHMC clearly indicates a gradient error. I wonder if that also occurs in dHMC. Do you see these errors too, or just a Julia crash?

ERROR: LoadError: On worker 4:
Reached maximum number of iterations while bisecting interval for ϵ.
error at ./error.jl:33
bisect_stepsize at /Users/rob/.julia/packages/DynamicHMC/YX6JA/src/stepsize.jl:122
find_initial_stepsize at /Users/rob/.julia/packages/DynamicHMC/YX6JA/src/stepsize.jl:146
#NUTS_init#18 at /Users/rob/.julia/packages/DynamicHMC/YX6JA/src/sampler.jl:141 [inlined]
#NUTS_init at ./none:0
#NUTS_init_tune_mcmc#20 at /Users/rob/.julia/packages/DynamicHMC/YX6JA/src/sampler.jl:284
#NUTS_init_tune_mcmc at ./none:0 [inlined]
#NUTS_init_tune_mcmc#21 at /Users/rob/.julia/packages/DynamicHMC/YX6JA/src/sampler.jl:294 [inlined]
#NUTS_init_tune_mcmc at ./none:0
sampleDHMC at /Users/rob/.julia/dev/MCMCBenchmarks/Models/LBA/LBA_Models.jl:181
sampleDHMC at /Users/rob/.julia/dev/MCMCBenchmarks/Models/LBA/LBA_Models.jl:167
#runSampler#30 at /Users/rob/.julia/dev/MCMCBenchmarks/src/MCMCBenchmarks.jl:143 [inlined]
#runSampler at ./none:0
#benchmark!#20 at ./util.jl:289
#benchmark! at ./none:0
#benchmark#22 at /Users/rob/.julia/dev/MCMCBenchmarks/src/MCMCBenchmarks.jl:104
#benchmark at ./none:0 [inlined]
pfun at /Users/rob/.julia/dev/MCMCBenchmarks/src/MCMCBenchmarks.jl:117 [inlined]
#24 at /Users/rob/.julia/dev/MCMCBenchmarks/src/MCMCBenchmarks.jl:119
#112 at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Distributed/src/process_messages.jl:269
run_work_thunk at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Distributed/src/process_messages.jl:56
macro expansion at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Distributed/src/process_messages.jl:269 [inlined]
#111 at ./task.jl:259
Stacktrace:┌ Warning: Numerical error in gradients. Rejecting current proposal...
└ @ Turing.Core ~/.julia/packages/Turing/RZOZ8/src/core/ad.jl:169
┌ Warning: grad = Real[NaN, NaN, NaN, NaN, NaN, NaN]
└ @ Turing.Core ~/.julia/packages/Turing/RZOZ8/src/core/ad.jl:170

 [1] (::getfield(Base, Symbol("##696#698")))(::Task) at ./asyncmap.jl:178
 [2] foreach(::getfield(Base, Symbol("##696#698")), ::Array{Any,1}) at ./abstractarray.jl:1866
 [3] maptwice(::Function, ::Channel{Any}, ::Array{Any,1}, ::Array{Int64,1}) at ./asyncmap.jl:178
 [4] #async_usemap#681 at ./asyncmap.jl:154 [inlined]
 [5] #async_usemap at ./none:0 [inlined]
 [6] #asyncmap#680 at ./asyncmap.jl:81 [inlined]
 [7] #asyncmap at ./none:0 [inlined]
 [8] #pmap#213(::Bool, ::Int64, ::Nothing, ::Array{Any,1}, ::Nothing, ::Function, ::Function, ::WorkerPool, ::Array{Int64,1}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Distributed/src/pmap.jl:126
 [9] pmap(::Function, ::WorkerPool, ::Array{Int64,1}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Distributed/src/pmap.jl:101
 [10] #pmap#223(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Function, ::Array{Int64,1}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Distributed/src/pmap.jl:156
 [11] pmap at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Distributed/src/pmap.jl:156 [inlined]
 [12] #pbenchmark#23 at /Users/rob/.julia/dev/MCMCBenchmarks/src/MCMCBenchmarks.jl:119 [inlined]
 [13] (::getfield(MCMCBenchmarks, Symbol("#kw##pbenchmark")))(::NamedTuple{(:Nsamples, :Nadapt, :delta, :Nd),Tuple{Int64,Int64,Float64,Array{Int64,1}}}, ::typeof(pbenchmark), ::Tuple{CmdStanNUTS{Stanmodel},AHMCNUTS{typeof(AHMClba),WARNING: both AdvancedHMC and DynamicHMC export "NUTS"; uses of it in module MCMCBenchmarks must be qualified
Turing.Inference.NUTS{Turing.Core.ForwardDiffAD{40},Union{}}},DHMCNUTS{typeof(sampleDHMC),Int64}}, ::Function, ::Int64) at ./none:0
 [14] top-level scope at none:0
 [15] include at ./boot.jl:326 [inlined]
 [16] include_relative(::Module, ::String) at ./loading.jl:1038
 [17] include(::Module, ::String) at ./sysimg.jl:29
 [18] include(::String) at ./client.jl:403
 [19] top-level scope at none:0
in expression starting at /Users/rob/.julia/dev/MCMCBenchmarks/Examples/LBA/LBA_Parallel_Example.jl:57

goedman commented 5 years ago

Chris, I expect we'll see more of these kinds of issues; neither dHMC nor aHMC is ready for production (i.e., a 1.0 release), I think. I wonder if it is worthwhile to have in each example directory 3 (= number of samplers) .jl files that each run a single-sampler simulation and give full access to the chains and debugging info.
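
For example, a minimal single-sampler script might look something like this (the data generator and the other names below are hypothetical, just to sketch the idea):

using MCMCBenchmarks
include("LBA_Models.jl")         # assumed to define the LBA model and samplers

Nd = 50                          # number of observations; hypothetical choice
data = simulateLBA(Nd = Nd)      # hypothetical data generator
chain = sampleDHMC(data, 2000)   # run just this sampler, full output in the REPL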

At some point we might have to go back to Turing or Tamas or ... for help, and they will expect an MWE. One thing I have always really appreciated in working with you is that you always provide great MWEs!

In the LBA example I would like to see the average number of leapfrog steps.

itsdfish commented 5 years ago

Hi Rob-

Yes, I get the exact same error message. If you run Turing by itself, it will throw a sporadic warning, Warning: grad = Real[NaN, NaN, NaN, NaN, NaN, NaN], typically upon initialization or shortly thereafter. It seems to work OK after that point, but it could still be a symptom of a more general problem.

I like that idea. Having a debug mode would be helpful, especially if there is a way to record the error and warning messages. I'll add an issue and look into this.

I'll put together an MWE to share with Turing and Tamas over the next couple of days.

Are leapfrog steps different from epsilon? Edit: Yes. I'll add an issue and incorporate that in the next couple of days.
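
As a first pass, something like this could compute that average from a DynamicHMC chain, assuming each transition records its NUTS tree depth in a depth field (field name assumed, not checked against this version):

using Statistics

function mean_leapfrog_steps(chain)
    # a NUTS tree of depth d costs roughly 2^d - 1 leapfrog steps
    return mean(2^t.depth - 1 for t in chain)
end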

goedman commented 5 years ago

I’m not sure about that, but I thought each leapfrog step moved the location by epsilon. I think the epsilon/stepsize reported is an initial stepsize, but whether and when it is further adapted is not clear to me. I actually like dHMC well enough to make an attempt to understand the Neal and Betancourt papers once back in CO.
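
To make that concrete, here is a minimal sketch of a single leapfrog step (unit mass matrix assumed); each step moves the position by epsilon times the momentum, so the step size and the number of leapfrog steps are related but distinct quantities:

function leapfrog(q, p, ϵ, ∇logp)
    p += 0.5ϵ * ∇logp(q)   # half step for the momentum
    q += ϵ * p             # full step for the position: moved by ϵ * p
    p += 0.5ϵ * ∇logp(q)   # remaining half step for the momentum
    return q, p
end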

itsdfish commented 5 years ago

Here is the link to the issue on discourse for future reference.

itsdfish commented 5 years ago

I added Tamas's fix. It seems to be working well.

goedman commented 5 years ago

Yes, it has almost finished the 3 samplers × [10, 50, 200] obs × 50 simulations without problems.

Apologies for not communicating better that I had filed an issue on DynamicHMC. Somehow I assumed you were copied on that.

itsdfish commented 5 years ago

No problem.

I look forward to seeing the results. This will be a good test case to compare Stan and DynamicHMC.

goedman commented 5 years ago

By the time we hit the road it was at 42 of 50 in the last set of LBA. Should complete tonight I guess.

itsdfish commented 5 years ago

By the way, if you need to redo the benchmark, I added effective sample size per second to the code. I intended to do that a while ago but forgot until reading that email from Bob Carpenter.
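
The metric itself is just the effective sample size divided by wall-clock sampling time; roughly (the sampler call and the ESS accessor below are assumed, not the exact code in the repo):

t = @elapsed chain = runSampler(s, data)   # assumed sampler call
ess = effective_sample_size(chain)         # hypothetical ESS accessor
ess_ps = ess / t                           # effective samples per second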

itsdfish commented 5 years ago

The plots for effective sample size per second are interesting. I suspect the pattern might be due to differences between forward and reverse autodiff. Is that plausible?

summary_mu_ess_ps.pdf summary_sigma_ess_ps.pdf

goedman commented 5 years ago

Very interesting indeed. I (just temporarily) added my results for LBA as results_rob. Need to study these plots when back in Denver.

itsdfish commented 5 years ago

I added effective sample size per second plots. Across all of the metrics except one, DynamicHMC performs better than Stan. That is no small feat, especially for the LBA, which, at least in cognitive science, is known to be a difficult parameter estimation problem. The only place where it performed worse was rhat. It was more variable, but still within an acceptable range.

goedman commented 5 years ago

I’ve been wondering about rhat. Doesn’t that need multiple chains?

itsdfish commented 5 years ago

Yeah. I forgot about that. Actually, I was wondering why MCMCChains did not output NaN. Does it compare the first and second half of the chain?
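
If it does, it would be something like split rhat: split the single chain into two halves and apply the Gelman-Rubin formula across them (whether MCMCChains actually does this is an assumption on my part). A sketch:

using Statistics

function split_rhat(x::AbstractVector)
    n = length(x) ÷ 2
    halves = (x[1:n], x[n+1:2n])
    W = mean(var.(halves))            # within-half variance
    B = n * var(mean.(halves))        # between-half variance
    varplus = (n - 1) / n * W + B / n
    return sqrt(varplus / W)
end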

If we need multiple chains, I recommend calling benchmark(...) and changing runSampler to the following:

function runSampler(s::AHMCNUTS, data; Nchains, kwargs...)
    # sample Nchains chains in parallel and concatenate them with chainscat
    # so that multi-chain diagnostics such as rhat can be computed
    return reduce(chainscat, pmap(x -> sample(s.model(data...), s.config), 1:Nchains))
end

If this seems reasonable to you, I will add those changes later today or tomorrow.
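
For reference, chainscat concatenates along the chain dimension, so rhat would then be computed across the Nchains runs. A hypothetical call site (the positional and keyword arguments are assumed, loosely based on the stack trace above):

results = benchmark(samplers, simulateLBA, Nd; Nchains = 4, Nsamples = 2000)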