Many thanks for the report and the effort made for the benchmarking.
We are aware of the low effective sample size issue reported some time ago (based on a simple Gaussian model). We recently did some tests breaking down the NUTS components to see what happens. We found that the naive HMC and HMC with dual averaging are very similar to Stan's (based on ESS and adaptation results), while the NUTS does not have comparable ESS despite similar adaptation results. We ended up guessing that it's probably because the NUTS we use is in fact different from Stan's and DynamicHMC's: we use slice sampling to obtain samples from the doubling tree, while Stan and DynamicHMC use multinomial sampling. This led to our WIP PR to switch ours to multinomial sampling: https://github.com/TuringLang/AdvancedHMC.jl/pull/79
I will check these models with our multinomial sampling version (when it's available) and see how it works soon.
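For reference, here is a minimal sketch (in Julia, not AdvancedHMC's actual code) contrasting the two ways of drawing one sample from the candidate states collected while doubling the tree. H holds the Hamiltonian energy of each candidate state and H0 that of the initial state; the function names are made up for illustration.
# Slice sampling (original NUTS): draw u ~ Uniform(0, exp(-H0)), then pick uniformly
# among the states with exp(-H) >= u.
function slice_pick(H::Vector{Float64}, H0::Float64)
    logu = log(rand()) - H0
    eligible = findall(h -> -h >= logu, H)
    return isempty(eligible) ? nothing : rand(eligible)
end

# Multinomial sampling (Stan, DynamicHMC): pick state i with probability ∝ exp(-H[i]).
function multinomial_pick(H::Vector{Float64})
    w = exp.(-(H .- minimum(H)))   # shift by minimum(H) for numerical stability
    return findfirst(cumsum(w ./ sum(w)) .>= rand())
end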
Just a side note: Turing now drops the samples from the adaptation phase by default, so there is no need to do chain = chns[(Nadapt+1):end,:,:]
at the end. But as far as I checked, this is handled correctly in MCMCBenchmarks.jl, is that right?
Oops, I think it still discards the burn-in samples (which means it discards them twice). As this is expected to be the default behaviour of Turing in the future, I will create a PR against MCMCBenchmarks.jl.
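To illustrate the double discarding, here is a hypothetical minimal sketch (a toy model, not the MCMCBenchmarks code):
using Turing

# Toy model just for illustration.
@model demo(x) = begin
    μ ~ Normal(0, 1)
    x ~ Normal(μ, 1)
end

Nsamples, Nadapt = 2000, 1000
chn = sample(demo(0.5), Turing.NUTS(Nsamples, Nadapt, 0.8))
# If the adaptation draws have already been dropped from chn, the old post-processing
# line below removes another Nadapt draws from the retained samples:
chain = chn[(Nadapt+1):end, :, :]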
@itsdfish Regarding 1, I'm getting different results with MCMCBenchmarks.jl and one of the scripts you (or Rob) previously shared with me (diag_test.jl.zip). To be more specific, I set
AHMCconfig = Turing.NUTS(2000,1000,.85; metricT=AdvancedHMC.DiagEuclideanMetric)
to make sure Turing.jl is using diagonal adaptation, and simply ran the Gaussian model (I commented out some lines, as reported in https://github.com/StatisticalRethinkingJulia/MCMCBenchmarks.jl/issues/25).
The resulting plots summary_sigma_ess_ps.pdf and summary_mu_ess_ps.pdf (attached as summary.zip) show that Turing's ESS decreases as the number of samples used increases. For diag_test.jl, I changed the number of samples used from 5 to 30 and then to 300, and ran the script three times.
Any idea why this happens?
PS: I'm running things on recent release of Turing.jl and this branch of AdvancedHMC.jl (https://github.com/TuringLang/AdvancedHMC.jl/tree/kx/multinomial).
Hi Kai-
I think I figured out the problem. In the first set of analyses, summary_sigmaessps.pdf is the effective sample size per second for the sigma parameter, which you understandably mistook for raw effective sample size. Currently, we output distributions of effective sample size, but it is probably reasonable to add means as well (or use them instead). In the meantime, you can use this to generate the plot you are looking for:
ess = plotsummary(results,:Nd,:ess,(:sampler,);save=true,dir=dir)
On my system, I obtain something like this:
summary_mu_ess.pdf summary_sigma_ess.pdf
I hope that helps and please bear with us until we add documentation.
P.S. I should note those figures were generated with the previous version of Turing and the default parameters.
Re: https://github.com/TuringLang/Turing.jl/issues/840#issuecomment-509393404
I see. Thanks for the clarification! This makes sense to me now.
Re: https://github.com/TuringLang/Turing.jl/issues/840#issuecomment-509411812
Thanks!
Hi-
In case you are unaware, the Poisson regression still fails to converge with the new multinomial sampling. I am using the master branches of Turing and AdvancedHMC because they have bug fixes described here and here:
Summary Statistics
│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │
├─────┼────────────┼───────────┼──────────┼────────────┼───────────┼─────────┼─────────┤
│ 1 │ a0 │ 0.2888 │ 0.931632 │ 0.0147304 │ 0.122927 │ 34.5533 │ 1.10511 │
│ 2 │ a0_sig │ 0.354791 │ 0.132084 │ 0.00208843 │ 0.0150491 │ 43.128 │ 1.15244 │
│ 3 │ a0s[1] │ -0.121081 │ 0.187378 │ 0.0029627 │ 0.0253339 │ 19.5141 │ 1.25898 │
│ 4 │ a0s[2] │ -0.26234 │ 0.26919 │ 0.00425627 │ 0.0381923 │ 23.1839 │ 1.32593 │
│ 5 │ a0s[3] │ 0.230442 │ 0.200536 │ 0.00317076 │ 0.0263961 │ 21.5578 │ 1.18891 │
│ 6 │ a0s[4] │ 0.164603 │ 0.235531 │ 0.00372407 │ 0.0338144 │ 21.3094 │ 1.3637 │
│ 7 │ a0s[5] │ -0.161985 │ 0.189921 │ 0.00300291 │ 0.0269448 │ 18.9195 │ 1.34745 │
│ 8 │ a0s[6] │ 0.100176 │ 0.197166 │ 0.00311746 │ 0.0262556 │ 21.0273 │ 1.21602 │
│ 9 │ a0s[7] │ 0.35988 │ 0.265921 │ 0.00420458 │ 0.0379563 │ 22.7123 │ 1.33589 │
│ 10 │ a0s[8] │ -0.477994 │ 0.186985 │ 0.0029565 │ 0.0260595 │ 19.9471 │ 1.32371 │
│ 11 │ a0s[9] │ 0.0519478 │ 0.254021 │ 0.00401642 │ 0.032194 │ 27.8074 │ 1.11512 │
│ 12 │ a0s[10] │ 0.0844027 │ 0.186093 │ 0.00294239 │ 0.0255003 │ 20.0431 │ 1.23501 │
│ 13 │ a1 │ 0.59155 │ 0.112818 │ 0.00178381 │ 0.0150285 │ 33.7903 │ 1.13195 │
Thanks for pointing this out again. Can you give me some advice on interpreting the results, i.e. how do you assess convergence for the Poisson regression model?
No problem. Generally speaking, I look for an effective sample size that is at least 30% of the number of saved samples (300 in this case) and rhat < 1.05. That is just a rule of thumb, but the ESS and rhat values above are pretty bad.
If it would be helpful, I can provide those metrics for both Stan and Turing, which would provide a better idea of how Turing is performing.
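For what it's worth, that rule of thumb could be wrapped in a small helper like the one below (just an illustration, not part of MCMCBenchmarks; ess and rhat are assumed to be vectors of per-parameter values taken from the summary table):
function converged(ess::AbstractVector, rhat::AbstractVector, n_saved::Integer;
                   ess_frac = 0.3, rhat_max = 1.05)
    all(ess .>= ess_frac * n_saved) && all(rhat .<= rhat_max)
end

# e.g. converged(summary_ess, summary_rhat, 300) for the tables above.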
Thanks a lot @itsdfish! I investigated a bit and found it might be due to a pretty dumb reason. As you know, AHMC was slow, so we set the default maximum tree depth of NUTS in Turing to 5 (although it's 10 by default in AHMC, it's overridden in Turing).
I changed it to 10 and re-ran the code you provided, which gives me:
Summary Statistics
│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │
├─────┼────────────┼───────────┼───────────┼────────────┼────────────┼─────────┼─────────┤
│ 1 │ a0 │ 0.374652 │ 0.818352 │ 0.0129393 │ 0.0539238 │ 270.994 │ 1.01078 │
│ 2 │ a0_sig │ 0.324592 │ 0.0997393 │ 0.00157702 │ 0.00347919 │ 628.047 │ 1.00069 │
│ 3 │ a0s[1] │ -0.116291 │ 0.137415 │ 0.00217273 │ 0.0082362 │ 268.881 │ 1.02334 │
│ 4 │ a0s[2] │ -0.237583 │ 0.189071 │ 0.00298948 │ 0.0111819 │ 305.1 │ 1.018 │
│ 5 │ a0s[3] │ 0.230482 │ 0.153759 │ 0.00243114 │ 0.00991924 │ 257.542 │ 1.02266 │
│ 6 │ a0s[4] │ 0.183472 │ 0.155971 │ 0.00246613 │ 0.00899458 │ 314.569 │ 1.02092 │
│ 7 │ a0s[5] │ -0.161092 │ 0.123878 │ 0.00195868 │ 0.00736051 │ 238.432 │ 1.02905 │
│ 8 │ a0s[6] │ 0.0958219 │ 0.144607 │ 0.00228645 │ 0.00883009 │ 273.993 │ 1.02407 │
│ 9 │ a0s[7] │ 0.38144 │ 0.181887 │ 0.00287588 │ 0.0109348 │ 306.522 │ 1.02002 │
│ 10 │ a0s[8] │ -0.473231 │ 0.130584 │ 0.00206472 │ 0.00796292 │ 248.984 │ 1.03119 │
│ 11 │ a0s[9] │ 0.0374965 │ 0.217759 │ 0.00344308 │ 0.0135209 │ 277.704 │ 1.01794 │
│ 12 │ a0s[10] │ 0.0889275 │ 0.138067 │ 0.00218303 │ 0.00894153 │ 225.447 │ 1.03354 │
│ 13 │ a1 │ 0.580512 │ 0.0967509 │ 0.00152977 │ 0.00626249 │ 280.587 │ 1.0102 │
Is this more comparable to Stan? If not, some numbers from Stan would help me diagnose further. Thanks!
No problem. Thanks for looking into this. What you have is a big improvement. However, Stan's rhat is still lower and, more importantly, its ESS is about 50-100% larger in many cases. I used 1000 samples, 1000 warmup samples, and four chains, as in the original post:
Object of type Chains, with data of type 1000×20×4 Array{Float64,3}
Iterations = 1001:2000
Thinning interval = 1
Chains = 1, 2, 3, 4
Samples per chain = 1000
internals = lp__, accept_stat__, stepsize__, treedepth__, n_leapfrog__, divergent__, energy__
parameters = a0, a0s.1, a0s.2, a0s.3, a0s.4, a0s.5, a0s.6, a0s.7, a0s.8, a0s.9, a0s.10, a1, a0_sig
2-element Array{ChainDataFrame,1}
Summary Statistics
│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │
├─────┼────────────┼─────────────┼──────────┼────────────┼────────────┼─────────┼─────────┤
│ 1 │ a0 │ 2.91917 │ 1.17409 │ 0.018564 │ 0.0552791 │ 506.497 │ 1.0086 │
│ 2 │ a0_sig │ 0.391864 │ 0.12055 │ 0.00190606 │ 0.0039154 │ 726.929 │ 1.00551 │
│ 3 │ a0s.1 │ 0.0110985 │ 0.226923 │ 0.00358797 │ 0.0116164 │ 422.175 │ 1.0054 │
│ 4 │ a0s.2 │ -0.00752534 │ 0.183497 │ 0.00290134 │ 0.00973214 │ 493.862 │ 1.00652 │
│ 5 │ a0s.3 │ 0.440896 │ 0.14912 │ 0.00235779 │ 0.00842415 │ 423.263 │ 1.00453 │
│ 6 │ a0s.4 │ 0.0697749 │ 0.158924 │ 0.00251281 │ 0.00883451 │ 398.797 │ 1.00345 │
│ 7 │ a0s.5 │ 0.0722853 │ 0.162671 │ 0.00257205 │ 0.00903263 │ 398.417 │ 1.00382 │
│ 8 │ a0s.6 │ -0.337904 │ 0.260617 │ 0.00412071 │ 0.0131449 │ 427.474 │ 1.00582 │
│ 9 │ a0s.7 │ -0.200556 │ 0.14555 │ 0.00230135 │ 0.00826782 │ 386.612 │ 1.00383 │
│ 10 │ a0s.8 │ 0.500156 │ 0.143458 │ 0.00226827 │ 0.00824504 │ 375.848 │ 1.00383 │
│ 11 │ a0s.9 │ -0.0801571 │ 0.283813 │ 0.00448748 │ 0.0141514 │ 515.564 │ 1.00849 │
│ 12 │ a0s.10 │ -0.505856 │ 0.196079 │ 0.00310029 │ 0.0102691 │ 502.543 │ 1.0072 │
│ 13 │ a1 │ 0.277116 │ 0.132474 │ 0.0020946 │ 0.0062143 │ 495.771 │ 1.00839 │
Thanks! Can you also post the other stats (i.e. lp__, accept_stat__, stepsize__, treedepth__, n_leapfrog__, divergent__, energy__)?
@itsdfish You can get everything shown really quickly by calling
describe(chain, showall=true)
Thanks @cpfiffer. Here are the stats:
Summary Statistics
│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │
├─────┼───────────────┼────────────┼─────────────┼────────────┼────────────┼─────────┼────────────┤
│ 1 │ accept_stat__ │ 0.941958 │ 0.080377 │ 0.00127087 │ 0.00162432 │ 4000.0 │ 1.00481 │
│ 2 │ divergent__ │ 0.0 │ 0.0 │ 0.0 │ 0.0 │ NaN │ NaN │
│ 3 │ energy__ │ -1.03149e5 │ 3.74854 │ 0.0592696 │ 0.107191 │ 928.861 │ 1.00126 │
│ 4 │ lp__ │ 1.03156e5 │ 2.71593 │ 0.0429426 │ 0.0884147 │ 754.944 │ 1.00319 │
│ 5 │ n_leapfrog__ │ 457.064 │ 358.381 │ 5.6665 │ 8.96849 │ 2417.05 │ 1.00766 │
│ 6 │ stepsize__ │ 0.00463429 │ 0.000334517 │ 5.28918e-6 │ 5.35589e-5 │ 16.0643 │ 1.09619e13 │
│ 7 │ treedepth__ │ 7.91725 │ 1.31371 │ 0.0207716 │ 0.0425757 │ 747.621 │ 1.01743 │
Let me know if you need anything else.
It seems the results vary quite a lot depending on the synthetic data. I can also get something like the output below, which looks pretty good.
Summary Statistics
│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │
├─────┼────────────┼───────────┼───────────┼────────────┼────────────┼─────────┼──────────┤
│ 1 │ a0 │ 0.432889 │ 0.809888 │ 0.0128055 │ 0.0437089 │ 428.974 │ 1.01011 │
│ 2 │ a0_sig │ 0.326824 │ 0.0956446 │ 0.00151227 │ 0.00349493 │ 795.304 │ 1.00914 │
│ 3 │ a0s[1] │ -0.11389 │ 0.138769 │ 0.00219413 │ 0.00539469 │ 528.726 │ 1.00073 │
│ 4 │ a0s[2] │ -0.217714 │ 0.185011 │ 0.00292528 │ 0.00984951 │ 370.074 │ 1.02207 │
│ 5 │ a0s[3] │ 0.23187 │ 0.152519 │ 0.00241154 │ 0.00667814 │ 542.281 │ 0.999885 │
│ 6 │ a0s[4] │ 0.200843 │ 0.156146 │ 0.00246889 │ 0.00791065 │ 373.861 │ 1.02153 │
│ 7 │ a0s[5] │ -0.156917 │ 0.126857 │ 0.00200579 │ 0.00504307 │ 445.973 │ 1.00776 │
│ 8 │ a0s[6] │ 0.103235 │ 0.146701 │ 0.00231954 │ 0.0057391 │ 539.697 │ 0.999679 │
│ 9 │ a0s[7] │ 0.402339 │ 0.181189 │ 0.00286486 │ 0.00955511 │ 373.021 │ 1.0222 │
│ 10 │ a0s[8] │ -0.466462 │ 0.130819 │ 0.00206844 │ 0.0049619 │ 550.583 │ 1.00673 │
│ 11 │ a0s[9] │ 0.028516 │ 0.215491 │ 0.00340721 │ 0.00934239 │ 515.235 │ 1.00134 │
│ 12 │ a0s[10] │ 0.0914416 │ 0.143152 │ 0.00226343 │ 0.00607738 │ 494.816 │ 1.0008 │
│ 13 │ a1 │ 0.572657 │ 0.0956253 │ 0.00151197 │ 0.0052389 │ 417.607 │ 1.01326 │
I wondered the same thing. I ran a benchmark with 20 reps of 1000 samples, 1000 warmup samples, Ns of 10 and 20, and one chain. My plots are not working for some reason, but here are tables of the mean ESS for key parameters:
by(results,[:sampler,:Ns],:a0_sig_ess=>mean)
6×3 DataFrame
│ Row │ sampler │ Ns │ a0_sig_ess_mean │
│ │ Symbol⍰ │ Int64⍰ │ Float64 │
├─────┼─────────────┼────────┼─────────────────┤
│ 1 │ CmdStanNUTS │ 10 │ 275.658 │
│ 2 │ AHMCNUTS │ 10 │ 141.072 │
│ 3 │ DHMCNUTS │ 10 │ 534.693 │
│ 4 │ CmdStanNUTS │ 20 │ 436.461 │
│ 5 │ AHMCNUTS │ 20 │ 335.877 │
│ 6 │ DHMCNUTS │ 20 │ 863.51 │
by(results,[:sampler,:Ns],:a0_ess=>mean)
6×3 DataFrame
│ Row │ sampler │ Ns │ a0_ess_mean │
│ │ Symbol⍰ │ Int64⍰ │ Float64 │
├─────┼─────────────┼────────┼─────────────┤
│ 1 │ CmdStanNUTS │ 10 │ 233.283 │
│ 2 │ AHMCNUTS │ 10 │ 95.9255 │
│ 3 │ DHMCNUTS │ 10 │ 804.726 │
│ 4 │ CmdStanNUTS │ 20 │ 168.452 │
│ 5 │ AHMCNUTS │ 20 │ 135.125 │
│ 6 │ DHMCNUTS │ 20 │ 921.038 │
by(results,[:sampler,:Ns],:a1_ess=>mean)
6×3 DataFrame
│ Row │ sampler │ Ns │ a1_ess_mean │
│ │ Symbol⍰ │ Int64⍰ │ Float64 │
├─────┼─────────────┼────────┼─────────────┤
│ 1 │ CmdStanNUTS │ 10 │ 229.268 │
│ 2 │ AHMCNUTS │ 10 │ 94.719 │
│ 3 │ DHMCNUTS │ 10 │ 811.847 │
│ 4 │ CmdStanNUTS │ 20 │ 166.287 │
│ 5 │ AHMCNUTS │ 20 │ 132.555 │
│ 6 │ DHMCNUTS │ 20 │ 915.751 │
(v1.1) pkg> st Turing
Status `~/.julia/environments/v1.1/Project.toml`
[0bf59076] AdvancedHMC v0.2.2 #master (https://github.com/TuringLang/AdvancedHMC.jl.git)
[31c24e10] Distributions v0.21.1
[f6369f11] ForwardDiff v0.10.3
[c7f686f2] MCMCChains v0.3.11
[4c63d2b9] StatsFuns v0.8.0
[9f7883ad] Tracker v0.2.2
[fce5fe82] Turing v0.6.23 #master (https://github.com/TuringLang/Turing.jl.git)
[e88e6eb3] Zygote v0.3.2
I also saw a fairly high number of numerical errors (10-15 per model application).
I ran a slightly different benchmark to avoid missing values in the plots. Here are the parameters:
Key results:
(v1.1) pkg> st Turing
Status `~/.julia/environments/v1.1/Project.toml`
[0bf59076] AdvancedHMC v0.2.2 #master (https://github.com/TuringLang/AdvancedHMC.jl.git)
[31c24e10] Distributions v0.21.1
[f6369f11] ForwardDiff v0.10.3
[c7f686f2] MCMCChains v0.3.11
[4c63d2b9] StatsFuns v0.8.0
[9f7883ad] Tracker v0.2.2
[fce5fe82] Turing v0.6.23 #master (https://github.com/TuringLang/Turing.jl.git)
[e88e6eb3] Zygote v0.3.2
I'm not sure why ESS is lower for Turing compared to CmdStan. I suspect the numerical errors might provide a clue. The speed difference between Turing and DynamicHMC might be due to the higher number of allocations in Turing.
The ESS issue for the Poisson model should be resolved when the generalised U-turn PR is merged (https://github.com/TuringLang/AdvancedHMC.jl/pull/91).
The numbers below are from my local run.
6×7 DataFrame
│ Row │ sampler │ Nd │ a0_ess_mean │ a1_ess_mean │ a0_sig_ess_mean │ tree_depth_mean │ epsilon_mean │
│ │ String │ Int64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────────┼───────┼─────────────┼─────────────┼─────────────────┼─────────────────┼──────────────┤
│ 1 │ CmdStanNUTS │ 1 │ 208.297 │ 208.776 │ 272.023 │ 7.24086 │ 0.0161274 │
│ 2 │ AHMCNUTS │ 1 │ 203.079 │ 204.569 │ 274.711 │ 7.23374 │ 0.0157469 │
│ 3 │ CmdStanNUTS │ 2 │ 201.638 │ 202.592 │ 263.016 │ 7.46326 │ 0.0119092 │
│ 4 │ AHMCNUTS │ 2 │ 213.495 │ 215.527 │ 252.686 │ 7.43646 │ 0.0121216 │
│ 5 │ CmdStanNUTS │ 5 │ 174.845 │ 174.25 │ 223.66 │ 7.71896 │ 0.00812359 │
│ 6 │ AHMCNUTS │ 5 │ 200.267 │ 200.766 │ 230.273 │ 7.75796 │ 0.00801717 │
Thanks for the update, Kai. Effective sample size looks much better! I look forward to trying out the changes when they are merged.
It's merged and just waiting for a new release (https://github.com/JuliaRegistries/General/pull/2797).
Is this fixed now with the new AHMC release?
I don't think so. A comparison between CmdStan and Turing with AdvancedHMC 0.2.5 shows that the effective sample size problems persist. Here are some examples from a Poisson regression model.
It's quite different from the benchmark I did a few weeks ago using MCMCBenchmarks (see http://xuk.ai/assets/StanCon-AHMC.pdf). Are you using the master branch or something?
Yeah, that is very odd. I obtained the same results with MCMCBenchmarks master and 0.5.1. Looking through the history, the only changes made to Hierachical_Poisson_Models.jl since your last commit were for DynamicHMC, so I don't think a bug was introduced into the model.
I looked over the changes to the MCMCBenchmarks source code, but nothing seemed problematic at first glance. The changes I made do not appear to affect the Gaussian model, for which CmdStan and Turing have comparable ESS.
I'll try to look into this more over the weekend.
(v1.1) pkg> st MCMCBenchmarks
Status `~/.julia/environments/v1.1/Project.toml`
[0bf59076] AdvancedHMC v0.2.5
[6e4b80f9] BenchmarkTools v0.4.3
[336ed68f] CSV v0.5.12
[593b3428] CmdStan v5.2.0
[a93c6f00] DataFrames v0.19.4
[31c24e10] Distributions v0.21.1
[bbc10e6e] DynamicHMC v2.1.0
[f6369f11] ForwardDiff v0.10.3
[6fdf6af0] LogDensityProblems v0.9.1
[72ce6e51] MCMCBenchmarks v0.5.1
[c7f686f2] MCMCChains v0.3.14
[d96e819e] Parameters v0.12.0
[189a3867] Reexport v0.2.0
[295af30f] Revise v2.2.0
[276daf66] SpecialFunctions v0.7.2
[f3b207a7] StatsPlots v0.12.0
[84d833dd] TransformVariables v0.3.6
[fce5fe82] Turing v0.6.23
Julia 1.1.1
I couldn't find a problem with MCMCBenchmarks. I obtained similar results with AdvancedHMC 0.2.3. Here is a minimum working example:
using Turing, StatsPlots, Distributions, SpecialFunctions, Random
import Distributions: logpdf
function simulatePoisson(; Nd=1, Ns=10, a0=1.0, a1=.5, a0_sig=.3, kwargs...)
    N = Nd * Ns
    y = fill(0, N)
    x = fill(0.0, N)
    idx = similar(y)
    i = 0
    for s in 1:Ns
        a0s = rand(Normal(0, a0_sig))
        logpop = rand(Normal(9, 1.5))
        λ = exp(a0 + a0s + a1 * logpop)
        for nd in 1:Nd
            i += 1
            x[i] = logpop
            idx[i] = s
            y[i] = rand(Poisson(λ))
        end
    end
    return (y=y, x=x, idx=idx, N=N, Ns=Ns)
end

struct LogPoisson{T<:Real} <: DiscreteUnivariateDistribution
    logλ::T
end

function logpdf(lp::LogPoisson, k::Int)
    return k * lp.logλ - exp(lp.logλ) - lgamma(k + 1)
end

@model AHMCpoisson(y, x, idx, N, Ns) = begin
    a0 ~ Normal(0, 10)
    a1 ~ Normal(0, 1)
    a0_sig ~ Truncated(Cauchy(0, 1), 0, Inf)
    a0s ~ MvNormal(zeros(Ns), a0_sig)
    for i ∈ 1:N
        λ = a0 + a0s[idx[i]] + a1 * x[i]
        y[i] ~ LogPoisson(λ)
    end
end

Random.seed!(39)
data = simulatePoisson(;Ns = 10)
chain = sample(AHMCpoisson(data...), NUTS(2000, 1000, .8))
println(chain)
plot(chain[:a0])
Thanks for the code. The results you shared don't include ESS. Running the MWE you provided gives me:
Object of type Chains, with data of type 1000×22×1 Array{Union{Missing, Real},3}
Iterations = 1:1000
Thinning interval = 1
Chains = 1
Samples per chain = 1000
internals = acceptance_rate, hamiltonian_energy, is_accept, log_density, lp, n_steps, numerical_error, step_size, tree_depth
parameters = a0, a0_sig, a0s[1], a0s[2], a0s[3], a0s[4], a0s[5], a0s[6], a0s[7], a0s[8], a0s[9], a0s[10], a1
2-element Array{ChainDataFrame,1}
Summary Statistics
│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │
├─────┼────────────┼────────────┼──────────┼────────────┼────────────┼─────────┼──────────┤
│ 1 │ a0 │ 0.895491 │ 1.12556 │ 0.0355933 │ 0.0642669 │ 215.637 │ 0.999022 │
│ 2 │ a0_sig │ 0.357993 │ 0.110865 │ 0.00350587 │ 0.00676386 │ 288.628 │ 1.00146 │
│ 3 │ a0s[1] │ -0.142888 │ 0.129212 │ 0.00408605 │ 0.0075781 │ 236.867 │ 1.00811 │
│ 4 │ a0s[2] │ 0.467269 │ 0.120416 │ 0.0038079 │ 0.00804799 │ 216.413 │ 1.00967 │
│ 5 │ a0s[3] │ -0.0813931 │ 0.15794 │ 0.00499452 │ 0.00803419 │ 217.796 │ 1.00461 │
│ 6 │ a0s[4] │ -0.375486 │ 0.140475 │ 0.0044422 │ 0.00739035 │ 231.58 │ 1.0056 │
│ 7 │ a0s[5] │ -0.328649 │ 0.152504 │ 0.00482261 │ 0.00799011 │ 218.841 │ 1.00629 │
│ 8 │ a0s[6] │ 0.401141 │ 0.135664 │ 0.00429008 │ 0.00832431 │ 222.68 │ 1.00878 │
│ 9 │ a0s[7] │ -0.0521814 │ 0.301167 │ 0.00952373 │ 0.0192478 │ 218.042 │ 1.0003 │
│ 10 │ a0s[8] │ 0.0837627 │ 0.12595 │ 0.00398288 │ 0.00940983 │ 206.132 │ 1.00909 │
│ 11 │ a0s[9] │ -0.186746 │ 0.164949 │ 0.00521613 │ 0.0120216 │ 202.941 │ 1.00255 │
│ 12 │ a0s[10] │ 0.117228 │ 0.124092 │ 0.00392415 │ 0.00842289 │ 218.936 │ 1.01107 │
│ 13 │ a1 │ 0.498655 │ 0.120539 │ 0.00381179 │ 0.00655568 │ 213.745 │ 0.999026 │
Quantiles
│ Row │ parameters │ 2.5% │ 25.0% │ 50.0% │ 75.0% │ 97.5% │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────────┼───────────┼────────────┼────────────┼────────────┼───────────┤
│ 1 │ a0 │ -1.45633 │ 0.164452 │ 0.955111 │ 1.60549 │ 3.10455 │
│ 2 │ a0_sig │ 0.212185 │ 0.284172 │ 0.335621 │ 0.405613 │ 0.638829 │
│ 3 │ a0s[1] │ -0.412466 │ -0.221602 │ -0.142009 │ -0.0622003 │ 0.11225 │
│ 4 │ a0s[2] │ 0.231944 │ 0.392252 │ 0.463197 │ 0.545739 │ 0.710186 │
│ 5 │ a0s[3] │ -0.37997 │ -0.191058 │ -0.0732723 │ 0.0245348 │ 0.228448 │
│ 6 │ a0s[4] │ -0.645226 │ -0.46758 │ -0.368753 │ -0.280846 │ -0.100005 │
│ 7 │ a0s[5] │ -0.625378 │ -0.438088 │ -0.327296 │ -0.217246 │ -0.028741 │
│ 8 │ a0s[6] │ 0.109995 │ 0.305765 │ 0.408784 │ 0.49118 │ 0.65865 │
│ 9 │ a0s[7] │ -0.605775 │ -0.257336 │ -0.0606962 │ 0.13829 │ 0.59724 │
│ 10 │ a0s[8] │ -0.178198 │ 0.00384925 │ 0.0885033 │ 0.166566 │ 0.330785 │
│ 11 │ a0s[9] │ -0.49266 │ -0.297873 │ -0.1814 │ -0.0735794 │ 0.126346 │
│ 12 │ a0s[10] │ -0.121548 │ 0.0320237 │ 0.119097 │ 0.198276 │ 0.357121 │
│ 13 │ a1 │ 0.263909 │ 0.420791 │ 0.493074 │ 0.577344 │ 0.74598 │
Is it the same as yours?
If not, I suspect your Turing is somehow not using the updated version of AHMC.
Sorry. I should have included the summary (note that the non-stationary trace plot suggests low ESS).
Here is my summary:
Object of type Chains, with data of type 1000×23×1 Array{Union{Missing, Float64},3}
Log evidence = 0.0
Iterations = 1:1000
Thinning interval = 1
Chains = 1
Samples per chain = 1000
internals = acceptance_rate, eval_num, hamiltonian_energy, is_accept, log_density, lp, n_steps, numerical_error, step_size, tree_depth
parameters = a0, a0_sig, a0s[1], a0s[2], a0s[3], a0s[4], a0s[5], a0s[6], a0s[7], a0s[8], a0s[9], a0s[10], a1
2-element Array{ChainDataFrame,1}
Summary Statistics
│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │
├─────┼────────────┼───────────┼───────────┼────────────┼───────────┼─────────┼─────────┤
│ 1 │ a0 │ 0.25734 │ 0.904652 │ 0.0286076 │ 0.180165 │ 25.7848 │ 1.00111 │
│ 2 │ a0_sig │ 0.385031 │ 0.0934959 │ 0.0029566 │ 0.0195804 │ 22.513 │ 1.05103 │
│ 3 │ a0s[1] │ -0.118888 │ 0.154196 │ 0.00487609 │ 0.0421908 │ 7.26326 │ 1.01569 │
│ 4 │ a0s[2] │ 0.487679 │ 0.149209 │ 0.0047184 │ 0.042932 │ 7.49517 │ 0.99988 │
│ 5 │ a0s[3] │ -0.11255 │ 0.149933 │ 0.00474129 │ 0.0384453 │ 9.50847 │ 1.00325 │
│ 6 │ a0s[4] │ -0.38551 │ 0.152533 │ 0.00482353 │ 0.0398215 │ 8.28392 │ 1.00414 │
│ 7 │ a0s[5] │ -0.359556 │ 0.150769 │ 0.00476773 │ 0.0395436 │ 9.23062 │ 1.0063 │
│ 8 │ a0s[6] │ 0.391066 │ 0.14374 │ 0.00454546 │ 0.0396994 │ 7.97456 │ 1.00353 │
│ 9 │ a0s[7] │ 0.128352 │ 0.281772 │ 0.0089104 │ 0.0640121 │ 15.0364 │ 1.00179 │
│ 10 │ a0s[8] │ 0.11543 │ 0.152279 │ 0.00481549 │ 0.0433674 │ 7.43558 │ 1.00167 │
│ 11 │ a0s[9] │ -0.116702 │ 0.182076 │ 0.00575775 │ 0.045569 │ 9.90864 │ 1.02162 │
│ 12 │ a0s[10] │ 0.131333 │ 0.151793 │ 0.00480012 │ 0.0441841 │ 7.69638 │ 1.00594 │
│ 13 │ a1 │ 0.564955 │ 0.0932821 │ 0.00294984 │ 0.017341 │ 28.2881 │ 1.00017 │
Quantiles
│ Row │ parameters │ 2.5% │ 25.0% │ 50.0% │ 75.0% │ 97.5% │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────────┼───────────┼────────────┼───────────┼─────────────┼────────────┤
│ 1 │ a0 │ -1.55721 │ -0.350089 │ 0.280959 │ 0.915745 │ 1.85898 │
│ 2 │ a0_sig │ 0.22418 │ 0.320691 │ 0.379533 │ 0.443339 │ 0.587711 │
│ 3 │ a0s[1] │ -0.454146 │ -0.222612 │ -0.103852 │ -0.00940392 │ 0.168154 │
│ 4 │ a0s[2] │ 0.203723 │ 0.385027 │ 0.491964 │ 0.580074 │ 0.790241 │
│ 5 │ a0s[3] │ -0.390252 │ -0.212872 │ -0.10848 │ -0.00921342 │ 0.182705 │
│ 6 │ a0s[4] │ -0.692409 │ -0.495817 │ -0.364626 │ -0.274441 │ -0.121393 │
│ 7 │ a0s[5] │ -0.627087 │ -0.474513 │ -0.355538 │ -0.248264 │ -0.0676558 │
│ 8 │ a0s[6] │ 0.123875 │ 0.285813 │ 0.392087 │ 0.491312 │ 0.673782 │
│ 9 │ a0s[7] │ -0.424979 │ -0.0495103 │ 0.11791 │ 0.320994 │ 0.670225 │
│ 10 │ a0s[8] │ -0.189626 │ 0.0250555 │ 0.118156 │ 0.216578 │ 0.406702 │
│ 11 │ a0s[9] │ -0.503479 │ -0.238112 │ -0.113226 │ 0.0132313 │ 0.228342 │
│ 12 │ a0s[10] │ -0.161768 │ 0.0182823 │ 0.145505 │ 0.236064 │ 0.444482 │
│ 13 │ a1 │ 0.395641 │ 0.501628 │ 0.560841 │ 0.629406 │ 0.754552 │
I believe I have the latest package versions:
(v1.1) pkg> st Turing
Status `~/.julia/environments/v1.1/Project.toml`
[0bf59076] AdvancedHMC v0.2.5
[31c24e10] Distributions v0.21.1
[f6369f11] ForwardDiff v0.10.3
[c7f686f2] MCMCChains v0.3.14
[189a3867] Reexport v0.2.0
[276daf66] SpecialFunctions v0.7.2
[4c63d2b9] StatsFuns v0.8.0
[9f7883ad] Tracker v0.2.3
[fce5fe82] Turing v0.6.23
[e88e6eb3] Zygote v0.3.4
This is perplexing.
I was using master. I downgraded to the recent release, v0.6.23, and I can confirm this issue does exist for v0.6.23.
Looking over the differences between master and the recent release (https://github.com/TuringLang/Turing.jl/compare/v0.6.23...master), it seems the recent release is still using 5 as the default tree depth.
@yebai @cpfiffer I think we really need to make a new release. What stops us from doing so? Is it the update of the documentation to reflect the new interface?
I think we can make a new release once https://github.com/TuringLang/Turing.jl/pull/917, https://github.com/TuringLang/Turing.jl/pull/908 and https://github.com/TuringLang/Turing.jl/pull/915 are merged.
@mohamed82008 can you give #908 another try?
Sure
@itsdfish This should be fixed by release 0.7. Can you try re-running the benchmarks?
@yebai, you can find the benchmarks within the results subfolders for each model here.
Here is a summary of what I found.
┌ Warning: The current proposal will be rejected due to numerical error(s).
│ isfiniteθ = true
│ isfiniter = false
│ isfiniteℓπ = false
│ isfiniteℓκ = false
ESS was comparable to Stan for all of the models except linear regression. This difference could be due to the use of MvNormal in Turing and DynamicHMC (but not Stan); see the sketch after these findings for the contrast I have in mind.
Turing was consistently slower than Stan and was more sensitive to increased data points and/or parameters. I suspect this is partially a result of reverse vs forward autodiff.
Turing has a somewhat large startup cost compared to DynamicHMC.
Turing is nearly 2 times slower than DynamicHMC (regardless of sample size) for linear regression. I've found this to be the case for other models not included in MCMCBenchmarks.
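For concreteness, here is a hypothetical sketch of the MvNormal contrast mentioned in the first point (the model names and priors are made up; this is not the MCMCBenchmarks code):
using Turing, Distributions

# Vectorised likelihood in a single MvNormal statement (the style used in the
# Turing and DynamicHMC models).
@model lr_mvnormal(y, x) = begin
    β ~ Normal(0, 10)
    σ ~ Truncated(Cauchy(0, 5), 0, Inf)
    y ~ MvNormal(β .* x, σ)
end

# Per-observation Normal statements (closer to how the Stan model is written).
@model lr_elementwise(y, x) = begin
    β ~ Normal(0, 10)
    σ ~ Truncated(Cauchy(0, 5), 0, Inf)
    for i in eachindex(y)
        y[i] ~ Normal(β * x[i], σ)
    end
end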
Many thanks, @itsdfish! These are helpful for planning Turing's next release.
Turing consistently produced numerical errors for linear regression (see below). There were a lot more numerical errors for SDT. Numerical errors were not a problem for the other models.
This is more likely caused by the model itself and its parameterisation.
ESS was comparable to Stan for all of the models, except linear regression. This difference could be due to the use of MvNormal in Turing and DynamicHMC (but not Stan).
This is consistent with @xukai92's results, and it is good to see them reproduced independently.
Turing was consistently slower than Stan and was more sensitive to increased data points and/or parameters. I suspect this is partially a result of reverse vs forward autodiff.
We will revisit the issue of switching the default AD engine to reverse mode soon, since Distributions is now more compatible with Tracker via DistributionsAD.jl.
Turing has a somewhat large startup cost compared to DynamicHMC. Turing is nearly 2 times slower than DynamicHMC (regardless of sample size) for linear regression. I've found this to be the case for other models not included in MCMCBenchmarks.
This is likely from the overhead incurred by Turing's DSL. @mohamed82008 has made a lot of progress in making sure model functions are type-stable, which has led to a leap in Turing's performance recently, but there is room for further improvement. We'll look into how to further reduce Turing's overhead in v0.8.
My pleasure. I'm glad it was helpful.
I tried Tracker and Zygote in the past without much success. Models that required seconds with ForwardDiff required more than an hour with Zygote and Tracker. I'm not sure what that problem was. I can try to find those benchmarks if that information would be helpful.
In the meantime, I am amenable to reparameterizing the SDT and linear regression models. Please let me know if you or other Turing members have any recommendations.
I tried Tracker and Zygote in the past without much success. Models that required seconds with ForwardDiff required more than an hour with Zygote and Tracker. I'm not sure what that problem was. I can try to find those benchmarks if that information would be helpful.
We have started to benchmark Turing and AHMC with different AD backends. So far, what we know is that when there is control flow, Tracker (in high dimensions) and ForwardDiff (in low dimensions) still outperform Zygote.
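If it helps with the comparison, switching backends for the same model is straightforward; for example (assuming the setadbackend symbols below match the Turing version in use, since the exact names have changed across releases, and reusing AHMCpoisson and data from the MWE above):
using Turing

Turing.setadbackend(:forward_diff)   # ForwardDiff: often fastest for low-dimensional models
chain_fwd = sample(AHMCpoisson(data...), NUTS(2000, 1000, 0.8))

Turing.setadbackend(:reverse_diff)   # Tracker-based reverse mode: often better in high dimensions
chain_rev = sample(AHMCpoisson(data...), NUTS(2000, 1000, 0.8))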
In the meantime, I am amenable to reparameterizing the SDT and linear regression models. Please let me know if you or other Turing members have any recommendations.
Have you made any progress on this? I've seen a lot of changes in MCMCBenchmarks.jl and wonder what its current status is.
BTW, I saw you were discussing a numerical issue in https://github.com/tpapp/DynamicHMC.jl/issues/95. Does AHMC also have this issue?
Hi Kai-
Thanks for the update. Most of the recent changes to MCMCBenchmarks have been for maintenance and reorganization. As of right now, there is one optimization I can add. I'll try to add that within the next week and update the benchmarks to reflect changes in Turing and AdvancedHMC. Please let me know if there are more optimizations you would like me to incorporate.
Yeah. I have been experiencing several issues with DynamicHMC. I tried re-working the log likelihood function of the LBA to be more numerically stable, but it didn't really solve the problem. I think part of the issue might be related to implementational differences in HMC and how the model is initialized. Unfortunately, I am at an impasse with that particular issue.
I'll try to add that within the next week and update the benchmarks to reflect changes in Turing and AdvancedHMC. Please let me know if there are more optimizations you would like me to incorporate.
Thanks. Maybe you could watch https://github.com/TuringLang/TuringExamples/tree/master/benchmarks which contains the updated implementation using whatever new features or optimizations we have. In fact, maybe we should figure out a way to keep models in a single place and allow them to be easily reused in different places.
Note that our simple benchmark scripts only test the plain computational performance (using static HMC), whereas MCMCBenchmarks.jl does much more.
I tried re-working the log likelihood function of the LBA to be more numerically stable, but it didn't really solve the problem. I think part of the issue might be related to implementational differences in HMC and how the model is initialized. Unfortunately, I am at an impasse with that particular issue.
IIRC the Turing and DynamicHMC versions share the same LBA-related computation, so I guess there should be no issues there. It would be useful to check whether initialization matters by using the same starting point, to rule that explanation out.
Closing now since AHMC has many numerical stability improvements over the past year. Please reopen if more help is needed.
Hi-
While benchmarking various MCMC samplers in Julia, @Goedman and I have encountered a few cases in which Turing produces many numerical errors and/or fails to converge. Here are some examples for your consideration. I included model code below for ease of reading.
The first case is the Linear Ballistic Accumulator. Although the chains converge in some cases, including the example below, there are usually between 5 and 10 numerical errors at the beginning. What we found in general is that increasing the sample size from 10 to 100 or more leads to a lower effective sample size (e.g. <200 out of 1,000 for k, tau, v[1], and v[2]) and yields poorer convergence in many cases (rhat of 1.02 to 1.08). It is not clear to me why this happens or whether it is related to the numerical errors that occur sporadically. In either case, you might consider adding this model to your validation routine, because the inter-correlations between its parameters make it a challenging test case.
The second case is a hierarchical Poisson model based on Rob's Statistical Rethinking package. The code below illustrates a common problem in which there are numerical errors and poor convergence. The results do not appear to depend on the delta/target acceptance rate parameter.
Results:
Model Code: