torfjelde opened this issue 11 months ago
Thank you!
Here is an explanation of this temporary situation: one thing we really like about Turing is its ability to support models containing both discrete and continuous variables. To support this, part of the functionality in Pigeons allows the state to be tuple-like, where each key of the state maps to a potentially different data type. Let's call this the "type-heterogeneous case".
But for the first pass at writing gradient-based samplers, it was convenient to begin by assuming type homogeneity. We encode that situation with a tuple-like state having a single key called :singleton_variable.
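To make this concrete, here is a minimal plain-Julia sketch of the two situations. This is not Pigeons' actual internal representation; the variable names are made up for illustration:

```julia
# Hedged sketch (illustrative names, not Pigeons' internals): the state is
# tuple-like, mapping each variable name to a potentially different type.

# Type-heterogeneous case: a discrete and a continuous variable coexist.
heterogeneous_state = (coin = 1, weight = [0.3, 1.7])

# Type-homogeneous case assumed by the first-pass gradient-based samplers:
# everything lives under a single key, as one flat continuous vector.
homogeneous_state = (singleton_variable = [0.3, 1.7, -0.2],)

# A gradient-based sampler can then treat the whole state as one Vector{Float64}:
flat = homogeneous_state.singleton_variable
```

The homogeneous layout is what makes it easy to hand the state directly to MALA/HMC-style kernels, which want a single flat vector of reals.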
Clearly it should be possible to use gradient-based sampling in the type-heterogeneous case; we might just need some guidance on a few details of the Turing API to achieve this. I forget exactly what the stumbling block was, but it did not seem serious. For now, since the slice sampler handles the type-heterogeneous case, we have left it as the default. (On a related note, we would also be interested in getting Turing's samplers, in particular the SMC-based ones, to work automatically in the tempering case; again, we might just need a bit of guidance on the API to achieve this.)
PS: to give a fuller picture, the state interface can handle different levels of abstraction, but I am focussing here on the level of abstraction relevant to autoMALA, HMC and similar samplers.
Ah, I see, that makes sense! :+1:
I also noticed:
and
Could you elaborate a bit on the former?
For the latter, that sounds like a strange error tbh, but unfortunately the logs are no longer available so I can't really look into it :confused:
we would also be interested in getting Turing's samplers, in particular the SMC based samplers, to work automatically in the tempering case, again we might just need a bit of guidance on the API to achieve this
Uncertain how useful this will be tbh. SMC samplers in Turing.jl are computationally very inefficient due to the nature of their implementation.
Nonetheless, I'd be very happy to help :) It would be very nice to have this easily accessible and working in Turing.jl.
Certainly, happy to elaborate on these!
For the first blurb, here is the difficulty I encountered: I was using DynamicPPL.getall, setall!, and ADgradient; however, when the model is mixed discrete-continuous, these include the discrete variables. I was wondering: is there an equivalent to getall/setall!/ADgradient based on a view of only the continuous variables?
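To make the ask concrete, here is a toy illustration in plain Julia (deliberately not using DynamicPPL) of the kind of "continuous-only view" I mean. All function names here are made up and do not exist in DynamicPPL:

```julia
# Toy illustration of a "continuous-only view" over a mixed state.
# All names below are hypothetical; this is NOT DynamicPPL API.
mixed_state = (coin = 1, weight = 0.7, scale = 2.3)

# getall-analogue restricted to continuous (float-valued) entries:
continuous_keys(state) = filter(k -> state[k] isa AbstractFloat, keys(state))
getall_continuous(state) = [Float64(state[k]) for k in continuous_keys(state)]

# setall!-analogue: write a flat vector back into the continuous slots,
# leaving discrete entries untouched (returns a new NamedTuple here).
function setall_continuous(state, x)
    ks = continuous_keys(state)
    merge(state, NamedTuple{Tuple(ks)}(Tuple(x)))
end

x = getall_continuous(mixed_state)               # [0.7, 2.3]
new_state = setall_continuous(mixed_state, x .+ 1.0)
```

An ADgradient-style wrapper over such a view would then differentiate only with respect to the flat continuous vector, holding the discrete entries fixed.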
For the second one, I will rerun the test; hopefully we can reproduce the non-reproducibility! I will keep you updated...
I managed to replicate the non-reproducibility issue I mentioned in #165! This time I am attaching the logs for posterity since the CI seems to erase them after some time period.
Here is the background:
The test in question is this one.
I am attaching the logs below for two sister CI runs. They differ only in that one uses a different MPI library, and since this specific test does not use MPI, you can consider them two independent runs. If you search the logs for "Starting test_turing.jl" you will find a table underneath; look for the column min(αₑ) (the minimum MALA acceptance probability across chains): its last entry is 0.504 in one run vs 0.518 in the other.
The problem does not arise if the slice sampler is used with a Turing model (table immediately after), or if autoMALA is used on a pure Julia or Stan model. So I suspect the non-determinism is related to gradient computation on Turing models.
Some additional info:
1.8-mac-mpich-8_Run julia-actionsjulia-runtest@v1.txt
1.8-mac-openmpi-8_Run julia-actionsjulia-runtest@v1.txt
Sorry for the late reply; was AWOL for one week and then sick the next.
But this is great; thank you!
So I suspect the non-determinism is related to gradient computation on Turing models.
Hmm, if this is the issue then there must be something specific to how it's set up in Pigeons.jl, because the model is fully reproducible on current Turing.jl, i.e. if I run NUTS with the same random seed multiple times on the exact model you pointed to, I get identical results :confused:
I'll have a look.
So I suspect the non-determinism is related to gradient computation on Turing models.
Have you observed this phenomenon concretely, btw? Non-determinism of the gradient computation, I mean? Or is this just a suspicion?
So when I run the exact tests from #165 locally, the results are perfectly reproducible (just running that testset twice results in exactly the same values everywhere).
julia> using Test, Pigeons, Turing
julia> @testset "Turing-gradient" begin
target = Pigeons.toy_turing_unid_target()
@show Threads.nthreads()
logz_mala = Pigeons.stepping_stone_pair(pigeons(; target, explorer = AutoMALA(preconditioner = Pigeons.IdentityPreconditioner())))
logz_slicer = Pigeons.stepping_stone_pair(pigeons(; target, explorer = SliceSampler()))
@test abs(logz_mala[1] - logz_slicer[1]) < 0.1
end
Threads.nthreads() = 1
┌ Info: Neither traces, disk, nor online recorders included.
│ You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
└ To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
──────────────────────────────────────────────────────────────────────────────────────────────────
scans Λ time(s) allc(B) log(Z₁/Z₀) min(α) mean(α) min(αₑ) mean(αₑ)
────────── ────────── ────────── ────────── ────────── ────────── ────────── ────────── ──────────
2 3.4 0.0083 7.09e+06 -3.32e+03 0 0.622 0 0.539
4 2.22 0.00967 7.69e+06 -1.48e+03 0 0.753 0.668 0.716
8 2.62 0.0212 1.72e+07 -42.8 8.09e-30 0.709 0.465 0.62
16 2.91 0.0662 3.79e+07 -10.8 0.077 0.677 0.439 0.606
32 3.29 0.0994 7.82e+07 -11.8 0.128 0.635 0.528 0.628
64 3.27 0.224 1.59e+08 -11.1 0.209 0.637 0.529 0.624
128 3.51 0.474 3.36e+08 -11.4 0.508 0.61 0.53 0.621
256 3.57 0.922 6.84e+08 -11.9 0.475 0.604 0.519 0.624
512 3.46 1.86 1.37e+09 -11.5 0.582 0.615 0.517 0.604
1.02e+03 3.52 3.72 2.77e+09 -11.9 0.571 0.609 0.517 0.629
──────────────────────────────────────────────────────────────────────────────────────────────────
┌ Info: Neither traces, disk, nor online recorders included.
│ You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
└ To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
──────────────────────────────────────────────────────────────────────────────────────────────────
scans Λ time(s) allc(B) log(Z₁/Z₀) min(α) mean(α) min(αₑ) mean(αₑ)
────────── ────────── ────────── ────────── ────────── ────────── ────────── ────────── ──────────
2 1.04 0.00155 1.04e+06 -4.24e+03 0 0.885 1 1
4 4.06 0.00233 1.78e+06 -16.3 4.63e-06 0.549 1 1
8 3.49 0.00428 3.52e+06 -12.1 0.215 0.612 1 1
16 2.68 0.00919 7.4e+06 -10.2 0.518 0.703 1 1
32 4.29 0.0165 1.36e+07 -11.8 0.222 0.524 1 1
64 3.17 0.0366 2.84e+07 -11.5 0.529 0.648 1 1
128 3.56 0.0863 5.49e+07 -11.5 0.523 0.605 1 1
256 3.38 0.154 1.1e+08 -11.6 0.526 0.625 1 1
512 3.48 0.292 2.21e+08 -12 0.527 0.614 1 1
1.02e+03 3.55 0.611 4.43e+08 -11.8 0.571 0.605 1 1
──────────────────────────────────────────────────────────────────────────────────────────────────
Test Summary: | Pass Total Time
Turing-gradient | 1 1 8.7s
Test.DefaultTestSet("Turing-gradient", Any[], 1, false, false, true, 1.700434095336861e9, 1.700434103988216e9, false)
julia> @testset "Turing-gradient" begin
target = Pigeons.toy_turing_unid_target()
@show Threads.nthreads()
logz_mala = Pigeons.stepping_stone_pair(pigeons(; target, explorer = AutoMALA(preconditioner = Pigeons.IdentityPreconditioner())))
logz_slicer = Pigeons.stepping_stone_pair(pigeons(; target, explorer = SliceSampler()))
@test abs(logz_mala[1] - logz_slicer[1]) < 0.1
end
Threads.nthreads() = 1
┌ Info: Neither traces, disk, nor online recorders included.
│ You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
└ To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
──────────────────────────────────────────────────────────────────────────────────────────────────
scans Λ time(s) allc(B) log(Z₁/Z₀) min(α) mean(α) min(αₑ) mean(αₑ)
────────── ────────── ────────── ────────── ────────── ────────── ────────── ────────── ──────────
2 3.4 0.00789 7.09e+06 -3.32e+03 0 0.622 0 0.539
4 2.22 0.0101 7.69e+06 -1.48e+03 0 0.753 0.668 0.716
8 2.62 0.0234 1.72e+07 -42.8 8.09e-30 0.709 0.465 0.62
16 2.91 0.0502 3.79e+07 -10.8 0.077 0.677 0.439 0.606
32 3.29 0.117 7.82e+07 -11.8 0.128 0.635 0.528 0.628
64 3.27 0.222 1.59e+08 -11.1 0.209 0.637 0.529 0.624
128 3.51 0.45 3.36e+08 -11.4 0.508 0.61 0.53 0.621
256 3.57 0.933 6.84e+08 -11.9 0.475 0.604 0.519 0.624
512 3.46 1.89 1.37e+09 -11.5 0.582 0.615 0.517 0.604
1.02e+03 3.52 3.77 2.77e+09 -11.9 0.571 0.609 0.517 0.629
──────────────────────────────────────────────────────────────────────────────────────────────────
┌ Info: Neither traces, disk, nor online recorders included.
│ You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
└ To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
──────────────────────────────────────────────────────────────────────────────────────────────────
scans Λ time(s) allc(B) log(Z₁/Z₀) min(α) mean(α) min(αₑ) mean(αₑ)
────────── ────────── ────────── ────────── ────────── ────────── ────────── ────────── ──────────
2 1.04 0.00121 1.04e+06 -4.24e+03 0 0.885 1 1
4 4.06 0.00233 1.78e+06 -16.3 4.63e-06 0.549 1 1
8 3.49 0.00465 3.52e+06 -12.1 0.215 0.612 1 1
16 2.68 0.00969 7.4e+06 -10.2 0.518 0.703 1 1
32 4.29 0.0171 1.36e+07 -11.8 0.222 0.524 1 1
64 3.17 0.0368 2.84e+07 -11.5 0.529 0.648 1 1
128 3.56 0.0714 5.49e+07 -11.5 0.523 0.605 1 1
256 3.38 0.156 1.1e+08 -11.6 0.526 0.625 1 1
512 3.48 0.309 2.21e+08 -12 0.527 0.614 1 1
1.02e+03 3.55 0.598 4.43e+08 -11.8 0.571 0.605 1 1
──────────────────────────────────────────────────────────────────────────────────────────────────
Test Summary: | Pass Total Time
Turing-gradient | 1 1 10.0s
Test.DefaultTestSet("Turing-gradient", Any[], 1, false, false, true, 1.700434107621442e9, 1.70043411758343e9, false)
Is there a possibility that using a different MPI version affects the RNG somehow? Seems quite strange, but I don't have much experience with MPI.
Thanks for checking! Yes, we did observe the non-reproducibility; it first surfaced when CI checks were non-deterministically failing.
Regarding the MPI hypothesis, I would be quite surprised if the MPI implementation affected the RNGs; MPI should not be aware of them.
What we have often observed in non-reproducibility issues is that things can appear reproducible in one computing setup but not in another; e.g. if a race condition depends on the timing of events, it might only trigger in certain setups. Here I agree it seems to only show up on the CI instances (see logs saved above). We like to have reproducibility on the CI instances because we rely on it to check a property we call "parallelism invariance" on our distributed algorithms (https://pigeons.run/dev/distributed/#distributed). But we can always rely on other PPLs (or non-gradient Turing) to check that property on the core distributed algorithms, so it may not be necessary to narrow down this tricky quirk.
Sorry for the very late reply here. A conference and the Christmas holidays happened, and I've been working on a convenient way to represent mixing of variable types as you mentioned. I wanted to have that done before replying, but it's been taking much longer than originally intended, so I'll have to defer it for now.
Regarding the MPI hypothesis, I would be quite surprised if the MPI implementation would affect the RNGs. MPI should not be aware of rngs.
Very much agree; that would seem very surprising.
So I suspect the non-determinism is related to gradient computation on Turing models.
Are you constructing a separate model for each process? As in, is
called for each worker?
No worries, I have been slow in everything lately too!! :)
Good question! For this specific test, it is single-threaded. But if it were multi-threaded, the way it is set up at the moment is that each replica has a distinct VarInfo, while the model is shared by several threads. I assumed the mutability happens in VarInfos and not in models. I guess this is orthogonal to the issue, but I am curious whether this is the right mental model?
Whoops, completely missed the reply! Just came across this now because I was trying out Pigeons.jl for a problem of my own and figured I'd check back on this issue.
But if it would have been multi-threaded, then the way it is setup at the moment is to have each replica having a distinct VarInfo, but the model is shared by several threads. I assumed the mutability happens in VarInfo's and not in models.
Mutation shouldn't happen in the model unless the arguments passed to the model are themselves mutated, e.g. passing in missing in an array will lead to it being sampled rather than "observed". So your understanding is indeed correct :)
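For concreteness, the "one shared model, one VarInfo per replica" setup can be sketched as follows (assumes Turing/DynamicPPL are installed; exact constructors may differ across versions, so treat this as a sketch rather than a recipe):

```julia
# Hedged sketch: the model is shared and treated as read-only across
# threads, while each replica owns its own mutable VarInfo.
using Turing, DynamicPPL

@model function demo()
    x ~ Normal()
end

model = demo()  # one model instance, shared across all threads

# One VarInfo per replica; all mutation (linking, setting values, ...)
# happens in these per-replica objects, never in `model` itself.
varinfos = [DynamicPPL.VarInfo(model) for _ in 1:Threads.nthreads()]
```

As long as the model's arguments are not mutated, sharing `model` this way is safe, and each thread's sampling state stays isolated in its own VarInfo.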
Hmm, any ideas on how best to go about debugging this? We on the Turing side are pretty keen to help out, but I at least am lacking knowledge of how all the moving parts here interact :confused: Maybe @devmotion or @yebai have thoughts / ideas?
Hola amigos!
I came across the AutoMALA paper (really neat stuff) and wanted to have a go at it with some Turing.jl models. In doing so, I ran into this:
https://github.com/Julia-Tempering/Pigeons.jl/blob/3c3b87776e8c9a9be1594972546137d0ccc41cc9/src/targets/TuringLogPotential.jl#L38-L40
What does this :singleton_variable structure refer to? Thanks!