TuringLang / Turing.jl

Bayesian inference with probabilistic programming.
https://turinglang.org
MIT License

Gibbs gives different sampling results with fixed rng #1731

Open bvdmitri opened 2 years ago

bvdmitri commented 2 years ago

Hey, thanks for the nice package!

Recently I think I came across a bug in Turing.jl. I am running a slightly modified version of the HMM demo and use the following inference function:

function inference_turing(observations; nsamples = 500, seed = 42)
    rng     = MersenneTwister(seed)
    sampler = Turing.Gibbs(Turing.HMC(0.1, 40, :A, :B), Turing.PG(10, :z))
    return Turing.sample(rng, BayesHmm(observations, 3), sampler, nsamples)
end

What I noticed is that the results are always different, even though I fix my rng and seed. If I do

Random.seed!(seed)

at the beginning of inference_turing, then the results are consistent and always the same. This makes me think the Gibbs sampler ignores the rng setting. This never happened to me before with the HMC sampler, for example.
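
For context, this is the behavior I would expect from a fixed rng, shown here with plain Random only (no Turing involved) as a minimal sketch:

```julia
using Random

# Two generators seeded identically must produce identical draws;
# a sampler that respects the rng passed to it should be reproducible
# in exactly the same way across repeated calls.
draws1 = rand(MersenneTwister(42), 5)
draws2 = rand(MersenneTwister(42), 5)
draws1 == draws2  # true
```

With Gibbs + PG, the analogous comparison of two chains sampled with identically seeded rngs fails for me.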

devmotion commented 2 years ago

Which version of Turing do you use? IIRC, in some versions the particle filter methods did not respect the user-provided RNG, but this should be fixed.

bvdmitri commented 2 years ago

Hey, thanks for the swift response. Here is the `] st` output:

  [6e4b80f9] BenchmarkTools v1.2.0
  [13f3f980] CairoMakie v0.6.6
  [a93c6f00] DataFrames v1.2.2
  [31c24e10] Distributions v0.25.31
  [634d3b9d] DrWatson v2.7.5
  [5789e2e9] FileIO v1.11.2
  [9fc3f58a] ForneyLab v0.11.3
  [033835bb] JLD2 v0.4.15
  [6fafb56a] Memoization v0.1.13
  [1a8c2f83] Query v1.0.0
  [a194aa59] ReactiveMP v1.1.0
  [37e2e3b7] ReverseDiff v1.10.0
  [df971d30] Rocket v1.3.18
  [fce5fe82] Turing v0.19.0
  [3a884ed6] UnPack v1.0.2
  [37e2e46d] LinearAlgebra
  [9a3f8284] Random

devmotion commented 2 years ago

Hmm, this is unfortunate: you are using the latest version :smile: So it seems there is a bug, and at some point we don't pass the user-provided RNG around but fall back to the default one.

bvdmitri commented 2 years ago

Would you like me to share the full code so you can try to reproduce it?

devmotion commented 2 years ago

Sure, then we can easily check whether we managed to fix the problem.

bvdmitri commented 2 years ago

Packages:

using DrWatson
using CairoMakie # Plots related stuff
using Turing, MCMCChains, Distributions, LinearAlgebra, Random # Bayesian Inference packages
using BenchmarkTools, DataFrames, Query # Analysis tools

using ReverseDiff, Memoization

Turing.setadbackend(:reversediff)
Turing.setrdcache(true)

Data generation function:

normalise(x) = x ./ sum(x)
function generate_data(parameters)
    @unpack n, A, B, seed = parameters

    rng = MersenneTwister(seed)

    # Initial state
    z_0 = [1.0, 0.0, 0.0] 

    # one-hot encoding of the states and of the observations
    z = Vector{Vector{Float64}}(undef, n) 
    y = Vector{Vector{Float64}}(undef, n)

    z_prev = z_0

    for t in 1:n
        z[t] = random_vector(rng, Categorical(normalise(A * z_prev)))
        y[t] = random_vector(rng, Categorical(normalise(B * z[t])))
        z_prev = z[t]
    end

    return z, y
end
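
(`random_vector` is not defined in the snippets above; a minimal sketch, assuming it one-hot encodes a draw from the categorical distribution, which is consistent with how `z_prev` and `argmax.(y)` are used:)

```julia
using Random, Distributions

# Hypothetical helper: draw a state index from the distribution and
# return its one-hot encoding as a Vector{Float64}.
function random_vector(rng::AbstractRNG, dist::Categorical)
    x = zeros(ncategories(dist))
    x[rand(rng, dist)] = 1.0
    return x
end
```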

Some parameters for the model and data generation:

params = let
    # Seed for reproducibility
    seed = 123

    # Number of samples in dataset
    n = 50

    # Transition probabilities (some transitions are impossible)
    A = [0.9 0.0 0.1; 0.1 0.9 0.0; 0.0 0.1 0.9]

    # Observation noise
    B = [0.0 0.05 0.9; 0.05 0.9 0.05; 0.9 0.05 0.05] 

    @strdict seed n A B
end;
z, y = generate_data(params);

Turing model

@model BayesHmm(y, K) = begin
    # Get observation length.
    N = length(y)

    # State sequence.
    z = Turing.tzeros(Int, N)

    # Transition matrix.
    A = Vector{Vector}(undef, K)

    # Observations model matrix.
    B = Vector{Vector}(undef, K)

    # Assign distributions to each element of the transition matrix and the
    # emission matrix.
    for i = 1:K
        A_c_prior = normalise(ones(K))
        B_c_prior = ones(K)
        B_c_prior[K - i + 1] = 10.0
        B_c_prior = normalise(B_c_prior)

        A[i] ~ Dirichlet(A_c_prior)
        B[i] ~ Dirichlet(B_c_prior)
    end

    # Observe each point of the input.
    z[1] ~ Categorical(K)
    y[1] ~ Categorical(vec(B[z[1]]))

    for i = 2:N
        z[i] ~ Categorical(vec(A[z[i - 1]]))
        y[i] ~ Categorical(vec(B[z[i]]))
    end
end;

Inference:

# I reduced the number of samples and hyperparameters just to make it faster to reproduce
# In actual experiments I use more samples
function inference_turing(observations; nsamples = 10, seed = 42)
    # Random.seed!(seed)
    rng     = MersenneTwister(seed)
    sampler = Turing.Gibbs(Turing.HMC(0.1, 20, :A, :B), Turing.PG(10, :z))
    return Turing.sample(rng, BayesHmm(observations, 3), sampler, nsamples)
end

Running inference

z_turing_estimated = inference_turing(argmax.(y));

Let me know if I forgot something, but that should be enough. The resulting chain from inference_turing always has different values on my side (with Random.seed! commented out).

bvdmitri commented 2 years ago

@devmotion Just in case: I interrupted a sampling process and got the following stacktrace. Not sure if it is useful, but I can see that some functions use the Random._GLOBAL_RNG object:

CTaskException:
InterruptException:
Stacktrace:
  [1] assume(rng::Random._GLOBAL_RNG, spl::DynamicPPL.Sampler{PG{(:z,), AdvancedPS.ResampleWithESSThreshold{typeof(AdvancedPS.resample_systematic), Float64}}}, dist::Categorical{Float64, Vector{Float64}}, vn::AbstractPPL.VarName{:z, Tuple{Tuple{Int64}}}, #unused#::DynamicPPL.TypedVarInfo{NamedTuple{(:A, :B, :z), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:A, Tuple{Tuple{Int64}}}, Int64}, Vector{Dirichlet{Float64, Vector{Float64}, Float64}}, Vector{AbstractPPL.VarName{:A, Tuple{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:B, Tuple{Tuple{Int64}}}, Int64}, Vector{Dirichlet{Float64, Vector{Float64}, Float64}}, Vector{AbstractPPL.VarName{:B, Tuple{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:z, Tuple{Tuple{Int64}}}, Int64}, Vector{Categorical{Float64, Vector{Float64}}}, Vector{AbstractPPL.VarName{:z, Tuple{Tuple{Int64}}}}, Vector{Int64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64})
    @ Turing.Inference ~/.julia/packages/Turing/nfMhU/src/inference/AdvancedSMC.jl:330
  [2] tilde_assume
    @ ~/.julia/packages/DynamicPPL/RcfQU/src/context_implementations.jl:63 [inlined]
  [3] tilde_assume
    @ ~/.julia/packages/DynamicPPL/RcfQU/src/context_implementations.jl:58 [inlined]
  [4] tilde_assume
    @ ~/.julia/packages/DynamicPPL/RcfQU/src/context_implementations.jl:43 [inlined]
  [5] tilde_assume!
    @ ~/.julia/packages/DynamicPPL/RcfQU/src/context_implementations.jl:140 [inlined]
  [6] BayesHmm(__model__::DynamicPPL.Model{typeof(BayesHmm), (:y, :K, :T), (:T,), (), Tuple{Vector{Int64}, Int64, DataType}, Tuple{DataType}, DynamicPPL.DefaultContext}, __varinfo__::DynamicPPL.TypedVarInfo{NamedTuple{(:A, :B, :z), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:A, Tuple{Tuple{Int64}}}, Int64}, Vector{Dirichlet{Float64, Vector{Float64}, Float64}}, Vector{AbstractPPL.VarName{:A, Tuple{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:B, Tuple{Tuple{Int64}}}, Int64}, Vector{Dirichlet{Float64, Vector{Float64}, Float64}}, Vector{AbstractPPL.VarName{:B, Tuple{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:z, Tuple{Tuple{Int64}}}, Int64}, Vector{Categorical{Float64, Vector{Float64}}}, Vector{AbstractPPL.VarName{:z, Tuple{Tuple{Int64}}}}, Vector{Int64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, __context__::DynamicPPL.SamplingContext{DynamicPPL.Sampler{PG{(:z,), AdvancedPS.ResampleWithESSThreshold{typeof(AdvancedPS.resample_systematic), Float64}}}, DynamicPPL.DefaultContext, Random._GLOBAL_RNG}, y::Vector{Int64}, K::Int64, ::Type{Float64})
devmotion commented 2 years ago

Yep, this seems to confirm my assumption that the problem is the particle filter method. I suspected the recent change in AdvancedPS to TracedRNG (and some default initialization there that is not handled in Turing) to be the problem, but actually this change is not released yet. This makes it a bit more difficult to fix, though, if the problem is in AdvancedPS, as the master branch there contains many other changes.

bvdmitri commented 2 years ago

Yeah, I hope you'll manage to fix that! It is not a very big issue for me personally, since I can always use Random.seed! as a workaround in my experiments. I just wanted to let you know, and I wasn't sure whether it was a problem on my side or a bug in Turing.
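
A minimal sketch of that workaround pattern, using plain Random only (the function name is illustrative): seed the global RNG in addition to creating the explicit one, so code paths that fall back to the default RNG are covered too.

```julia
using Random

# Seed both RNG paths: the global default RNG (covers samplers that
# internally fall back to it) and an explicit rng (for samplers that
# respect the rng argument).
function with_fixed_seeds(f; seed = 42)
    Random.seed!(seed)            # global RNG fallback
    rng = MersenneTwister(seed)   # explicit rng passed to the sampler
    return f(rng)
end

# Both paths now yield identical draws across repeated calls:
a = with_fixed_seeds(rng -> (rand(rng), rand()))
b = with_fixed_seeds(rng -> (rand(rng), rand()))
a == b  # true
```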