TuringLang / AbstractMCMC.jl

Abstract types and interfaces for Markov chain Monte Carlo methods
https://turinglang.org/AbstractMCMC.jl
MIT License
78 stars 18 forks source link

Fix `discard_initial`, and add support for `discard_initial` and `thinning` to iterator and transducer #102

Closed devmotion closed 2 years ago

devmotion commented 2 years ago

This PR fixes a bug in discard_initial. Additionally, it adds support for discard_initial and thinning to the iterator and transducer.

Of course, it would be possible to achieve the same behaviour in the iterator/transducer by using Iterators.drop/Transducers.Drop etc. but I think support for these two keyword arguments improves the user experience since it is more consistent with sample and arguably simpler.

The PR also adds tests for these two keyword arguments.

codecov[bot] commented 2 years ago

Codecov Report

Merging #102 (6aff4f1) into master (650d9e1) will increase coverage by 0.29%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #102      +/-   ##
==========================================
+ Coverage   97.48%   97.78%   +0.29%     
==========================================
  Files           7        7              
  Lines         239      271      +32     
==========================================
+ Hits          233      265      +32     
  Misses          6        6              
Impacted Files Coverage Δ
src/sample.jl 97.66% <100.00%> (ø)
src/stepper.jl 100.00% <100.00%> (ø)
src/transducer.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 650d9e1...6aff4f1. Read the comment docs.

devmotion commented 2 years ago

The test errors are quite puzzling. It seems changing the reference sampling to progress=true fixes them locally with Julia 1.3. I don't know why enabling progress logs would affect the samples on Julia 1.3.

cpfiffer commented 2 years ago

Yeah the reason for failure here is really unclear. Can you print out the full chains for the two chains with something like 5 samples or something? Maybe it'll be obvious to us if there's some kind of index arithmetic error.

devmotion commented 2 years ago

I think it could be related to some changes in Julia (logging? the thread-local rngs?) since tests fail on Julia 1.3 but not on more recent versions.

The PR adds some tests that checks that the bug with discard_initial is fixed. E.g., with the master branch currently one gets the following on Julia 1.7 since not discard_initial but only max(0, discard_initial - 1) samples are discarded:

julia> using AbstractMCMC

julia> using Random: Random, AbstractRNG

julia> using Test

julia> include(joinpath(dirname(dirname(pathof(AbstractMCMC))), "test", "utils.jl"))

julia> Random.seed!(1234);

julia> N = 100;

julia> discard_initial = 50;

julia> chain = sample(MyModel(), MySampler(), N; discard_initial=discard_initial);

julia> @test length(chain) == N
Test Passed
  Expression: length(chain) == N
   Evaluated: 100 == 100

julia> @test !ismissing(chain[1].a)
Test Passed
  Expression: !(ismissing((chain[1]).a))

julia> # Repeat sampling without discarding initial samples.
       Random.seed!(1234);

julia> ref_chain = sample(MyModel(), MySampler(), N + discard_initial; progress=false);

julia> [chain[i].a for i in 1:5]
5-element Vector{Float64}:
 0.9327827434839276
 0.4122251300478674
 0.7553435356857371
 0.7091580778375822
 0.18668155849545398

julia> [ref_chain[i + discard_initial].a for i in 1:5]
5-element Vector{Float64}:
 0.4122251300478674
 0.7553435356857371
 0.7091580778375822
 0.18668155849545398
 0.6932923170086805

Whereas with this PR one gets on Julia 1.7:

...

julia> [chain[i].a for i in 1:5]
5-element Vector{Float64}:
 0.4122251300478674
 0.7553435356857371
 0.7091580778375822
 0.18668155849545398
 0.6932923170086805

julia> [ref_chain[i + discard_initial].a for i in 1:5]
5-element Vector{Float64}:
 0.4122251300478674
 0.7553435356857371
 0.7091580778375822
 0.18668155849545398
 0.6932923170086805

However, on Julia 1.3 it seems completely messed up (this PR):

...

julia> [chain[i].a for i in 1:5]
5-element Array{Float64,1}:
 0.9906023534355743 
 0.10101494660193899
 0.43508662726114   
 0.8837184408191245 
 0.07624588897736362

julia> [ref_chain[i + discard_initial].a for i in 1:5]
5-element Array{Float64,1}:
 0.26559536159694797
 0.6126276964747348 
 0.5083634348868193 
 0.6125537989226799 
 0.8510698470611828 

However, enabling the progress logging for the reference trajectory (i.e., using the same setting for the chain and the reference) fixes the problem:

...

julia> Random.seed!(1234);

julia> ref_chain = sample(MyModel(), MySampler(), N + discard_initial; progress=true);

julia> [ref_chain[i + discard_initial].a for i in 1:5]
5-element Array{Float64,1}:
 0.9906023534355743 
 0.10101494660193899
 0.43508662726114   
 0.8837184408191245 
 0.07624588897736362

It's still mysterious to me why and how progress logging interferes with the RNG, and why this happens only on older Julia versions.

devmotion commented 2 years ago

OK, even more surprising - the problem can be fixed on Julia 1.3 by specifying an RNG explicitly, even if the reference trajectory is sampled with progress=false:

...

julia> rng = Random.MersenneTwister(1234);

julia> chain = sample(rng, MyModel(), MySampler(), N; discard_initial=discard_initial);

julia> rng = Random.MersenneTwister(1234);

julia> ref_chain = sample(rng, MyModel(), MySampler(), N + discard_initial; progress=false);

julia> [chain[i].a for i in 1:5]
5-element Array{Float64,1}:
 0.26559536159694797
 0.6126276964747348 
 0.5083634348868193 
 0.6125537989226799 
 0.8510698470611828 

julia> [ref_chain[i + discard_initial].a for i in 1:5]
5-element Array{Float64,1}:
 0.26559536159694797
 0.6126276964747348 
 0.5083634348868193 
 0.6125537989226799 
 0.8510698470611828 

whereas

...

julia> Random.seed!(1234);

julia> chain = sample(Random.GLOBAL_RNG, MyModel(), MySampler(), N; discard_initial=discard_initial);

julia> Random.seed!(1234);

julia> ref_chain = sample(Random.GLOBAL_RNG, MyModel(), MySampler(), N + discard_initial; progress=false);

julia> [chain[i].a for i in 1:5]
5-element Array{Float64,1}:
 0.9906023534355743 
 0.10101494660193899
 0.43508662726114   
 0.8837184408191245 
 0.07624588897736362

julia> [ref_chain[i + discard_initial].a for i in 1:5]
5-element Array{Float64,1}:
 0.26559536159694797
 0.6126276964747348 
 0.5083634348868193 
 0.6125537989226799 
 0.8510698470611828 

So on Julia 1.3 something mutates the global RNG when progress logging is enabled - but not specifically the user-provided RNG.

devmotion commented 2 years ago

OK, so the @withprogress macro creates a unique ID with uuid4() (https://github.com/JuliaLogging/ProgressLogging.jl/blob/0e7933005233722d6214b0debe3316c82b4d14a7/src/ProgressLogging.jl#L324) but up until Julia 1.6 (exclusive) uuid4 uses Random.GLOBAL_RNG if no rng is specified (see https://github.com/JuliaLang/julia/blob/bb5b98e72a151c41471d8cc14cacb495d647fb7f/stdlib/UUIDs/src/UUIDs.jl#L97-L98 and e.g. https://github.com/JuliaLang/julia/blob/2d5741174ce3e6a394010d2e470e4269ca54607f/stdlib/UUIDs/src/UUIDs.jl#L86).

So I think in general we can't avoid the mutation of the default rng on older Julia versions, and since it is an implementation detail of uuid4 and @withprogress the problem might re-appear in the future. I wonder if we should just use progress=true in the reference chains in the tests and be fine with it, or if there is anything from our side that we should fix (e.g., by not using Random.GLOBAL_RNG/Random.default_rng() but an independent copy, by generating some custom IDs that either don't interfere with the provided rng or always mutate the provided rng in a reproducible way, e.g. by always calling uuid4(rng), regardless of whether a progress bar is shown or not.).

cpfiffer commented 2 years ago

This sounds like we should just set progress=true. I'm not particularly inclined to modify code to suit pre-1.6, given that 1.6 is the current LTS. Alternatively we could even just drop 1.3 testing entirely?

devmotion commented 2 years ago

This sounds like we should just set progress=true.

OK, I did that.

I'm not particularly inclined to modify code to suit pre-1.6, given that 1.6 is the current LTS.

I'm not so much worried about < 1.6 but rather that the issue might reappear due to some internal changes in UUIDs or ProgressLogging. But I guess changing the tests is fine for now and we can reconsider if there's an issue with more recent Julia versions as well.

Alternatively we could even just drop 1.3 testing entirely?

In my experience it's easy to introduce unsupported syntax or functions and thereby break older Julia versions if they are not tested. Hence I tend to test the oldest officially declared to be compatible version, LTS, and the latest stable release.

devmotion commented 2 years ago

I don't understand the test error with Julia nightly on Linux 32bit - it works in all other Julia versions and is caused by a functionality (serial sampling) that is not affected by this PR. I'll ignore it for now.

Maybe we should just stop testing 32bit, I'm not sure how much it is used anyway - and there's not really any 32bit/64bit-specific code here and they take quite long to run (at least with Julia nightly).

cpfiffer commented 2 years ago

Thank you!