TuringLang / AbstractMCMC.jl

Abstract types and interfaces for Markov chain Monte Carlo methods
https://turinglang.org/AbstractMCMC.jl
MIT License
87 stars 18 forks source link

Implemented `thinning` option in `mcmcsample` #50

Closed luiarthur closed 3 years ago

luiarthur commented 3 years ago

I implemented a thinning option in mcmcsample so that a user can thin samples by a certain factor, in reference to #49.

I'm not sure what else to check besides that the number of samples returned is correct. I'd be happy add more tests and documentation as directed.

Also, I'm totally fine if this is rejected since @devmotion mentioned that there could be unintended effects on Turing-independent packages that rely on AbstractMCMC. It's just something that I and (I think other) people that use GibbsConditional and MH might find useful.

devmotion commented 3 years ago

Yes, the idea is to check that the samples 1, k + 1, 2 k + 1, ..., (nsamples - 1) k + 1 in the chain without thinning are the same as samples 1, 2, ..., nsamples in the chain with thinning = k.

cpfiffer commented 3 years ago

I have nothing to add on top of David's suggestions -- great work! I actually really like this in AbstractMCMC. It's exactly the kind of logic that can be centralized.

luiarthur commented 3 years ago

Thanks for the clarification, @devmotion. I've made the changes and added a test to ensure that a thinned chain is every k-th sample of a longer (un-thinned) chain.

devmotion commented 3 years ago

Looks good :+1: We just have to wait for https://github.com/TuringLang/AbstractMCMC.jl/pull/51 and then make sure that tests pass. Can you already bump the version to 2.2.0?

luiarthur commented 3 years ago

Thanks! Yes, just changed the version to 2.2.0 in Project.toml.

codecov[bot] commented 3 years ago

Codecov Report

Merging #50 (001c062) into master (b5ef33c) will increase coverage by 0.18%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #50      +/-   ##
==========================================
+ Coverage   97.18%   97.36%   +0.18%     
==========================================
  Files           6        6              
  Lines         142      152      +10     
==========================================
+ Hits          138      148      +10     
  Misses          4        4              
Impacted Files Coverage Δ
src/sample.jl 99.01% <100.00%> (+0.10%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update dbd18fe...001c062. Read the comment docs.

devmotion commented 3 years ago

There seems to be a problem with the progress bars, e.g., the logs show progress bars that go up to 200%.

devmotion commented 3 years ago

The progress bars are still incorrect.

devmotion commented 3 years ago

Maybe there's something strange going on with the @logprogress macro here. Can you try to replace all occurrences of

progress && @logprogress (itotal += 1) / Ntotal

with

if progress
    itotal += 1
    @logprogress itotal / Ntotal
end

?

luiarthur commented 3 years ago

Yes, I'll try that.

luiarthur commented 3 years ago

I apologize for not catching this. When I ran the test in the julia repl, the progress bar advanced so quickly that I only see the final state, which for some reason always shows up as 100%, though before that it does advance to above 100%...

Anyway, your suggestion to replace

progress && @logprogress (itotal += 1) / Ntotal

with

if progress
    itotal += 1
    @logprogress itotal / Ntotal
end

fixed the problem.

But it seems that both:

Ntotal = thinning * (N - 1) + discard_initial + 1

and

Ntotal = thinning * N + discard_initial

work.

I haven't committed anything yet. But I thought about this a bit more, and it seems the latter should be correct. Any thoughts?

devmotion commented 3 years ago

I haven't committed anything yet. But I thought about this a bit more, and it seems the latter should be correct. Any thoughts?

The latter is not correct: we generate discard_initial samples that are discarded, then one sample that is saved, and afterwords we run the for-loop N - 1 times and generate thinning samples in each run. So in total discard_initial + 1 + (N - 1) * thinning samples are generated.

However, the difference is not noticeable in our tests: the difference between both definitions of Ntotal is thinning - 1 (less in the correct case). I.e., in all tests with a default value of thinning = 1, they yield exactly the same Ntotal value, and in the tests with thinning we use thinning = 3, i.e., the difference is 2. Since we generate 1000 samples, this difference is not noticeable.

luiarthur commented 3 years ago

Yes. Thanks for the correction. Just pushed the correction.

luiarthur commented 3 years ago

Awesome, thanks! Looks like #51 was merged. Do I now merge this PR?

devmotion commented 3 years ago

If tests pass, you can merge it. Otherwise, if you do not have sufficient permissions, @cpfiffer or I will merge the PR once tests have passed.