TuringLang / AbstractMCMC.jl

Abstract types and interfaces for Markov chain Monte Carlo methods
https://turinglang.org/AbstractMCMC.jl
MIT License

Addition of `step_warmup` #117

Open torfjelde opened 1 year ago

torfjelde commented 1 year ago

For many samplers it can be useful to distinguish between the warm-up phase and the sampling phase, e.g. in AdvancedHMC we have an initial phase where we adapt the sampler's parameters to the problem at hand.

Currently, the usual approach to implementing such a warm-up stage is to track the iteration count plus the adaptation state internally in the sampler state, but that can be quite annoying and/or redundant to implement.

I would also argue that separating these is useful on a conceptual level: even if I interact directly with the stepper interface, I would now do

for _ = 1:nwarmup
    state = last(AbstractMCMC.step_warmup(rng, model, sampler, state))
end

for i = 1:nsteps
    transition, state = AbstractMCMC.step(rng, model, sampler, state)
    # save
    ...
end

vs.

for i = 1:nwarmup + nsteps
    transition, state = AbstractMCMC.step(rng, model, sampler, state)
    # save
    if !iswarmup(state)
        ...
    end
end

With this PR, for something like MCMCTempering.jl, where during warm-up we actually just want to take steps with the underlying sampler rather than also performing the swaps, we can simply make `step_warmup` do so, without adding any notion of iteration counts to the state and without telling the sampler itself how many warm-up steps we want (instead it's just specified by a kwarg to `sample`, as it should be).
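To make the proposed hook concrete, here is a minimal self-contained sketch (the `DummySampler` and the stand-in `step`/`step_warmup` methods are hypothetical illustrations, not the actual PR code): a default `step_warmup` simply forwards to `step`, so samplers that need no special warm-up behaviour get it for free, while something like MCMCTempering could overload it to step only the underlying sampler.

```julia
using Random

# Hypothetical stand-in sampler, for illustration only.
struct DummySampler end

# Regular step: returns (transition, state); here the state is just a counter.
step(rng::Random.AbstractRNG, model, sampler::DummySampler, state) =
    (state + 1, state + 1)

# Default warm-up step falls back to the regular step; a sampler with a
# genuine warm-up phase would overload this method instead.
step_warmup(rng::Random.AbstractRNG, model, sampler::DummySampler, state) =
    step(rng, model, sampler, state)

function run_chain(nwarmup, nsteps)
    rng = Random.default_rng()
    state = 0
    for _ in 1:nwarmup                # warm-up phase: keep only the state
        state = last(step_warmup(rng, nothing, DummySampler(), state))
    end
    samples = Int[]
    for _ in 1:nsteps                 # sampling phase: save the transitions
        transition, state = step(rng, nothing, DummySampler(), state)
        push!(samples, transition)
    end
    return samples
end

run_chain(3, 5)  # → [4, 5, 6, 7, 8]
```

The point of the fallback is that the two-loop pattern above works for every sampler, while only samplers that care about warm-up need to implement anything extra.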

codecov[bot] commented 1 year ago

Codecov Report

Attention: 15 lines in your changes are missing coverage. Please review.

Comparison is base (dfb33b5) 96.87% compared to head (6e8f88e) 92.87%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #117      +/-   ##
==========================================
- Coverage   96.87%   92.87%   -4.00%
==========================================
  Files           8        8
  Lines         320      351      +31
==========================================
+ Hits          310      326      +16
- Misses         10       25      +15
```

| [Files](https://app.codecov.io/gh/TuringLang/AbstractMCMC.jl/pull/117?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=TuringLang) | Coverage Δ | |
|---|---|---|
| [src/interface.jl](https://app.codecov.io/gh/TuringLang/AbstractMCMC.jl/pull/117?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=TuringLang#diff-c3JjL2ludGVyZmFjZS5qbA==) | `84.00% <0.00%> (-11.46%)` | :arrow_down: |
| [src/sample.jl](https://app.codecov.io/gh/TuringLang/AbstractMCMC.jl/pull/117?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=TuringLang#diff-c3JjL3NhbXBsZS5qbA==) | `90.99% <75.51%> (-4.89%)` | :arrow_down: |

:umbrella: View full report in Codecov by Sentry.

torfjelde commented 1 year ago

> I think, as in the HMC code in Turing, this conflates warm-up stages of the sampler with discarding initial samples. In the first case you (usually) also want to discard these samples, but you might want to discard even more samples after tuning the hyperparameters of a sampler.

Oh I definitely agree with this. IMO this wouldn't be a catch-all solution, and you could still argue that in the case of HMC we should stick to the current approach.

But in many cases this sort of behavior is indeed what the user expects (though I agree with you; I also don't want to drop support for the current `discard_initial` behavior).

Is it worth introducing a new keyword argument then? Something that is separate from `discard_initial`, allowing you to define a "burn-in" period, and then a separate `num_warmup` that has this potentially special behavior?

> I also wonder a bit whether AbstractMCMC is the right level for such an abstraction, or whether it could be done in e.g. AdvancedHMC.

I don't think we should deal with the adaptation etc. itself in AbstractMCMC, but there are sooo many samplers that have some form of initial adaptation that it's IMO worth providing a simple hook that lets people do this.

devmotion commented 1 year ago

> Is it worth introducing a new keyword argument then? Something that is separate from `discard_initial`, allowing you to define a "burn-in" period, and then a separate `num_warmup` that has this potentially special behavior?

Yes, I think we should keep these options separate. I wonder if `discard_initial` should apply to both the warm-up stage and the potential burn-in period, so that warm-up samples can be kept as well, if desired. Or are we absolutely certain that you would never want to inspect these samples?

torfjelde commented 1 year ago

> Yes, I think we should keep these options separate. I wonder if `discard_initial` should apply to both the warm-up stage and the potential burn-in period, so that warm-up samples can be kept as well, if desired. Or are we absolutely certain that you would never want to inspect these samples?

No, I agree with you there; sometimes it's nice to keep them.

So are we thinking `discard_initial = num_warmup`, with `num_warmup = 0` by default?

devmotion commented 1 year ago

Yes, I think these would be reasonable default values.
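For concreteness, here is a hypothetical sketch of the defaults just agreed on (`num_warmup = 0` and `discard_initial = num_warmup`); the `sample_plan` helper and its names are purely illustrative and mirror this discussion, not necessarily the final `mcmcsample` signature:

```julia
# Hypothetical helper, for illustration only: given the requested number of
# samples N, compute how many steps are taken in total and how many samples
# the user gets back under the proposed defaults.
function sample_plan(N::Integer; num_warmup::Integer=0,
                     discard_initial::Integer=num_warmup)
    total = num_warmup + N           # total number of steps taken
    kept  = total - discard_initial  # number of samples returned to the user
    return (; total, kept)
end

sample_plan(100)                 # → (total = 100, kept = 100)
sample_plan(100; num_warmup=50)  # → (total = 150, kept = 100): warm-up discarded
sample_plan(100; num_warmup=50, discard_initial=0)  # → keeps warm-up samples too
```

With these defaults the warm-up samples are discarded unless the user explicitly passes a smaller `discard_initial`, which matches the "keep them if desired" behavior discussed above.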

torfjelde commented 1 year ago

Doneso :+1:

Nvm, I forgot we wanted to allow keeping the warm-up samples around...

torfjelde commented 1 year ago

Aight, I've made an attempt at allowing the desired interaction between `discard_initial` and `num_warmup`, but it does complicate `mcmcsample` a fair bit :confused:

I've also added a docstring for `mcmcsample`. IMO this should be in the documentation, as it specifies the default kwargs that work with all implementers of `step`. Currently there's no way to discover that `discard_initial` is a thing.

devmotion commented 1 year ago

> IMO this should be in the documentation

The standard keyword arguments are listed and explained in the documentation: https://turinglang.org/AbstractMCMC.jl/dev/api/#Common-keyword-arguments

torfjelde commented 1 year ago

Would you mind having another look @devmotion? :)

torfjelde commented 1 year ago

Also, should I bump the minor version? (I currently bumped the patch version, which seems incorrect.)

devmotion commented 10 months ago

@torfjelde Can you fix the merge conflict?

torfjelde commented 10 months ago

Should be good to go now @devmotion

torfjelde commented 10 months ago

But we're having issues with nightly on x86, so these last two won't finish.

torfjelde commented 3 months ago

Ran into another place where this will be useful: https://github.com/torfjelde/AutomaticMALA.jl. It avoids having to put `num_adapts` in the sampler itself; that can instead be replaced with `num_warmup` as a kwarg.

(I need to do some more work on this PR; had sort of forgotten about this)