TuringLang / AdvancedHMC.jl

Robust, modular and efficient implementation of advanced Hamiltonian Monte Carlo algorithms
https://turinglang.org/AdvancedHMC.jl/
MIT License
228 stars 39 forks source link

Report percentage of divergent transitions in progress bar #297

Closed yebai closed 1 year ago

yebai commented 1 year ago

This PR limits warning messages to 10. Is that a good default?

Solution 1: we can probably disable these numerical messages, and instead print a summary message about total divergent transitions (see e.g. this comment) when the sample function is done.

A message about the percentage of divergent transitions is displayed in the progress bar. In addition, a warning message is shown if the percentage of divergent transitions is above a certain (25% by default) threshold.

jlperla commented 1 year ago

I would default to 0 and have people modify it themselves if they wish. Or have a way tot tell Turing that you want to reject a sample yourself without having to pass back -Inf - because in those cases logging isn't required since it was intentional.

yebai commented 1 year ago

@jlperla let us see what other people think. I'll leave this open for 2 more days for discussions.

yebai commented 1 year ago

I'll refine this PR a bit further when I got time -- I think it is better to print a summary message about total divergences when sampling finishes.

sethaxen commented 1 year ago

In terms of default, I think it's most important to give the user the most useful information to decide how to use their time. e.g. if I see nonstop numerical error during sampling and it doesn't seem to be getting better, I may or may not continue sampling, but I definitely immediately start inspecting my model more carefully for obvious mistakes. The problem with the current way of warning about numerical error is that it logs so much as to be a nuisance without really being informative.

If we just print 10 numerical error warnings, this really wouldn't be that useful, as it doesn't tell us anything about the model (10 numerical error warnings can occur during the initial step size adaptation of even a very well-behaved model); I'd go so far as to say this is no better than hiding all warnings. It would be more useful if the progress logging itself logged the number or percent of divergent transitions so far whenever an update is logged. This needn't take up any more terminal real estate than existing progress logging. And it would also be useful to differentiate number/percentage of pre-warmup divergences (which may be informative but are not problematic) from post-warmup divergences (which are). It sounds like PyMC and Numpyro do something like this: https://twitter.com/colindcarroll/status/1581334719823609856?s=20&t=E4qm64YiuQkt4kyKQYrdfQ

yebai commented 1 year ago

I implemented reporting the percentage of divergent transitions in the progress bar for both the AbstractMCMC.sample interface and the classical AHMC.sample interface, as suggested by @sethaxen. It is ready for another look.

yebai commented 1 year ago

@sethaxen can you take a look at this PR before it gets merged?

sethaxen commented 1 year ago

@sethaxen can you take a look at this PR before it gets merged?

Certainly! I'll do a proper review in the next few days, but from trying it out locally, I really like the extra statistics being shown during the logging. But I do have some general suggestions.

I am wondering though if is_adapt could maybe be a little more prominent. Or could information about the adaptation stage somehow be included in the logging? (e.g. for really slow chains, it's useful to know if one is still doing the initial step size tuning, or is one now adapting the mass matrix, and which window in Stan's windowed adaptation is one now in.)

RE percentage_divergent_transitions, this is nice, but is there some way after adaptation ends to only see the percent post-warmup divergent transitions? While during warm-up it's useful to know this quantity, divergences during warm-up are not serious if they don't persist afterward, and this doesn't tell us that.

In general the numbers are reported at much higher precision than is necessary or useful, which creates visual clutter. Could that precision be reduced? And RE the mass matrix, maybe it would be useful to just show its diagonal, not its type. And with lower precision numbers this would allow more of its entries to be visible (on my screen only 2 entries fit).

yebai commented 1 year ago

RE percentage_divergent_transitions, this is nice, but is there some way after adaptation ends to only see the percent post-warmup divergent transitions? While during warm-up it's useful to know this quantity, divergences during warm-up are not serious if they don't persist afterward, and this doesn't tell us that.

I added reporting for divergent transitions percentage for both warmup and post-warmup stages. They are reported separately.

I am wondering though if is_adapt could maybe be a little more prominent. Or could information about the adaptation stage somehow be included in the logging? (e.g. for really slow chains, it's useful to know if one is still doing the initial step size tuning, or is one now adapting the mass matrix, and which window in Stan's windowed adaptation is one now in.)

This is a low-hanging fruit, but better addressed in a separate PR. This PR is focused on reporting divergent transitions.