Closed torfjelde closed 1 week ago
@sethaxen maybe you want to have a look at this
Looks good to me, but maybe we want to wait for @sethaxen
Is there a reason this PR doesn't also add is_monotonically_decreasing?
I did consider this, but AFAIK the only monotonically decreasing bijector we have right now is Scale with negative coefficients, which would require runtime checks; that made me hesitant (I was going to raise an issue about this).
But it's probably worth it, so I'll add that too :+1:
Do we have any way to statically detect if a bijector is univariate?
Not at the moment, no.
All univariate bijections are strictly monotonic. So we could define
And because we don't, I'd prefer to make it all explicit so we end up with a method error / always return false
instead of silently doing something strange.
Ah okay, so now I remember another reason why I was holding back on is_monotonically_decreasing: AFAIK Scale is the only monotonically decreasing function we can have, but how do we implement is_monotonically_decreasing for ComposedFunction?
The condition is_monotonically_decreasing(f.inner) && is_monotonically_decreasing(f.outer) won't be correct, e.g. Scale(-1) and Scale(-1) are both monotonically decreasing, but their composition is not.
EDIT: Though this is of course also an issue for is_monotonically_increasing...
EDIT 2: Nvm, it all just boils down to
| inner \ outer | inc | dec | other |
|---|---|---|---|
| inc | inc | dec | NA |
| dec | dec | inc | NA |
| other | NA | NA | NA |
I don't understand the table, but I believe it amounts to first checking that all bijectors are (elementwise) univariate with all(x -> is_monotonically_increasing(x) | is_monotonically_decreasing(x), bijectors) and then checking that there are an odd number of decreasing bijectors with mapreduce(is_monotonically_decreasing, xor, bijectors).
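For reference, here is a minimal sketch of both formulations: the per-composition table and the chain-wide xor check. Everything in it is a hypothetical stand-in (the `Toy*` types and the `is_inc` / `is_dec` / `comp_*` / `chain_*` names are made up for illustration and are not part of Bijectors.jl), so treat it as a way to sanity-check the logic rather than as the implementation in this PR.

```julia
# Hypothetical stand-ins for bijectors; none of these types come from Bijectors.jl.
abstract type Toy end
struct ToyExp <: Toy end                  # x -> exp(x), always increasing
struct ToyScale <: Toy; a::Float64; end   # x -> a * x, direction depends on sign(a)
struct ToyOther <: Toy end                # no known monotonicity

is_inc(::ToyExp) = true
is_inc(s::ToyScale) = s.a > 0             # runtime check on the coefficient
is_inc(::ToyOther) = false

is_dec(::ToyExp) = false
is_dec(s::ToyScale) = s.a < 0
is_dec(::ToyOther) = false

# Per-composition rule, matching the table: equal directions give "inc",
# opposite directions give "dec", anything involving "other" gives neither.
comp_is_inc(inner, outer) =
    (is_inc(inner) && is_inc(outer)) || (is_dec(inner) && is_dec(outer))
comp_is_dec(inner, outer) =
    (is_inc(inner) && is_dec(outer)) || (is_dec(inner) && is_inc(outer))

# Chain-wide rule: every piece must be monotone, and the parity of the
# number of decreasing pieces decides the direction.
all_monotone(fs) = all(f -> is_inc(f) | is_dec(f), fs)
chain_is_dec(fs) = all_monotone(fs) && mapreduce(is_dec, xor, fs)
chain_is_inc(fs) = all_monotone(fs) && !mapreduce(is_dec, xor, fs)

# Two decreasing scales compose to an increasing map.
println(comp_is_inc(ToyScale(-1.0), ToyScale(-1.0)))
println(chain_is_dec([ToyScale(-1.0), ToyExp(), ToyScale(2.0)]))
```

Applying the pairwise rule recursively over a nested composition gives the same answer as the parity check over the flattened chain, which is why the two descriptions agree.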
My table is conveying the same idea, just on a per-composition basis (since we're defining the method for ComposedFunction) :)
But I've now added support for monotonically decreasing functions too + tests:)
I was trying to replicate a Markov Switching GARCH model and ran into wanting ordered for a positively-constrained distribution. Hence, I sort of need this PR :grimacing: Any chance we could get it through?
@yebai @devmotion @sethaxen I believe this should be good to go
EDIT: Note the error in the tests has no relevance to this PR, so should probably just merge it. Though currently looking at whether this is reproducible.
I will merge this at the end of the day unless anyone else has any objections @devmotion @sethaxen :)
@torfjelde I haven't had a chance to check this directly myself, but does this do the right thing for products of heterogeneous univariate distributions? It would be nice if a few cases were numerically checked via MC sampling. E.g. you could get exact MC draws with rejection sampling and then compare to MCMC draws and check expectations are similar using the MCSE.
Yeah was thinking the same; will do that :+1:
Aaaalrighty! Finally got this thing working :)
Issue was that we added one too many transformations @sethaxen: it should just be inverse(OrderedBijector()) ∘ b, so that we get a transformation from constrained to real, not binv ∘ inverse(OrderedBijector()) ∘ b, which takes us from constrained to constrained.
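To make the "constrained to real" vs. "constrained to constrained" distinction concrete, here is a rough sketch with hand-rolled functions. It assumes a positive-support distribution whose bijector `b` is an elementwise `log`, and it assumes the ordered transform is the usual cumulative-sum-of-exponentials construction; `b`, `binv`, `ordered_forward`, `ordered_inverse`, `to_unconstrained` and `from_unconstrained` are all hypothetical names defined here, not Bijectors.jl API.

```julia
# Hypothetical stand-ins, not Bijectors.jl code.
b(x) = log.(x)        # positive -> real, elementwise and increasing
binv(y) = exp.(y)     # real -> positive

# Ordered transform: unconstrained vector -> strictly increasing vector.
ordered_forward(x) = cumsum(vcat(x[1], exp.(x[2:end])))
# Its inverse: strictly increasing vector -> unconstrained vector.
ordered_inverse(y) = vcat(y[1], log.(diff(y)))

# Plays the role of inverse(OrderedBijector()) ∘ b: constrained -> real.
to_unconstrained(x) = ordered_inverse(b(x))
# The way back: real -> ordered and positive.
from_unconstrained(z) = binv(ordered_forward(z))

x = [0.5, 1.0, 4.0]            # positive and ordered
z = to_unconstrained(x)        # lives in unconstrained real space
println(z)
println(from_unconstrained(z)) # recovers x up to floating point error

# Appending binv again, as in binv ∘ inverse(OrderedBijector()) ∘ b,
# would map z back into the positive reals instead of leaving it unconstrained.
println(binv(z))
```

Note that this only works because `log` is increasing: the ordering of `x` survives `b`, so `diff` stays positive and `ordered_inverse` is well defined. A decreasing `b` would flip the order.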
BUT one final thing: what should we put as a warning regarding usage of ordered @sethaxen? I'm still a bit uncertain about exactly what you meant; I thought I understood wrt. restriction and not accounting for normalization constant, but then the example with changing normalization constant (variance parameter changing in a MvNormal) seemed to work, so now I'm confused again :shrug:
Damn. Seems like we missed something in #313
Note that there doesn't seem to be anything incorrect with the impl, but it's failing because it's trying to compare elements which aren't part of the triangular part
Pfft, well that was painful. Added comments regarding what the issue is + fixed it by introducing a wrapper to avoid comparing Matrix values with potentially undef entries.
Would you have a quick look at some point @sethaxen ? :pray: Think we're there now after we've addressed the following:)
BUT one final thing: what should we put as a warning regarding usage of ordered @sethaxen? I'm still a bit uncertain about exactly what you meant; I thought I understood wrt. restriction and not accounting for normalization constant, but then the example with changing normalization constant (variance parameter changing in a MvNormal) seemed to work, so now I'm confused again 🤷
Cool will try to review this evening.
Sounds weird, I'll check the test.
I was talking about the example that we were discussing in one of the other comments; specifically https://github.com/TuringLang/Bijectors.jl/pull/297#discussion_r1597708555
Ah, that's expected though. I assume you un-fixed the variance parameter and randomly sampled it within the rejection sampling inner loop? The issue here is that when the mean is the same for both components, then the variance actually has no impact on whether they are ordered. I think you should see a difference if you make the mean a reverse-ordered vector. The further the two mean components, the more pronounced the difference and the harder it is to rejection sample.
the example I talk about is not related to rejection sampling; I'm referring to the example you ran with NUTS :)
I'm confused which example you're referring to then. The one in the comment you linked to compares NUTS with rejection sampling, but it does so with fixed variance, so it would not manifest the issue I'm talking about. Here's an example that does:
Note that the rejection sampling approach makes sense. The quantiles of the two variances should be about the same, since to get an ordered draw with a well-separated reverse-ordered mean, one needs to increase the variance, but it doesn't matter which variance is increased. But if we look at the HMC draws, we see that there's an asymmetry between the variances. This is due to the missing normalization factor. If we had a closed-form expression for it, we could test that, but I don't know one.
TBH I'm not certain if the above examples are even correct. The place I expect this to manifest is when conditioning. Which is implicitly what the rejection-sampling approach is doing (conditioning on x[1] > x[2]).
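A small self-contained sketch of the rejection-sampling half of that argument, for anyone who wants to poke at it. This is not the example from the comment above: the setup is hypothetical (reverse-ordered means `(1.0, -1.0)`, an independent log-normal prior on each scale, and "ordered" taken to mean increasing, i.e. `x1 < x2`; flip the means and the comparison for the other convention), and it does not include the HMC side of the comparison.

```julia
using Random, Statistics

# Hypothetical setup: draw both scales from the prior, draw x given the scales,
# and keep the scales only when the draw of x happens to come out ordered.
function rejection_sample_scales(rng, μ, nproposals)
    accepted = Vector{NTuple{2,Float64}}()
    for _ in 1:nproposals
        σ = (exp(randn(rng)), exp(randn(rng)))   # log-normal prior on each scale
        x1 = μ[1] + σ[1] * randn(rng)
        x2 = μ[2] + σ[2] * randn(rng)
        x1 < x2 && push!(accepted, σ)            # conditioning on the ordering
    end
    return accepted
end

rng = MersenneTwister(1)
draws = rejection_sample_scales(rng, (1.0, -1.0), 200_000)
σ1 = first.(draws)
σ2 = last.(draws)

# The two sets of quantiles should roughly agree, and both should sit above
# the prior quantiles, since larger scales make an ordered draw more likely.
println(quantile(σ1, [0.25, 0.5, 0.75]))
println(quantile(σ2, [0.25, 0.5, 0.75]))
```

The symmetry follows because the acceptance probability depends on the scales only through `σ[1]^2 + σ[2]^2`, so it does not matter which of the two is increased.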
Completely missed the fact that we were fixing the variance :facepalm:
Gotcha, gotcha; understand better now :+1:
Soooo how do we summarize all this into a simple warning for the end-user? :eyes:
This PR is just waiting for the following:
Soooo how do we summarize all this into a simple warning for the end-user?
Think it's worth waiting until @sethaxen is back to let him have a final say before we merge :+1:
Maybe an admonition saying something like:
The resulting ordered distribution is un-normalized. This is not a problem if used in a context where the normalizing factor is irrelevant, but if the value of the normalizing factor impacts the resulting computation, the results may be inaccurate. For example, if the distribution is used in sampling a posterior distribution with MCMC and the parameters of the ordered distribution are themselves sampled, then the normalizing factor would in general be needed for accurate sampling, and ordered should not be used. However, if the parameters are fixed, then since MCMC does not require distributions be normalized, ordered may be used without problems. A common case is where the distribution being ordered is a joint distribution of n identical univariate distributions. In this case the normalization factor works out to be the constant n!, and ordered can again be used without problems even if the parameters of the univariate distribution are sampled.
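The n! claim can be checked with a quick Monte Carlo sketch: for n i.i.d. continuous draws, each of the n! orderings is equally likely, so the joint density restricted to the ordered region integrates to 1/n!, and the normalizing factor is n! regardless of the distribution's parameters. The snippet below uses standard normals as an arbitrary choice, and `fraction_sorted` is a hypothetical helper written for this illustration.

```julia
using Random

# Fraction of i.i.d. draws of length n that happen to come out sorted.
# By exchangeability this should be close to 1/n!.
function fraction_sorted(rng, n, ndraws)
    return count(_ -> issorted(randn(rng, n)), 1:ndraws) / ndraws
end

rng = MersenneTwister(42)
for n in 2:4
    est = fraction_sorted(rng, n, 200_000)
    println("n = $n: estimated ", est, ", 1/n! = ", 1 / factorial(n))
end
```

With heterogeneous components the fraction depends on the parameters, which is the case the warning is about.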
Not in love with it; feels too wordy.
Added it, but broke it up a bit + added a shorter warning to the initial parts of the docstring :) Thanks @sethaxen!
Related: https://github.com/TuringLang/Bijectors.jl/issues/220 and https://github.com/TuringLang/Bijectors.jl/issues/295