TuringLang / Bijectors.jl

Implementation of normalising flows and constrained random variable transformations
https://turinglang.org/Bijectors.jl/
MIT License

[RFC] Add angle bijector #168

Closed: sethaxen closed this 1 year ago

sethaxen commented 3 years ago

cc @trappmartin @yebai

As discussed on Slack (and in #58), this PR adds an Angle "bijector". Given an angle θ ∈ [-π,π], the bijector maps θ to the corresponding point on the unit circle, embedded in the 2D plane. That map is bijective with a logdetjac of 0. Its left-inverse maps from the 2D plane to the circle via normalization (non-bijective) and then bijectively maps back to the angle. Normalization has a singular Jacobian, so its logdetjac is infinite. However, we do know what adjustment to the logpdf is needed to sample in the unconstrained plane instead: it's the same as drawing a bivariate normally distributed random variable and then normalizing it to get a uniformly distributed point on the unit circle. So the logpdf adjustment is that of the bivariate normal distribution. See https://mc-stan.org/docs/2_26/reference-manual/unit-vector-section.html for details.
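For concreteness, the maps described above can be sketched as follows (a minimal sketch with hypothetical function names, not the PR's actual implementation):

```julia
# θ ∈ [-π, π] ↦ point on the unit circle embedded in the 2D plane; logdetjac = 0.
angle_forward(θ::Real) = [cos(θ), sin(θ)]

# Left-inverse: normalization (non-bijective, singular Jacobian) followed by
# the bijective map back to the angle; atan(y, x) absorbs the normalization.
angle_inverse(x::AbstractVector{<:Real}) = atan(x[2], x[1])

# Instead of a (divergent) logdetjac, adjust the logpdf by the standard
# bivariate-normal log-density, so that sampling in the unconstrained plane
# and normalizing yields a uniformly distributed angle.
logpdf_adjustment(x::AbstractVector{<:Real}) = -sum(abs2, x) / 2 - log(2π)

θ = 0.7
x = angle_forward(θ)
angle_inverse(x)         # ≈ 0.7
angle_inverse(2.5 .* x)  # also ≈ 0.7: any nonzero scaling maps to the same angle
```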

Some issues I ran into:

  1. test_bijector appears to assume a real Jacobian is used and checks it with AD. Here a real Jacobian is not used.
  2. The map is from Real to Vector{<:Real} of length 2. Hence, while VonMises is a UnivariateDistribution, transform(VonMises(0.5)) is not. TransformedDistribution seems to force it to be, which causes methods like logpdf(transform(VonMises(0.5)), randn(2)) to fail.
  3. It's not clear to me what overloads are missing for this to work in Turing. In particular, since this is breaking a few rules, perhaps some of the default overloads are inappropriate?
sethaxen commented 3 years ago

Looks good to me! I’m not sure why some CIs are broken.

Oh, this actually does not currently work. As noted in the top comment, I'm not even certain how to test it, so no tests have been added. And I don't know what needs to be added for it to be used with Turing.

torfjelde commented 3 years ago

As you point out, this does require some changes to how we handle things in Bijectors.jl. The main pain point is that Bijectors.jl is very much built around the idea that the transformations are bijective in a "simple" sense, i.e. a Real cannot be transformed into a Vector{<:Real} (in fact, we use "dimensionality", i.e. Bijector{N} where N denotes the dimensionality of the array it acts on, with N = 0 corresponding to Real). This is another issue with the fact that AngleBijector <: Bijector{0}: it will have almost no composability with other bijectors, because composition requires both bijectors to act on spaces of the same dimensionality. This is not enforced just for the sake of it, btw! We have taken this approach because it completely disambiguates whether an input is a batch of inputs or just a single input, which is otherwise impossible.
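To illustrate the composition point, here is a toy sketch (made-up types, not Bijectors.jl's actual definitions):

```julia
# N is the dimensionality of the array the bijector acts on; N = 0 means Real.
abstract type ToyBijector{N} end

struct ScalarMap <: ToyBijector{0} end   # Real -> Real
struct VectorMap <: ToyBijector{1} end   # Vector -> Vector

# Composition is only defined when both bijectors share the same N, which is
# what leaves a Bijector{0} like AngleBijector with almost nothing to compose with.
compose(b1::ToyBijector{N}, b2::ToyBijector{N}) where {N} = (b1, b2)

compose(ScalarMap(), ScalarMap())    # fine: both act on scalars
# compose(ScalarMap(), VectorMap())  # MethodError: mismatched dimensionality
```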

Ideally we'd instead have something like Bijector{InSpace, OutSpace} but I'm worried it's going to be "too general" to be worth it wrt. added complexity both for users and devs.

IMO the way to go about this is to figure out exactly what we need to change to accommodate non-bijective transformations (preferably continuing the discussion in the issue you reference), then make those changes, and then this PR follows. It might be possible to add very hacky support for non-bijective transformations by just relaxing the requirements on TransformedDistribution, but this would then require specific implementations of constructors for these non-bijective distributions.

mohamed82008 commented 3 years ago

Ideally we'd instead have something like Bijector{InSpace, OutSpace} but I'm worried it's going to be "too general" to be worth it wrt. added complexity both for users and devs.

It might make sense to have a Transformation{InSpace, OutSpace} and then have Bijector{0} <: Transformation{0, 0}. Then we don't need to change most of the existing code, but we can add new branches to the type tree by subtyping Transformation instead of Bijector.
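A rough sketch of what that hierarchy could look like (hypothetical names, only to illustrate the subtyping):

```julia
# Transformations carry separate in- and out-space parameters; bijectors are
# the "square" special case, so existing Bijector{N}-style code keeps working.
abstract type AbstractTransformation{InSpace, OutSpace} end
abstract type AbstractBijector{N} <: AbstractTransformation{N, N} end

# A new branch of the type tree: the angle map takes a Real (0) to a 2-vector (1),
# so it subtypes the transformation type directly rather than the bijector type.
struct ToyAngleMap <: AbstractTransformation{0, 1} end

AbstractBijector{0} <: AbstractTransformation{0, 0}  # true
ToyAngleMap <: AbstractBijector                      # false
```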

mohamed82008 commented 3 years ago

It might make sense to have a Transformation{InSpace, OutSpace} and then have Bijector{0} <: Transformation{0, 0}.

We can also expand the definition of space/domain similar to TransformVariables.jl domains. So we can have a transformation that takes a vector form of the lower triangular part of a symmetric matrix, and returns a proper symmetric matrix.
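For instance, a lower-triangular-vector-to-symmetric-matrix transformation might look like this (a hypothetical helper, just to make the idea concrete):

```julia
using LinearAlgebra

# Map a length n(n+1)/2 vector holding the lower-triangular part of an n×n
# symmetric matrix (column-major order) to the full Symmetric matrix.
function vec_to_symmetric(v::AbstractVector, n::Integer)
    length(v) == n * (n + 1) ÷ 2 || throw(DimensionMismatch("wrong length"))
    M = zeros(eltype(v), n, n)
    k = 1
    for j in 1:n, i in j:n  # walk down each column of the lower triangle
        M[i, j] = v[k]
        k += 1
    end
    return Symmetric(M, :L)
end

vec_to_symmetric([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)
# 3×3 Symmetric matrix: [1 2 3; 2 4 5; 3 5 6]
```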

trappmartin commented 3 years ago

I think that is a great idea.

devmotion commented 3 years ago

I am wondering if the type parameter N can be dropped and the batch computation issue solved in a different way, e.g. by wrapping inputs similar to ColVecs and RowVecs in KernelFunctions. I still think that the restriction that the output is of the same type as the input is strange and limiting (e.g. it is problematic for the improvements of the LKJ bijector mentioned in https://github.com/TuringLang/Bijectors.jl/issues/134#issuecomment-679096131).

A bijective function requires that the mathematical dimensions of the in- and outputs are the same, but this dimension is not the same as N in AbstractArray{T,N}, and in particular the in- and outputs may not have the same structure or size. Reshaping is a valid bijective transformation with in- and outputs of different Julia dimension, and e.g. Symmetric or LowerTriangular only parameterize a subset of all matrices of the specified size. Hence I don't think it is good to enforce that the in- and outputs are of the same type or the same size. Additionally, encoding the mathematical dimension does not seem helpful for the batch computations.
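The reshaping example is easy to make concrete (sketch):

```julia
# Reshaping a length-6 vector into a 2×3 matrix is bijective: the mathematical
# dimension (6 degrees of freedom) is unchanged even though ndims goes 1 -> 2,
# and the log-abs-det-Jacobian of the map is 0.
reshape_forward(x::AbstractVector) = reshape(x, 2, 3)
reshape_inverse(X::AbstractMatrix) = vec(X)

x = collect(1.0:6.0)
X = reshape_forward(x)   # ndims(X) == 2, while ndims(x) == 1
reshape_inverse(X) == x  # true: the round-trip is exact
```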

it will have almost no composability with other bijectors because compositions require the spaces to be working on the same dimensionality.

Similar to regular Julia functions, one could just drop this requirement and stop trying to reason about outputs/inputs when composing bijectors and throw an error at runtime if they are not compatible.
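In code, that is just ordinary function composition (sketch): nothing is checked up front, and a mismatch only surfaces when the composite is called.

```julia
# Compose transformations like plain functions; incompatible in-/outputs are
# detected only at call time, as with ∘ in Base.
naive_compose(fs...) = x -> foldl((y, f) -> f(y), fs; init = x)

to_plane = naive_compose(θ -> [cos(θ), sin(θ)], sum)  # Real -> 2-vector -> Real
to_plane(0.0)  # 1.0, i.e. cos(0) + sin(0)

bad = naive_compose(θ -> [cos(θ), sin(θ)], θ -> [cos(θ), sin(θ)])
# bad(0.0) throws a MethodError only when called: cos of a Vector is undefined.
```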

If one wants to reason about in- and outputs, then I think an encoding of the mathematical spaces of inputs and outputs (e.g. with TransformVariables) would be more useful. But I am not sure if the mathematical description would be sufficient for the batch computations; e.g., one would probably have to encode the support of a univariate normal distribution and a one-dimensional multivariate normal distribution in different ways. And the compatibility of the mathematical spaces does not per se guarantee that the in- and output types of the bijectors are compatible when composing them.

torfjelde commented 3 years ago

Let's have the discussion on the referenced issue (https://github.com/TuringLang/Bijectors.jl/issues/58), since it's a more general issue than just this PR.

yebai commented 1 year ago

Please reopen this if needed.