TuringLang / Bijectors.jl

Implementation of normalising flows and constrained random variable transformations
https://turinglang.org/Bijectors.jl/
MIT License

TransformVariables-like example? #197

Open cscherrer opened 3 years ago

cscherrer commented 3 years ago

I'm looking into transitioning MeasureTheory to use Bijectors instead of TransformVariables, but there are some differences between the packages that are really throwing me off. First, some questions I had asked on Slack:

  1. Most other implementations of this sort of thing default to going from unconstrained to constrained space, which is how HMC usually works. Why does Bijectors go the other way?
  2. The N type parameter says it's "dimensionality", but then I can't make sense of things like

```julia
(b::Exp{0})(y::AbstractArray{<:Real}) = exp.(y)
(b::Log{0})(x::AbstractArray{<:Real}) = log.(x)

(b::Exp{1})(y::AbstractVector{<:Real}) = exp.(y)
(b::Exp{1})(y::AbstractMatrix{<:Real}) = exp.(y)
(b::Log{1})(x::AbstractVector{<:Real}) = log.(x)
(b::Log{1})(x::AbstractMatrix{<:Real}) = log.(x)

(b::Exp{2})(y::AbstractMatrix{<:Real}) = exp.(y)
(b::Log{2})(x::AbstractMatrix{<:Real}) = log.(x)
```

Can someone give some more detail on this?

  3. For HMC, each "raw sample" is a vector. So we need to traverse this vector, stretching the space to fit the model constraints and also reshaping as we go. But the code above doesn't do any reshaping. Maybe this is implemented somewhere else?

Maybe what would help the most is a more detailed example. Here's something I can currently do pretty easily in MeasureTheory, using TransformVariables. What would this look like with Bijectors?
```julia
julia> using MeasureTheory

julia> using FillArrays

julia> import TransformVariables

julia> const TV = TransformVariables
TransformVariables

julia> d1 = Dirichlet(Fill(0.5,3))
Dirichlet(α = Fill(0.5, 3),)

julia> d2 = LKJCholesky(4)
LKJCholesky(k = 4, η = 1.0)

julia> t = TV.as((a = as(d1), b = as(d2)))
TransformVariables.TransformTuple{NamedTuple{(:a, :b), Tuple{TransformVariables.UnitSimplex, CorrCholesky}}}((a = TransformVariables.UnitSimplex(3), b = CorrCholesky(4)), 8)

julia> TV.dimension(t)
8

julia> x = randn(TV.dimension(t))
8-element Vector{Float64}:
  1.5407190522204608
 -0.6056420769935066
 -0.7596924734411458
 -1.1307368364302133
 -1.014749550491414
  0.17945582914255376
 -0.1076769860481117
 -1.6382848658474718

julia> s = TV.transform(t, x);

julia> s.a
3-element Vector{Float64}:
 0.7000575392137569
 0.10589586720053043
 0.1940465935857127

julia> s.b
LinearAlgebra.Cholesky{Float64, LinearAlgebra.UpperTriangular{Float64, Matrix{Float64}}}
U factor:
4×4 LinearAlgebra.UpperTriangular{Float64, Matrix{Float64}}:
 1.0  -0.362574  -0.51195    0.0894879
  ⋅    0.931955  -0.401931  -0.0535707
  ⋅     ⋅         0.759183  -0.670924
  ⋅     ⋅          ⋅         0.734155
```

torfjelde commented 3 years ago

```julia
# t = TV.as((a = as(d1), b = as(d2)))
t = NamedBijector((a = bijector(d1), b = bijector(d2)))

# x = randn(TV.dimension(t))
x = (a = rand(transformed(d1)), b = rand(transformed(d2)))
```

We're not assuming Vector inputs or fixed sizes, so the dimension-based part doesn't make sense in Bijectors.jl. The bijectors/transformations are usually not tied to a particular distribution instance (there are some that require the sizes to make sense, e.g. Stacked, which has to define the ranges on which each bijector acts). The plan is to add Reshape and Vec bijectors in #183 to allow going from Matrix to Vector, etc., if that is desired.
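
To act on a single flat vector like TV does, Stacked is the closest thing we have right now. For the Dirichlet part, something like this should work (sketch, untested, assuming the current API):

```julia
using Bijectors, Distributions, FillArrays

d1 = Dirichlet(Fill(0.5, 3))
b1 = bijector(d1)             # constrained -> unconstrained

# apply b1 to entries 1:3 of a flat vector
sb = Stacked((b1,), (1:3,))

x  = rand(d1)                 # point on the simplex
y  = sb(x)                    # unconstrained flat vector
x2 = inv(sb)(y)               # back to the simplex
```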

The rest is the same. But there are redundancies in Bijectors.jl: e.g. Dirichlet has one element that is not used because we're compatible with the one from Distributions.jl, and the CorrBijector (the bijector for LKJ) considers a triangular matrix, which then has additional degrees of freedom that are redundant. The ones acting on Vector would have to be implemented (though it shouldn't be too hard to adapt the existing ones to produce an output in the projected space instead, and then the non-vector implementations could just reshape). But this requires us to drop the dimension-in-bijector approach, which is part of #183.
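
Concretely, for the Dirichlet case (sketch, assuming the current SimplexBijector behavior; untested):

```julia
using Bijectors, Distributions, FillArrays

d = Dirichlet(Fill(0.5, 3))
b = bijector(d)   # SimplexBijector

# A point on the 3-simplex has only 2 degrees of freedom, but the
# unconstrained output still has length 3; the last element is redundant.
y = b(rand(d))
length(y)         # 3
```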

EDIT: I'm actually a bit uncertain how the CorrBijector works tbh (I didn't implement it), but there definitely seems to be some redundancy, though I'm not entirely certain what/where.

cscherrer commented 3 years ago

Thanks @torfjelde. For performance, I think it's important for the transformation to work in terms of a local variable, like an iteration index. For example, here's TransformVariables on a simplex (https://github.com/tpapp/TransformVariables.jl/blob/master/src/special_arrays.jl#L100):

```julia
function transform_with(flag::LogJacFlag, t::UnitSimplex, x::AbstractVector, index)
    @unpack n = t
    T = extended_eltype(x)

    ℓ = logjac_zero(flag, T)
    stick = one(T)
    y = Vector{T}(undef, n)
    @inbounds for i in 1:n-1
        xi = x[index]
        index += 1
        z = logistic(xi - log(n-i))
        y[i] = z * stick

        if !(flag isa NoLogJac)
            ℓ += log(stick) - logit_logjac(z)
        end

        stick *= 1 - z
    end

    y[end] = stick

    y, ℓ, index
end
```

In this way, `index` is maintained locally, so it's very easy to sequence transformations.
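
E.g. sequencing two transforms over one flat vector is just (sketch, reaching into TV internals):

```julia
using TransformVariables: transform_with, LogJac

# Sequence two transforms over one flat vector: each call consumes its
# chunk of x and hands back the updated index.
function transform_pair(t1, t2, x::AbstractVector)
    index = firstindex(x)
    y1, ℓ1, index = transform_with(LogJac(), t1, x, index)
    y2, ℓ2, index = transform_with(LogJac(), t2, x, index)
    (y1, y2), ℓ1 + ℓ2
end
```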

If you have a composition of bijectors ending with a Reshape, will you be able to do something like this, and also avoid multiple writes into memory?

> But there are redundancies in Bijectors.jl, e.g. Dirichlet has one element that is not used because we're compatible with the one from Distributions.jl

I think this is an orthogonal concern; TransformVariables works great with Distributions. For example,

```julia
julia> t = as(Dirichlet(Fill(0.3,3)))
TransformVariables.UnitSimplex(3)

julia> x = TV.transform(t, randn(2))
3-element Vector{Float64}:
 0.18136540387191707
 0.08052913397901988
 0.7381054621490631

julia> logdensity(Dists.Dirichlet(Fill(0.3,3)), x)
-0.04998534448868064
```

In general, a lot of this is about manifold embeddings. If I'm embedding k dimensions into n, I want it expressed in this way, and not as if I had n to begin with.

EDIT: It's partly a "want", but maybe more a need. It seems very confusing and error-prone to me otherwise.

torfjelde commented 3 years ago

> If you have a composition of bijectors ending with a Reshape, will you be able to do something like this, and also avoid multiple writes into memory?

Yeah, because we could just make vector versions of the bijectors that really work in a lower-dimensional space, and then make the current implementations those composed with a reshape or whatever.
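
Roughly this idea (sketch; VecExp and ReshapeBijector are made-up names):

```julia
# The "real" transform works on flat vectors...
struct VecExp end
(b::VecExp)(y::AbstractVector{<:Real}) = exp.(y)

# ...and a reshape lifts the result back to the desired shape.
struct ReshapeBijector{N}
    dims::NTuple{N,Int}
end
(r::ReshapeBijector)(v::AbstractVector) = reshape(v, r.dims)

# The current matrix-valued version is then just the composition:
matexp(X::AbstractMatrix{<:Real}) = ReshapeBijector(size(X))(VecExp()(vec(X)))
```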

> I think this is an orthogonal concern; TransformVariables works great with Distributions.

Sorry, "compatibility" was the wrong word: I meant "consistency". I didn't implement these btw :sweat_smile: All I know is that we did it because Distributions.jl includes the last element.

> In general, a lot of this is about manifold embeddings. If I'm embedding k dimensions into n, I want it expressed in this way, and not as if I had n to begin with.

Again, 100% agree, and it's unfortunate that it's not the way we're doing it atm. Hence #183 :)

cscherrer commented 3 years ago

Great! And I think we had already talked about having bijectors write into preallocated memory, so the transformation can be allocation-free.
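
Something like this (sketch; transform! is hypothetical, not an existing Bijectors.jl function):

```julia
# Write the transformed values into preallocated memory, elementwise.
function transform!(y::AbstractVector, b, x::AbstractVector)
    @inbounds for i in eachindex(x, y)
        y[i] = b(x[i])   # e.g. b = exp for an elementwise bijector
    end
    return y
end
```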

Oh, and one more thing... I think all of the dimensions will usually be known statically, in which case it makes sense to have some generated functions unrolling the loops, using LoopVectorization, etc. I can add those if you're not planning it already; I'm doing things like that in MultivariateMeasures anyway, and I feel like I'm starting to get the hang of it :)
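
For instance (sketch; exp_unrolled is a hypothetical name):

```julia
# When the length N is a compile-time constant, the loop can be fully
# unrolled in the generated body.
@generated function exp_unrolled(x::NTuple{N,Float64}) where {N}
    terms = [:(exp(x[$i])) for i in 1:N]
    return :(tuple($(terms...)))
end

# exp_unrolled((0.0, 1.0)) == (exp(0.0), exp(1.0))
```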

So... I guess I should wait for #183?

torfjelde commented 3 years ago

Yeah, waiting for #183 is probably a good idea :) Hopefully soon™!

cscherrer commented 3 years ago

Sounds good :)