TuringLang / Bijectors.jl

Implementation of normalising flows and constrained random variable transformations
https://turinglang.org/Bijectors.jl/
MIT License
200 stars 33 forks source link

Support InverseFunctions.jl and ChangesOfVariables.jl #199

Closed oschulz closed 2 years ago

oschulz commented 3 years ago

To do

Context

Bijectors is really neat, but it's also kind of a heavy package with many dependencies (necessarily).

I have some variable transformation stuff in BAT.jl that I want to split out into a separate package, and I've recently been experimenting with some normalizing flow trafos - I'd like to make that kind of stuff compatible with Bijectors, but it would be to heavy as a dependency.

Would you guys (CC @devmotion) be interested in setting up a lightweight variable-trafo-interface package? Maybe just something like

function with_logabsdet_jacobian end

function with_logabsdet_jacobian(f::Base.ComposedFunction)(x)
    y_inner, ladj_inner = with_logabsdet_jacobian(f.inner, x)
    y, ladj_outer = with_logabsdet_jacobian(f.outer, y_inner)
    (y, ladj_inner + ladj_outer)
end

function inverse end

inverse(trafo::Base.ComposedFunction) = Base.ComposedFunction(inverse(f.inner), inverse(f.outer))

inverse(f) = inv(f)  # Until JuliaLang/julia#42421 is decided one way of the other

with_logabsdet_jacobian would be the equivalent of Bijectors.forward, while f(x) would be used to just run the transform without calculating LADJ. So the behavior would be

mytrafo(x) == y

with_logabsdet_jacobian(mytrafo, x) == (y, ladj)

inverse(trafo2∘ trafo1)(x) == inverse(trafo1)(inverse(trafo2)(x))

No autodiff or anything, that we'd leave to implementations like Bijectors.Bijector.

Long-term, it would be nice to get rid of inverse in favor of inv, if we can get inv(::ComposedFunction) into Base (just opened an issue in that regard: JuliaLang/julia#42421).

I'd volunteer to prototype a package (would be quite tiny, basically just the code above) if there's interest.

Edit: Replaced special argument type WithLADJ in proposal with function with_logabsdet_jacobian, as suggested by @devmotion

oschulz commented 3 years ago

I think it would be neat if we had a such a common interface that would allow combining transformations from different packages in workflows that need LADJ and inverse calculation. Could be called "TransformationTraits.jl" or so. It would like ChainRulesCore in spirit, just for LADJ instead of the gradients, even more lightweight, and with the option to define an inverse.

@willtebbutt maybe such a package/interface would be interesting for ParameterHandling.jl as well?

oschulz commented 3 years ago

CC @oxinabox

willtebbutt commented 3 years ago

I'd be very pro- finding a way to make the indirect deps that Bijectors introduces in ParameterHandling much lighter. A number of people have commented on Bijectors being quite a heavy dependency.

I'm not totally sure that a BijectorsCore package would necessarily help ParameterHandling though, because ParameterHandling would continue to need to depend on Bijectors in order to get the actual bijections -- in this sense, ParameterHandling is analogous to an AD in the ChainRules world, rather than a package which defines a couple of rules.

oschulz commented 3 years ago

[...] introduces in ParameterHandling much lighter. A number of people have commented on Bijectors being quite a heavy dependency.

Yes, I recently looked at using ParameterHandling in a package, but the transitive deps would be a bit heavy (otherwise I like the concept of ParameterHandling very much!).

BijectorsCore package would necessarily help ParameterHandling though [...]

Ah right. Still, I hope the idea of a lightweight interface-package for transformations can find some support - after all, we've been very successfull with this approach in the ecosystem. Bikeshedding-wise I wouldn't call it "BijectorsCore", it could also cover non-bijective/invertible mappings (or transformations that are in bijective in principle, but for which the inverse is hard to calculate but also not required).

willtebbutt commented 3 years ago

Ah right. Still, I hope the idea of a lightweight interface-package for transformations can find some support

Yeah -- I'm totally in favour of this. It's the kind of thing that I might even use to define the defaults in ParameterHandling so as to avoid the bijectors dep.

devmotion commented 3 years ago

Most dependencies are quite lightweight actually, the main exemptions are NonlinearSolve and Distributions. I made some improvements and type inference fixes in Roots recently which (this was the plan) would allow us to switch back from NonlinearSolve to the much more lightweight Roots package. Another issue is Requires which eg increases loading times if you actually use the optional dependencies.

I guess a core interface package could be useful even though I am not sure if it would have to depend on Distributions as well (if yes, it would maybe be another motivation for a lightweight DistributionsBase package).

oschulz commented 3 years ago

I am not sure if it would have to depend on Distributions as well

What I had in mind would literally just be the outline above, so really super-lightweight and no dependencies beyond Base.

@devmotion, would you support such a super-lightweight "TransformationTraits" (or similar name) package in Bijectors? We could host it at a central place like JuliaMath or JuliaStats. It would just mean adding something like

TransformationTraits.with_logabsdet_jacobian(b::Bijector, x) = forward(b, x)
TransformationTraits.inverse(b::Bijector) = inv(b)

or maybe even making forward just an alias for with_logabsdet_jacobian (for backward compatibility).

If "TransformationTraits" is lightweight enough, we should be able to convince packages like DiffEqFlux to support it as well, e.g. for the ffjord normalizing flows and similar.

devmotion commented 3 years ago

I guess you should ask @torfjelde, he's the Bijectors god :smile:

I wonder if one should not define a struct WithLADJ (seems a bit strange at first glance) but rather make a function with_logabsdet_jacobian(f, x) part of the interface that is supposed to return a tuple of f(x) and the log absolute determinant of the Jacobian. Generally, I am also a fan of more descriptive names as you can see :smile:

oschulz commented 3 years ago

@torfjelde, sorry for not pulling you in earlier! Would this be Ok with you?

oschulz commented 3 years ago

I wonder if one should not define a struct WithLADJ [...] but rather make a function with_logabsdet_jacobian(f, x)

Yes, I was torn between these options myself. I liked having the WithLADJ() argument because then a function call will stay a function call. But then, we do use op_on_func(f, args...) for all autodiff stuff, and so on.

I think you're right, better use a function than a special argument. I've updated the code above accordingly.

oschulz commented 3 years ago

I think the whole package would literally just be

function with_logabsdet_jacobian end

function with_logabsdet_jacobian(f::Base.ComposedFunction)(x)
    y_inner, ladj_inner = with_logabsdet_jacobian(f.inner, x)
    y, ladj_outer = with_logabsdet_jacobian(f.outer, y_inner)
    (y, ladj_inner + ladj_outer)
end

function inverse end

inverse(trafo::Base.ComposedFunction) = Base.ComposedFunction(inverse(f.inner), inverse(f.outer))

inverse(f) = inv(f)  # Until JuliaLang/julia#42421 is decided one way or the other

Maybe we should add support for with_logabsdet_jacobian(Base.Fix1(broadcast, f), Xs...) to that, it would help keep code that uses broadcasted transformations short and readable.

With the with_logabsdet_jacobian suggested by @devmotion we can be even more generic and support transformations (a, b) = f(x, y, z) as ((a, b), ladj) = with_logabsdet_jacobian(f, x, y, z).

One interesting option might be to also define inverse for elementary unary functions commonly used in variable transformations like exp, log, etc. It would be a very finite list, so it wouldn't make the package too heavy. Moving this into Base with inv could become part of the proposal in JuliaLang/julia#42421 , if supported.

In any case, the change/additions to Bijectors.jl would just be

const forward = TransformationTraits.with_logabsdet_jacobian
TransformationTraits.inverse(b::Bijector) = inv(b)   # Until JuliaLang/julia#42421 is decided one way or the other

or similar.

torfjelde commented 3 years ago

First off, sorry for the delay in response here! Been busy with a first year evaluation for my PhD, so have been very much off the grid for the past week.

I'm very much in favour of making a lightweight interface package:) We had a discussion about whether Bijectors.jl was too heavy or not not long ago, and how it's unfortunate that we have direct dependencies on packages such as Distributions.jl (IMO the transformations should be separate from the interop with Distributions.jl). So personally very happy to support something like this.

oschulz commented 3 years ago

Thanks @torfjelde (hope your Phd eval went smoothly!). Ok, then how about I prototype a package, based on the outline above?

I would include implementations of with_logabsdet_jacobian and inverse for log, log10, log2, log1p, exp, exp2, exp10, expm1, inv, adjoint, transpose, as log and exp are often used to transform between bounded and unbounded variables and inv, adjoint and transpose are the fundamental self-inverse operations (so their implementations will be trivial). Shouldn't increase the measurable load time of the package and can't be done outside of it without committing type-piracy.

Bikeshedding: @torfjelde , @devmotion and @willtebbutt : Do you have any preference regarding package name? I was thinking "TransformationTraits.jl", "TransformationIntercase.jl", "VarTrafoTraits.jl", "VarTrafoInterface.jl" or so.

devmotion commented 3 years ago

Hmm maybe it would be good to use a name that makes it clear which transformations the package deals with (transformation of measures/probability distributions/densities)? I think TransformationTraits etc. is a bit too general. And I think I would prefer something like XXXInterface, XXXCore or XXXBase instead of XXXTraits.

willtebbutt commented 3 years ago

No strong preference on the name :)

oschulz commented 3 years ago

Hmm maybe it would be good to use a name that makes it clear which transformations the package deals with (transformation of measures/probability distributions/densities)

I don't think it has to be limited to transformations of measures/probability distributions/densities - IMHO the API above would cover variable transformations in general, I would also cover integration problems and other cases where one needs the local delta-volume of a trafo, but where the target function may not be limited to positive values or may not even be scalar.

And I think I would prefer something like XXXInterface, XXXCore or XXXBase instead of XXXTraits

How about "VarTrafoInterface.jl" then? Or "VariableTransformationInterface.jl" - but that's awefully long.

devmotion commented 3 years ago

"Variable" is also not very specific I guess :stuck_out_tongue: Maybe just LogAbsDetJacobian.jl if its main purpose is the definition of with_logabsdet_jacobian?

oschulz commented 3 years ago

Hm, "LogAbsDetJacobian.jl" or "LogAbsDetJacobians.jl"? It's a bit unwieldy, though.

However, if we focus completely on with_logabsdet_jacobian in that package, then I think we should create a second package InvertibleFunctions.jl that concentrates on inverse. What do you think?

torfjelde commented 3 years ago

It will also provides inversion for a bunch functions though, so feel like LogAbsDetJacobian is too restrictive. I'm also in agreement that this extends beyond just interactions with distributions/measures.

I'm more so leaning towards TransformTraits.jl, but maybe that is also a bit too broad. There's always InvertibleTransformTraits.jl, but it's a bit of a mouthful + we'd like to have logabsdetjac work for some "not-quite-invertible" functions/only locally invertible, etc. :confused:

In conclusion: TransformTraits or TransformInterface are the ones I like the most atm :+1:

torfjelde commented 3 years ago

However, if we focus completely on with_logabsdet_jacobian in that package, then I think we should create a second package InvertibleFunctions.jl that concentrates on inverse. What do you think?

Had the same thought, but seems a bit redundant given that the only things we care about right now is invertibility and logabsdetjac. IMO start with it under the same package, and then we can separate into different packages later if the scope expands?

oschulz commented 3 years ago

A few more suggestions: ChangesOfVariables, VolumeElements or TransformationVolumeElements

oschulz commented 3 years ago

create a second package InvertibleFunctions.jl that concentrates on inverse

Had the same thought, but seems a bit redundant given that the only things we care about right now is invertibility and logabsdetjac

Yes, I also think it might be better to do both in one package, it will be very lightweight already, and there may be some interdepencencies/links between with_logabsdet_jacobian and inverse implementations.

oschulz commented 3 years ago

In conclusion: TransformTraits or TransformInterface are the ones I like the most atm

I'm very happy with both of those - should it be "transform" or "transformation"? I did a bit of a literature search on this terminology and ended up confused. :-)

devmotion commented 3 years ago

Ah yes, I forgot the definitions of the inverses. I still think that Transform or Transformation are very general names and not self explanatory. I like ChangeOfVariables.jl since the name already indicates the intention and area of the interface.

oschulz commented 3 years ago

ChangeOfVariables.jl

I'd be happy with that - @torfjelde, fine with you?

Oh - should we name it ChangesOfVariables, using the Julia "plural" convention, just in case we ever want a ChangeOfVariables type?

oschulz commented 3 years ago

Any objections to ChangesOfVariables? If not, I'll set up the package. :-)

torfjelde commented 3 years ago

Oh sorry, I forgot to respond to this!

Personally, I'm not a huge fan of ChangeOfVariables :shrug: It's too specific IMO since the package will provide an interface that goes beyond this particular use-case (in particular inversion has way more use-cases). I can also imagine us wanting to add more functionality than just with_logabsdet_jacobian in the future, quickly outgrowing the package name.

But I really just want a package with this interface:) So if the consensus is that ChangeOfVariables is superior, then I'm of course fine with that :+1:

oschulz commented 3 years ago

in particular inversion has way more use-cases

Me too, I need this soon. :-)

Hm, not every change of variables is necessarily strictly bijective/invertible (e.g. when changing from a periodic infinite space to the "unit cell", like in solid-state physics), and not every invertible function has a log-abs-det-jacobian (e.g. invertible discrete functions).

Proposal: I create a package ChangesOfVariables now, and we see how far it'll carry us. Certainly a change of variable is a very central concept, and having a package that directly reflects the concept in it's name should always be useful. If we do see that use cases of inversion outgrow the package name at some point, we can still create a package InvertibleFunctions (or similar) later on and move the inversion stuff there.

devmotion commented 3 years ago

I've seen other use cases of invertible functions/inv definitions for functions in completely different contexts without log absdet (eg https://github.com/JuliaGaussianProcesses/GPLikelihoods.jl/issues/43#issuecomment-920974828), so I think it could be useful to keep both things separate.

oschulz commented 3 years ago

Ok, then I'll make two packages: InverseFunctions and ChangesOfVariables?

Since it would be more narrow in focus then, we could name the second one VolumeElementRules instead of ChangesOfVariables, and rename with_logabsdet_jacobian to ladj_rule. This would be closer to ChainRules.frule, naming-wise. Or we simply name it VolumeElements (keeping with_logabsdet_jacobian as the function name).

torfjelde commented 3 years ago

Ok, then I'll make two packages: InverseFunctions and ChangesOfVariables?

Don't you think this has the potential of resulting in a lot of packages? And since this is just an interface, there's nothing stopping, say, GPLikelihoods.jl depending on the package even though it also includes with_logabsdet_jacobian.

rename with_logabsdet_jacobian to ladj_rule. This would be closer to ChainRules.frule, naming-wise

Though I like the sentiment, I think taking the direction of "rules" is a bit redundant for with_logabsdet_jacobian simply because there's really just one way to do it for compositions: accumulation. I.e. I'm in favour of the with_logabsdet_jacobian naming, personally.

oschulz commented 3 years ago

Don't you think this has the potential of resulting in a lot of packages

Possibly, yes. But you and @devmotion pointed out that use cases for inverse functions may go beyond a "change-of-variables" package. :-)

I'd be happy to create a ChangesOfVariables.jl now and see where it leads us, adding/splitting packages later if a need should arise.

I think taking the direction of "rules" is a bit redundant for with_logabsdet_jacobian simply because there's really just one way to do it for compositions: accumulation

Fair point.

torfjelde commented 3 years ago

I'd be happy to create a ChangesOfVariables.jl now and see where it leads us, adding/splitting packages later if a need should arise.

Same:) Probably best to just get something going at this point :+1:

oschulz commented 3 years ago

@devmotion Ok with you?

devmotion commented 3 years ago

I'm quite certain that we will want to split the package at some point but I am fine with starting with a single package.

(A bit more elaborate explanation: IMO two packages will improve discoverability since not everyone interested in invertible functions wants to work with or even thinks about logabsdet etc.. Moreover, the invertible functions package would be more lightweight - even if it's just a few additional functions in a joint package it means more work for the compiler and longer compilation and loading times.)

oschulz commented 3 years ago

@torfjelde , would you be fine with starting with ChangesOfVariables and InverseFunctions right away? @devmotion does make a strong argument here, I think, and in the end it will be less work if we see foresee a need for separate packages already.

torfjelde commented 3 years ago

would you be fine with starting with ChangesOfVariables and InverseFunctions right away?

Sure :+1:

even if it's just a few additional functions in a joint package it means more work for the compiler and longer compilation and loading times

Won't this be neglible?

oschulz commented 3 years ago

Ok, I'll upload a draft as soon as possible.

oschulz commented 3 years ago

Here we go:

@devmotion , @torfjelde , are you Ok with this design?

Should we register them first, or move them into JuliaMath or JuliaStats first (I guess JuliaMath would be a better fit)?

devmotion commented 3 years ago

Great! I think it does not matter. In fact, currently I'm a bit annoyed that eg IrrationalConstants lives in JuliaMath but nobody seems to feel responsible for it and for reviewing and merging PRs. So someone from the org has to be willing to maintain them.

oschulz commented 3 years ago

Maybe JuliaStats then, since none of us is a JuliaMath member, currently (actually, could you invite me into JuliaStats)?

oschulz commented 3 years ago

Should we register them first, or move them [...] I think it does not matter.

In that case, I'll push register on the two packages, so we get the three-day wait behind us. Moving is quick, after all.

Afterwards, I volunteer to prep a PR on LogExpFunctions.jl, adding inverse and with_logabsdet_jacobian support to functions like logistic should definitely be useful, right?

devmotion commented 3 years ago

I'd like to see inverse (https://github.com/JuliaGaussianProcesses/GPLikelihoods.jl/pull/48 :stuck_out_tongue: ) and I assume it will be uncontroversial (I'll approve it at least :slightly_smiling_face:). I'd be in favour of with_logabsdet_jacobian as well but I wonder if some other maintainers have a different opinion there.

oschulz commented 3 years ago

In that case, I'll push register on the two packages

Running:

https://github.com/JuliaRegistries/General/pull/46390 https://github.com/JuliaRegistries/General/pull/46391

oschulz commented 2 years ago

Ok, now that we have https://github.com/JuliaMath/InverseFunctions.jl and https://github.com/JuliaMath/ChangesOfVariables.jl, we just need to use them in Bijectors. :-)

torfjelde commented 2 years ago

Lovely! Thank you for making these.

I've been AWOL for a couple of weeks now (sorry about that), but back now and fixing up Bijectors.jl is high on my priority list so will get to this very soon:)

oschulz commented 2 years ago

@torfjelde , I could prepare a small non-breaking PR to add add initial support for the InverseFunctions.jl and ChangesOfVariables.jl API to Bijector as it is now (similar to what I did for TransformVariables in tpapp/TransformVariables.jl#85), until you more substantial changes.

torfjelde commented 2 years ago

@torfjelde , I could prepare a small non-breaking PR to add add initial support for the InverseFunctions.jl and ChangesOfVariables.jl API to Bijector as it is now (similar to what I did for TransformVariables in tpapp/TransformVariables.jl#85), until you more substantial changes.

That would be awesome and greatly appreciated!:) Feel free to shoot if you have any questions.

I'm still somewhat unhappy with how we're going to deal with batches of inputs, i.e. how do I efficiently compute with_logabsdet_jacobian(f, batch) where f is some composition without creating a bunch of intermediate Vector{Tuple{T, Float64}} that I then need to convert into Vector{T} and Vector{Float64} before applying the next function in the composition.

oschulz commented 2 years ago

That would be awesome and greatly appreciated!:) Feel free to shoot if you have any questions.

Will do!

I'm still somewhat unhappy with how we're going to deal with batches of inputs [...]

Yes, I've been fighting with that myself lately - I've been playing with StructArrays but haven't been fully satisfied with the results, and then it has to be AD-compatible. I found a way to make ArraysOfArrays.jl fully Zygote-compatible now, will to this in the next days. But efficient handling of composed and then broadcasted univariate transformations (esp. combined with parameter optimization for normalizing flows) seems a tricky business.

torfjelde commented 2 years ago

Yes, I've been fighting with that myself lately - I've been playing with StructArrays but haven't been fully satisfied with the results, and then it has to be AD-compatible. I found a way to make ArraysOfArrays.jl fully Zygote-compatible now, will to this in the next days. But efficient handling of composed and then broadcasted univariate transformations (esp. combined with parameter optimization for normalizing flows) seems a tricky business.

This is something discussed quite a bit before: https://github.com/TuringLang/Bijectors.jl/discussions/178.

Personally I'm leaning towards generalizing ColVecs and RowVecs from KernelFunctions.jl to a more general Batch. It's similar to the idea of ArraysOfArrays.jl. This way we can provide default implementations which just does map while simultaneously allowing each function to specialize on the underlying storage for higher performance, if the implementer so desires.

ArraysOfArrays.jl could potentially provide the same, but I think there are a couple of caveats:

  1. It could be more light-weight (and it uses Requires.jl).
  2. It's objective is providing an easy way to view N + M dimensional arrays as M dimensional, but for a Batch you're really just interested in converting anything into a 1-dimensional collection.