FluxML / Optimisers.jl

Optimisers.jl defines many standard optimisers and utilities for learning loops.
https://fluxml.ai/Optimisers.jl
MIT License
72 stars 20 forks source link

write `>>` as infix notation for `OptimiserChain` #139

Open mcabbott opened 1 year ago

mcabbott commented 1 year ago

This lets you chain optimisation rules like setup(ClipGrad(1.0) => WeightDecay() => Descent(0.1), model). It's just notation & printing, there is no functional change to OptimiserChain.

Edit: now uses >> instead of =>.

CarloLucibello commented 1 year ago

What about using |> instead?

mcabbott commented 1 year ago

That's possible. But x |> f normally returns a value... the operation seems more like f ∘ g in that it returns another object like f. In fact Chain is almost exactly , just written left-to-right.

CarloLucibello commented 1 year ago

f |> g would be the natural candidate for the reversed composition operator. Also, making r1 => r2 not return a Pair is technically breaking, while defining

(r2::AbstractRule)(r1::AbstractRule) = OptimiserChain(r1, r2)

would only break piratical (and unlikely) similar definitions.

ToucheSir commented 1 year ago

Since chaining rules is so close to f ∘ g as mentioned, can we just overload that operator? Unlike => which is just an alias for Pair and |> which has well-defined (and different) semantics, we are basically composing rules here.

mcabbott commented 1 year ago

One problem with is that the order is backwards. OptimiserChain matches Flux.Chain in putting the first operation leftmost.

We could literally use Flux.Chain by moving the struct Chain here, to be shorter. But maybe this is confusing. Rules are a lot like functions which take dx and return modified dx, but not exactly, whereas layers really are.

ToucheSir commented 1 year ago

I personally don't think asking users to write the order of rules "backwards" is a big deal, but that might just be me. Suggesting OptimiserChain be aliased to something short (e.g. Optax has chain) could be a docs-only solution. If we're ok with unicode, then I don't think \rightarrow and co are bound to anything. Ideally we could find an operator like * which parses to op(a, b, c, ...) so that only one call to OptimiserChain is required.

mcabbott commented 1 year ago

The only other vararg infix operator is ++. Although it isn't a big deal to digest things like :(1 => 2 => 3), indeed :(sin ∘ cos ∘ tan) is binary but prints as if it's not.

ToucheSir commented 1 year ago

Yes, finding a good operator is the higher priority.

CarloLucibello commented 1 year ago

I still think |> makes the most sense, it means composition left to right although it is "sourced" by the initial argument in the chain. As an alternative, also \rightarrow (can be typed also \to) is fine.

darsnack commented 1 year ago

We discussed this during the call. In general, all the options outlined here don't seem worth it. There is very little value-add, and we are either breaking/bending existing semantics or unnecessarily using up an infix operator (the latter being poor form given that Julia is not permissive about these).

There are already workable solutions for users: use the new import ... as ... syntax or use const ... = OptimiserChain (the latter works for infix operators like \rightarrow mentioned above).

Here's some additional context for the current naming: https://github.com/FluxML/Optimisers.jl/pull/9#discussion_r566559542 (which also goes into stuff discussed here like function composition being a bit confusing).

One option we settled on is to have an un-exported verb: Optimisers.chain. This will just make the OptimiserChain for you, but it has the benefit of not clashing with Flux.Chain if someone wants to explicitly import it. Given the generality of the name, we will not export it by default.

mcabbott commented 12 months ago

Building in two distinct names to allow someone to write using Optimisers: chain and afterwards save a few chars doesn't seem great.

The nice thing about infix syntax like WeightDecay() => Descent(0.1) is that it also saves one level of annoying nested brackets. In both typing and printing.

breaking

I am very dubious that one case of Pair(AbstractRule, AbstractRule) exists in the wild. Nevertheless we just missed a golden opportunity to roll this into the 0.3 change.

don't think asking users to write the order of rules "backwards" is a big deal

I do think this is quite confusing. (In fact I remain a little confused why AdamW seems to be backwards, but that's another topic.)

unnecessarily using up an infix operator

This objection is to defining a currently undefined operator, like ++ or \rightarrow. It's awkward when two packages use the same one.

It does not apply to abusing other built-in infix operators. We could use say >>? This has the advantage that, unlike |> and , it does not have a confusing nearby pre-existing meaning.

ToucheSir commented 12 months ago

(In fact I remain a little confused why AdamW seems to be backwards, but that's another topic.)

That's https://github.com/FluxML/Optimisers.jl/pull/46#discussion_r795262521 and https://github.com/FluxML/Flux.jl/pull/1612. PyTorch inexplicably chooses to do their own thing, and despite reading https://github.com/pytorch/pytorch/pull/21250#issuecomment-520559993 multiple times I still don't understand why.