FluxML / Optimisers.jl

Optimisers.jl defines many standard optimisers and utilities for learning loops.
https://fluxml.ai/Optimisers.jl
MIT License
72 stars 20 forks source link

some minor style changes #135

Open CarloLucibello opened 1 year ago

CarloLucibello commented 1 year ago

The most relevant is using state in place of tree in the interface.jl file, consistently with rules.jl and the docs.

mcabbott commented 1 year ago

consistently with rules.jl

I don't think that's right. At present, state is what's produced by init and stored in one Leaf in a field of that name. Whereas tree is what's produced by setup for a whole model, and contains many Leafs. It's a different concept.

Now that we allow shared parameters, the word "tree" is slightly misleading. But I don't think that erasing the distinction and calling two different things "state" is better. IMO even an imperfect name is helpful to understand what the code is doing.

and the docs

The docs also say "tree" here, and this PR does not touch that. In Flux they say opt_state instead. IDK how jarring that is, we could consider changing it here, but I'm not sure we want opt_state.

CarloLucibello commented 1 year ago

The Flux usage example in this package's documentation shows

state = Optimisers.setup(rule, model);  # initialise this optimiser's momentum etc.

Can't we consider a leaf also a tree/state ? (a graph with a single node is a tree)

My main issue is that each time I look at this repo I wonder for a while what is this argument tree, it's not immediate to link it to the optimizer's state. Also opt_state would be better than tree.

darsnack commented 1 year ago

Maybe state_tree and state_leaf are the right balance? I feel like having the tree mental model is important for understanding how this repo works.

ToucheSir commented 1 year ago

Optax uses opt_state in their docs and state internally in their rules. They have it easier though because rules are "vectorized" and thus no function is really dealing with Leaf like we have.

mcabbott commented 1 year ago

Flux usage example in this package's documentation

This may well pre-date the re-write -- at some point everything was called state, and there was no Leaf, there was less need for a distinction. Using state_tree here would be fine. Using it in docstrings we could think about. Using a long snake-case name in code I think we should not do.

darsnack commented 1 year ago

We discussed this on call. There was agreement that the structure is no longer a tree, so calling it a tree is confusing. But after comparing to other libraries like Optax, we think that users/rule writers should have a rule-specific view of the package. Meaning that they see state everywhere (for what init returns + for what update returns). The fact that update handles complicated data structure is the "magic" of the package. Calling the return a tree, etc. is implementation leaking into the documentation.

Internally, we think that non-leaf state should be opt_state and leaf state should be state.