Documenting Design Patterns

MikeInnes commented 4 years ago

A lot of what Flux can do is not explicitly written down. Regularisation is a good example; just grabbing your parameters and summing them is really simple and intuitive, but if you're used to frameworks that provide an explicit API for this, you might get not think of it, or assume it's not supported at all if it isn't in the docs.

So we need to document Flux "design patterns" that explicitly cover features from other frameworks. Some things off the top of my head:

Regularisation
Debugging implicit gradient results (Grads)
Logging and debugging the forward pass
Logging and debugging the backwards pass, + clipping, dropping grads etc.
Advanced Gradients (link Zygote docs)
Custom initialisation of layers
Generally-useful parts of #251
Enabling/disabling GPU compute across a script
Flux.stop() and Flux.skip() (#821)
RNN usage (#808)
Bidirectional RNNs
Flux for TF/PyTorch user guides
Gradient clipping, e.g..

Ideas (or requests) for other features that we should document how to do are welcome.

DhairyaLGandhi commented 4 years ago

Writing custom adjoints and intuition behind it. Maybe even a section on arbitrary code semantics (find, handling workers etc)

MikeInnes commented 4 years ago

Agreed, we definitely need to replace the backprop section that used to cover tracker. We don't want to duplicate all the Zygote docs, but some explanation in the context of Flux would be really helpful.

jumerckx commented 4 years ago

I'd love some explanation on mutation in a model using Buffer.

scheidan commented 4 years ago

An example of gradient clipping would be good too.

MikeInnes commented 4 years ago

@scheidan I think we should add some clipping layers (#672) but yes, giving people the know-how to do it themselves more generally is also a good idea of course.

@merckxiaan Is there anything Flux-specific we should say about Buffer? It's probably worth at least mentioning, a long with a few other more advanced tricks that are documented by Zygote.

jumerckx commented 4 years ago

I haven't got anything specific in mind but it'd be interesting to see a simple deep learning network that uses Buffers. I'm sure Buffers are really helpful to make efficient models but I still don't really get how things can be achieved when mathematical operations are not permitted on them?

MikeInnes commented 4 years ago

Buffers are really just meant as a workaround when you want to do array construction using mutation. So you might use them inside the definition of a basic array op like cat, but they'd be completely transient; you wouldn't pass Buffers through a deep learning model.

janEbert commented 4 years ago

Advanced Gradients (link Zygote docs)

Adding to that, not tracking parameters should be included there (or somewhere else) as well. You mentioned Zygote.dropgrads.

MikeInnes commented 4 years ago

Yes, that definitely also fits under "things we should have APIs for" too; added to the list.

appleparan commented 4 years ago

How about writing tutorial how to port tutorial codes from Tensorflow or Pytorch docs? Many other ML codes are written based on TF or torch and sometimes I was very confusing about how to convert them. For me, input set structure is most unfamiliar things compared to other frameworks. Without model-zoo, it is hard to use Flux.jl by reading its docs only. I think comparing other framework codes with Flux by 1vs1 would be helpful.

MikeInnes commented 4 years ago

Good idea. I think we could add "Flux for TF users" or "Flux for PyTorch users" guides as sections in the docs. Happy to help anyone who wants to contribute that.

RohitMazumder commented 4 years ago

If no one is already working on it, then I would love to contribute on "Flux for TF users" !

MikeInnes commented 4 years ago

That would be great!

cossio commented 4 years ago

Automatic parameter extraction for custom types, via @functor.

E.g., add this example from @dhairyagandhi96 to the docs, which shows how you can control what fields are added to the parameter tree:

julia> struct MyLayer{T,K}
         a::T
         b::K
       end
julia> using Flux: @functor
julia> @functor MyLayer (a,)
julia> _l = MyLayer(rand(3,3), rand(5))
MyLayer{Array{Float64,2},Array{Float64,1}}([0.8382666790752258 0.16279958328830713 0.8185255499278947; 0.10188918358486099 0.7421499443512403 0.7912103198124705; 0.7105677086316595 0.16360615883658625 0.5766784867701418], [0.7118888831680539, 0.3682507168932143, 0.18493277328287605, 0.025627816691143, 0.6084281600097385])
julia> Flux.params(_l)
Params([[0.8382666790752258 0.16279958328830713 0.8185255499278947; 0.10188918358486099 0.7421499443512403 0.7912103198124705; 0.7105677086316595 0.16360615883658625 0.5766784867701418]])
julia> size.(Flux.params(_l))
1-element Array{Tuple{Int64,Int64},1}:
 (3, 3)

DhairyaLGandhi commented 4 years ago

Adding transfer learning examples would be good too

DhairyaLGandhi commented 4 years ago

Data loading + viz and TensorboardLogger.jl integration

ToucheSir commented 3 years ago

Thoughts on breaking this into smaller issues and creating a project board for them? I know some (e.g. RNNs, ecosystem) have been addressed already.

FluxML / Flux.jl

Documenting Design Patterns #891