This is because:
julia> fieldnames(ResidualDense)
(:properties,)
julia> propertynames(ResidualDense)
(:var, :body)
and Functors isn't careful: it checks fieldnames but calls getproperty:
https://github.com/FluxML/Functors.jl/blob/v0.3.0/src/functor.jl#L11-L16
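To make the mismatch concrete, this is roughly what it looks like on an instance (a sketch, assuming the model from the MWE in the original post; exact ProtoStructs internals may differ):

fieldnames(typeof(model))        # (:properties,)  -- the one real field, which Functors enumerates
propertynames(model)             # (:dense,)       -- the "virtual" fields ProtoStructs exposes
getfield(model, :properties)     # the wrapped NamedTuple; works
getproperty(model, :properties)  # ERROR: type NamedTuple has no field properties
                                 # (ProtoStructs redirects getproperty into that NamedTuple)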
One fix is:
Another fix would be for ProtoStructs to make getproperty
work for the extra (internal) name. It could have a weird, possibly generated, name to avoid clashes.
Nice! Thanks for figuring this out.
Aside: I wonder if something like @proto should be used by default for Flux.jl custom layers? It seems like such a common use-case to modify structs when you are developing neural nets. I suppose it wouldn't hurt performance either.
Here's a terrible idea I had, just to see what if it spurs any useful ideas for others:
@layer struct ResBlock
    w1::Dense
    act::Function
    forward = (self, x, y) -> begin
        self.act(self.w1(x))
    end
end
where @layer would basically do both what @proto and @functor (as well as @kwdef, for the forward) currently do. The forward property of ResBlock here would let you embed the forward function actually inside the struct declaration. I guess it's kind of against Julia style, but it seems intuitive for quickly building deep neural nets.
So, from this, the macro would generate the following code:
struct ResBlock{NT<:NamedTuple}
    properties::NT
end

function getproperty
    ...
end
# etc., etc.

function (self::ResBlock)(x, y)
    self.forward(self, x, y)
end
(Maybe I am thinking of trying this out: https://github.com/Suzhou-Tongyuan/ObjectOriented.jl - it's definitely useful for DL-type models)
I don't know if we want that as default @layer, because eventually you want to remove the @proto specification (this is explicitly stated even in the ProtoStructs.jl readme).
Supporting ProtoStruct layers is good, because sometimes writing your own layer is unavoidable. But Flux has some features that make using different types (classes) for each sub-layer not the norm. For example, Metalhead.jl, the vision model library, can build almost all types of residual networks without defining a ResBlock type. This is because of two things in Flux that most frameworks lack:
1. the Parallel layer (and its special case SkipConnection), which encompasses most branching network architectures
2. the ability to give names to the sub-layers of a Chain or Parallel and access sub-layers via those names (so that you don't just see an ocean of Parallel)
(Btw this is not to shoot anything down; if what we have doesn't really do what you want, then we want to know. Love the enthusiasm!)
To build on Kyle's point: Flux's philosophy towards layer types is very much "come as you are". We'd like to make it easier to define/register existing structs as layers and to do so without depending on Flux itself. So while we should absolutely try to support libraries like ProtoStruct and ObjectOriented.jl if we can, we also want to keep the barrier of entry for working with the module system as low as possible (if you know how to define a callable struct, you can make a layer).
Thanks for all the comments on my admittedly half-thought-out ideas! Did not know about Parallel - nice!
I don't know if we want that as default @layer, because eventually you want to remove the @proto specification (this is explicitly stated even in the ProtoStructs.jl readme).
Good point, agreed!
I wonder if you would want to mention Revise.jl+ProtoStructs.jl on the custom layer page (or even @reexport some of the key functionality, so you could do Flux.revise() at the top of the REPL to turn it on), since it seems almost required when developing neural nets; otherwise the startup time really hurts when working on complex networks.
For a lot of users, my sense is that Flux.jl may be the very first Julia package they try out.
We also want to keep the barrier of entry for working with the module system as low as possible (if you know how to define a callable struct, you can make a layer).
The compatibility makes a lot of sense. I am just trying to think if there's any way to simplify the current custom layer declaration method for new users. Right now you need to call struct MyStruct, function (m::MyStruct)(...), and @functor, and then also construct the components of the layer separately. This feels unwieldy.
Not only is it four separate calls, it's four different types of calls. I need to create (1) a struct, then (2) a method, (3) call a macro, and then separately (4) construct the layer's components. And I need to do that for every single custom layer. In Haiku and PyTorch, you would only create a (1) class and (2) method - sometimes you can even create a custom layer with just a method. Using four different ideas to make a simple NN layer just seems a bit heavy, and the self methods in Julia (like function (m::MyType)(x)) are admittedly ugly, and seem more appropriate for use by package developers rather than end users. That functionality always seemed more suitable for designing convenience methods in a package, rather than working on user objects.
Even something like
struct MyLayer
    x::Dense
    y::Dense
end
@layermethod (m::MyLayer)(z) -> relu(m.x(z) + m.y(z))
might make for a slightly cleaner convenience API. Though the standard methods would of course also be available.
Thoughts? I am aware I could simply be too comfortable with the Python DL ecosystem, though, and this isn't Julia-esque enough. No worries if that is the case.
I think my dream layer creation method would really be something like
@layerfactory function my_layer(n_in, n_out)
    w1 = Dense(n_in, 128)
    w2 = Dense(128, n_out)
    return (x) -> begin
        y = relu(w1(x)) + x
        w2(y)
    end
end
which would let you construct the layer struct, the layer's components, and the forward pass, all in one go. (But that would obviously be tricky to implement).
Best, Miles
create (1) a struct, then (2) a method, (3) call a macro, then separately (4) construct the layer's components.
What's nice about 1,2,4 is that there is nothing Flux-specific about them. They are completely ordinary Julia code.
Making a special DSL isn't impossible, but it's one more thing you have to learn, and it will have limitations. This is a little bit like the train! story, where saving a few lines (compared to writing out the ordinary Julia code of the for loop) comes at the cost of weird API to memorise, and limitations so that you later have to know the other way too.
Most things for which it's worth defining a new layer will want something extra. If I'm reading correctly the example here is an easy combination of existing layers:
Chain(SkipConnection(Dense(n_in => 128, relu), +), Dense(128 => n_out))
Some of us would like to remove @functor and make it just recurse by default. Then perhaps @layer would be an optional way to add fancy printing (or to customise which bits are trainable) but not needed for the simplest layers.
I agree we should mention Revise.jl & maybe ProtoStructs.jl. On the ecosystem page for sure. Maybe ProtoStructs.jl ought to be on the advanced layer building page too? (Which could use some cleaning up.)
Thanks, I see your point. I just wonder if there's a way to still use ordinary Julia code but require less manual labor to build custom layers, at the expense of being slightly less generic. Recursing by default would definitely be an improvement!
If I'm reading correctly the example here is an easy combination of existing layers:
This was just a MWE; in practice there would probably not be an existing layer for my applications.
In some ways the layers like Chain/SkipConnection/Dense already form a DSL. I guess what I am looking for is a simpler way to extend this DSL to new layers so I can quickly prototype models for my research. With certain assumptions about what would go into the layer (e.g., similar assumptions to Torch/Flax methods), I think a custom layer factory could be written in a condensed form. Currently the way to make custom layers is 100% generic, and as a result is a bit more code - maybe there could also be a partially-generic method.
Here's one idea for a layer factory:
struct LayerFactory{F<:Function,NT<:NamedTuple}
    forward::F
    layers::NT
end

LayerFactory(f; layers...) = LayerFactory(f, NamedTuple(layers))

function (f::LayerFactory)(args...)
    return f.forward(f.layers, args...)
end
@functor LayerFactory
This makes it super easy to construct custom layers. Watch this:
my_layer = LayerFactory(; w1=Dense(5, 128), act=relu) do self, x
    self.act.(self.w1(x))
end
That's literally all you need! I can construct custom layers in one line, without even changing the underlying structure of Flux.jl. And it works for training and everything.
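For example, gradients also go through it like any other layer (hypothetical input sizes):

x = rand(Float32, 5, 16)
gs = gradient(() -> sum(my_layer(x)), params(my_layer))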
What do you think about this?
This should work fine. What's a nice example of a nontrivial use?
The use-case I had in mind is graph networks, where you have a set of (nodes, edges, globals) that you need to do scatter operations on - it seems tricky to get that working with a Chain, but it should be doable to set it up with custom layers.
I am really happy about this LayerFactory struct. I'm actually surprised that it just works with this package. You can even change the number of fields in the same runtime, and it still works! Would you be willing to include it in Flux.jl as a simple way to construct custom layers?
e.g., here's another example:
model = LayerFactory(;
    w1=Dense(1, 128), w2=Dense(128, 128), w3=Dense(128, 1), act=relu
) do self, x
    x = self.act.(self.w1(x))
    x = self.act.(self.w2(x))
    self.w3(x)
end
p = params(model) # works!
Here's a simple implementation of a graph network in PyTorch: https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#torch_geometric.nn.meta.MetaLayer
It even works for compositions of LayerFactory!
function MLP(n_in, n_out, nlayers)
    LayerFactory(;
        w1=Dense(n_in, 128), w2=[Dense(128, 128) for i=1:nlayers], w3=Dense(128, n_out), act=relu
    ) do self, x
        embed = self.act.(self.w1(x))
        for w in self.w2
            embed = self.act.(w(embed))
        end
        self.w3(embed)
    end
end
model = LayerFactory(; mlp1=MLP(1, 128, 2), mlp2=MLP(128, 1, 3)) do self, x
    self.mlp2(self.mlp1(x))
end
I am not super familiar with GNNs, but you might want to check out GraphNeuralNetworks.jl to see how they handle working with Flux. They do seem to have a custom GNNChain.
Okay, I think I understand what you are saying. If you have a sufficiently complex forward function that involves sub-layers, then writing it from scratch with "base" Julia + Flux is a bunch of text. As would be the case with "base" PyTorch or Jax, but those libraries have utilities built on top like your LayerFactory. So is it right that you are looking for the same in Flux?
While I have no problem with LayerFactory since it is a nice convenience utility, I want to note that if we decide to auto-@functor structs, then LayerFactory comes for free from plain Julia:
mlp(n_in, n_out, nlayers) = let w1 = Dense(n_in, 128), w2 = [Dense(128, 128) for i in 1:nlayers], w3 = Dense(128, n_out)
    return function(x)
        act = relu
        embed = act.(w1(x))
        for w in w2
            embed = act.(w(embed))
        end
        w3(embed)
    end
end
model = let mlp1 = mlp(1, 128, 2), mlp2 = mlp(128, 1, 3)
    x -> mlp2(mlp1(x))
end
p = params(model) # works too!
Below is just for your reference.
Looking at the link you shared, this is what I would write in Flux:
# this is one way of avoiding structs completely
EdgeModel(edge_mlp = Chain(...)) = Chain(
    (src, dest, edge_attr, u, batch) -> vcat(src, dest, edge_attr, u[batch]),
    edge_mlp
)

# admittedly, structs seem nice here
Base.@kwdef struct NodeModel{T, S}
    node_mlp_1::T = Chain(...)
    node_mlp_2::S = Chain(...)
end

@functor NodeModel

function (m::NodeModel)((x, edge_index, edge_attr, u, batch))
    row, col = edge_index
    out = vcat(x[row], edge_attr)
    out = m.node_mlp_1(out)
    # not sure what this is doing but we have a NNlib.scatter
    out = scatter_mean(out, col, ...)
    out = vcat(x, out, u[batch])
    return m.node_mlp_2(out)
end
And so on. Modulo the @functor issue, I don't see how defining a class and forward function is shorter than what's above. Seems like just an extra end keyword and a separation between the struct definition and the forward definition. The models I defined above are ready to be passed into a Chain or Parallel (assuming that's what MetaLayer is).
Or another way of putting it: Chain and friends are sort of a DSL for passing arguments between sub-layers. You have a need to define your own argument-passing container layer, but you don't want to write all the extra stuff that goes along with a base layer like Conv. The Haiku link you shared shows a mechanism for constructing "layers" that have no type, but they do have fields, a forward pass, and a variable they are bound to. These three things in Julia are exactly what make an anonymous function! The only thing preventing this from just working™ in Flux is that @functor is opt-in.
While I have no problem with LayerFactory since it is a nice convenience utility, I want to note that if we decide to auto-@functor structs, then LayerFactory comes for free from plain Julia:
I don't understand how your example works. In your example, model is a function, rather than an object; so it wouldn't remember its parameters - they would be initialized each time. Whereas LayerFactory is actually an object. Unless you meant to actually declare the w1 outside of the function, and declare them as globals?
Oops, brain fart on my part, but see the correction using a let block.
Okay, but note that the anonymous function returned by the let block is not "a function rather than an object" in Julia, because anonymous functions are implemented under the hood as structs. The variables they close over are the fields of the struct. In essence, the let + closure in Julia is a Base implementation of your LayerFactory.
LayerFactory need not require a custom type either:
LayerFactory(f; layers...) = Base.Fix1(f, NamedTuple(layers))
We don't currently @functor Base.Fix1 in Functors, but that's only because we haven't gotten around to it. Given the triviality (and generality outside of ML) of this function, I don't think it has to live inside of Flux. Much like we do with Split, we could add a "cookbook" entry on how to define your own version of this in the docs.
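For illustration, the Fix1 version would be used the same way (a hypothetical sketch, assuming the one-line definition above; note that Base.Fix1 forwards exactly one trailing argument, so this covers single-input layers):

layer = LayerFactory(; w1=Dense(5, 128), act=relu) do self, x
    self.act.(self.w1(x))
end

layer(ones(Float32, 5))  # runs; params(layer) won't see inside it until Base.Fix1 is functored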
I see, thanks. The let syntax is new to me! Would be great when that approach actually works.
Seems like just an extra end keyword and a separation between the struct definition and the forward definition.
Don't forget the model instantiation! Compare these two:
model = LayerFactory(; w1=Dense(5, 128), w2=Dense(128, 1), act=relu) do self, x
    x = self.act.(self.w1(x))
    self.w2(x)
end
(or the let method!) versus:
struct MyLayer
    w1::Dense
    w2::Dense
    act::Function
end

@functor MyLayer

function (self::MyLayer)(x)
    x = self.act.(self.w1(x))
    self.w2(x)
end

model = MyLayer(Dense(5, 128), Dense(128, 1), relu)
The latter example would discourage me from using it. Note also that the second example will break if I use Revise.jl and change the inputs, whereas let and LayerFactory will just work.
Given the triviality (and generality outside of ML) of this function
Up to you, but I don't see a problem with including this in the code alongside Chain... I would consider these to be core pieces of functionality for any DL framework to quickly compose custom models - much more so than Split, which seems niche. Making the user implement these core pieces of functionality themselves is just another barrier to ease-of-use.
Btw, in the let example, can you access subcomponents of a model, like w1? Or are all the pieces hidden inside the closure? If not, I think I might prefer having a NamedTuple with @functor pre-declared.
The LayerFactory thing seems cute. Maybe see how it goes for building some models in real life and figure out what the unexpected warts are?
One refinement which could be added is a macro which would put the self in for you, and perhaps arrange to print it in some human-readable way.
can you access subcomponents of a model, like w1?
Yes. With Kyle's code:
julia> mlp(1,2,3)
#12 (generic function with 1 method)
julia> ans.w3.bias
2-element Vector{Float32}:
0.0
0.0
I would consider these to be core pieces of functionality for any DL framework to quickly compose custom models
Does this mean Python frameworks don't even meet the bar then? :P
My impression is that Flux is already offering more layer helpers like Parallel or Maxout than most other frameworks (c.f. PyTorch/TF/Flax, which would make you define a custom class for both). We also want to avoid scenarios where someone blows up their code's (or worse, package's) import time by +12s unnecessarily because they decided to depend on Flux for a one-liner function.
The LayerFactory thing seems cute. Maybe see how it goes for building some models in real life and figure out what the unexpected warts are?
Will do! (when I get a chance...)
Does this mean Python frameworks don't even meet the bar then? :P
Not quite... Say what you will about Python, but the DL frameworks are very polished. Here's how you would do a two-layer MLP in Haiku:
@hk.transform
def forward(x):
    w1 = hk.Linear(100)
    w2 = hk.Linear(10)
    return w2(jax.nn.relu(w1(x)))

params = forward.init(rng, x)
PyTorch:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.w1 = nn.Linear(10, 100)
        self.w2 = nn.Linear(100, 1)

    def forward(self, x):
        return self.w2(F.relu(self.w1(x)))

model = Net()
My impression is that Flux is already offering more layer helpers like Parallel or Maxout
PyTorch actively discourages users from using nn.Sequential
for complex operations (equivalent of Chain
), since it isn't obvious what's actually going on. i.e., Sequential
operations should be sequential. Users are encouraged to write their own forward
function (equivalent of a custom layer) for anything more than super basic sequential patterns. I don't think it's a bad idea myself... I may choose to use vcat
explicitly in a forward pass than use a parallel block just because I'm more used to that pattern.
Don't forget the model instantiation! ...
I agree, the factory is much shorter even keeping aside the instantiation. But the factory isn't the default way to make layers in other frameworks either.
From your most recent example (in Julia):
Base.@kwdef struct Net{T, S}
    w1::T = Dense(10, 100)
    w2::S = Dense(100, 1)
end

(self::Net)(x) = self.w2(relu.(self.w1(x)))

@functor Net # boo we also don't like this

model = Net()
What I was trying to figure out is if you wanted the default mechanism to change or a utility built on top. But I think we settled this Q! We're talking about a convenience method here.
PyTorch actively discourages users from using nn.Sequential for complex operations (equivalent of Chain), since it isn't obvious what's actually going on.
The problem here is that if I make a new model = Net(), I have no clue what Net does without reading the content of its forward. Even something called ResBlock could be non-obvious without reading the code for certain variants. In contrast, printing out a model built purely of Chain, Parallel, etc. has understandable execution from just the model architecture being printed in the REPL (you do need to know enough Flux to have seen these layers before though). You also get contextual information with named sub-layers in our containers. All of this is important to us, because code reuse is extremely common in Julia. We want people to feel confident instantiating unknown models in the REPL and using them without needing to go to the source definition. We don't like how people keep redefining code in Python frameworks.
This being said, I like the declarative nature and syntactic clarity of what you are proposing. I think the broader point here is that:
1. Flux is community maintained with a very distributed process, making a concise, manageable codebase valuable (more code = more maintenance burden)
2. users still need a convenient way to define their own layers for the cases where the built-in builders (Chain, Parallel, etc.) fall short
So, like Michael, I would be happy to include it...after some thought and maybe seeing if it has unforeseen limitations.
Kyle beat me to it, but just to add this:
PyTorch actively discourages users from using nn.Sequential for complex operations (equivalent of Chain), since it isn't obvious what's actually going on. i.e., Sequential operations should be sequential. Users are encouraged to write their own forward function (equivalent of a custom layer) for anything more than super basic sequential patterns.
This is a good idea in some circumstances and a bad one in others. For example, torchvision models are full of Sequentials, because users want to be able to slice up and otherwise manipulate those models.
So, like Michael, I would be happy to include it...after some thought and maybe seeing if it has unforeseen limitations.
Sounds good!
I have no clue what Net does without reading the content of its forward. Even something called ResBlock could be non-obvious without reading the code for certain variants. In contrast, printing out a model built purely of Chain, Parallel, etc. has understandable execution from just the model architecture being printed in the REPL (you do need to know enough Flux to have seen these layers before though).
I'm not sure I completely understand. Is your goal to make all types of neural network models possible with Chain by itself? With the complexity of modern neural nets, this seems like it requires building an impossibly massive DSL. Why not just rely on the core Julia language through user-created custom layers, with small Chain blocks for common modules like MLPs and CNNs (similar to what other DL frameworks do)? In any case, I don't think it's humanly possible to understand the internals of a modern NN by reading a sequence of modules - you ultimately have to go through the forward function when it's not a sequential stack of modules.
But maybe users could always refactor their model into separate LayerFactorys with helpful names for each (@mcabbott's suggestion for adding a show method), and maybe that could help with interpretation.
We don't like how people keep redefining code in Python frameworks.
You seem to be bringing up Julia v Python... I want to be clear I am really not trying to go there (I'm on the Julia side, for the record; I've just had experience with both!). I'm purely talking about the syntax itself.
If you consider PyTorch by itself as a software and ecosystem, there is an obscene amount of code re-use. I can take someone's custom torch model with an extremely complex forward pass, and put it inside my custom model, and it works seamlessly. Again, this is just PyTorch -> PyTorch (the same DSL!); I'm not talking about the existing incompatibility between different Python frameworks! But my point is that I don't think there are intrinsic problems with users creating custom layers and sharing them. It's just like sharing any other code. If it has the expected input/output types, and is well-documented, it should be okay.
Flux is community maintained with a very distributed process, making a concise, manageable codebase valuable (more code = more maintenance burden)
Making it easier to construct custom layers seems precisely aligned with your goals, no? Then users can go build these layers themselves, rather than you having to worry about building a massive library of modules. And you only need to maintain the most commonly-re-used layers.
@ToucheSir For example, torchvision models are full of Sequentials, because users want to be able to slice up and otherwise manipulate those models.
Right - you might use Sequential for common sequential pieces (like a stack of convolutions or an MLP), and then write a forward function to bring them all together in a complex way. The default printing would print each sequential piece inside a module, and perhaps a user could overload the printing to print each Sequential in a hierarchical way next to particular model parameters.
Why not just rely on the core Julia language through user-created custom layers, with small Chain blocks for common modules like MLPs and CNNs (similar to what other DL frameworks do)? In any case, I don't think it's humanly possible to understand the internals of a modern NN by reading a sequence of modules - you ultimately have to go through the forward function when it's not a sequential stack of modules.
I think we're all on the same page here, just that the devil is in the details 🙂. Looking at the original PR which ended up spawning Parallel, one can see that Flux did converge on something like this philosophy: encourage types for non-trivial composite layers in general, but also provide slightly more powerful building blocks for the most common use cases. I'd be remiss to not note that all participants on that thread were/are also active Python ML library users, so there is certainly some diversity of opinion here!
This ties into the code reuse discussion. What I think Kyle is trying to get at is that while a framework shouldn't try to create a DSL for every possible use case, it should try to provide affordances so that users aren't unnecessarily having to roll their own code for trivial features. I can't count how many times I've seen research PyTorch code which defines a number of layer types just so that they can have a skip connection. You can tell because those layers are often awkwardly named; they kind of have to be, because they really only represent some intermediate chunk of a larger model which wouldn't otherwise be considered standalone (second hardest problem in computer science, etc).
Right - you might use Sequential for common sequential pieces (like a stack of convolutions or an MLP), and then write a forward function to bring them all together in a complex way. The default printing would print each sequential piece inside a module, and perhaps a user could overload the printing to print each Sequential in a hierarchical way next to particular model parameters.
Again, I think we are of roughly the same mind about this. There's a reason Metalhead.jl, torchvision, timm, etc. use this pattern. It's also one reason we're hesitant to loudly advertise layer-building functionality which always returns a (semi-)anonymous type: you lose that important semantic information from the layer name that you get by using a named class in Python or struct in Julia.
Let me start by saying I don't fundamentally disagree with the feature proposal. I'm just trying to shine light on the design decisions we made in FluxML. Hopefully, this is useful and not unwanted.
You seem to be bringing up Julia v Python... I want to be clear I am really not trying to go there
We have a slight miscommunication, which is my fault for using "Python" as a catch-all when I really meant "X where X is one of TF/Jax/PyTorch" (i.e. considering each framework independently). I certainly wasn't referring to NumPy/Torch/SciPy/etc...I also don't want to go there, and it seems irrelevant to our discussion. In fact, for what we're discussing (syntax for building complex models), the host language (Julia or Python) seems irrelevant.
The point of bringing up Python-based frameworks at all is that I agree with you: they are great DL frameworks. There's a lot to learn from, and so we can make useful comparisons to understand what we do wrong/right.
If you consider PyTorch by itself as a software and ecosystem, there is an obscene amount of code re-use. I can take someone's custom torch model with an extremely complex forward pass, and put it inside my custom model, and it works seamlessly. Again, this is just PyTorch -> Pytorch (the same DSL!)
This isn't exactly the type of re-use I am referring to, and I don't think the various options we are discussing would limit this kind of re-use.
Let's take a concrete example from torchvision. For ResNet, they define the residual blocks, then in Inception they define the inception modules. In Metalhead (the equivalent Flux library), both ResNet and Inception just use Parallel. If you look at the papers ([1] Fig. 2 and [2] Figs. 4, 5, ...), these custom layers look remarkably similar in structure. They differ in terms of what's along each branch or how many branches there are, but the overall "container layer" still does the same operation. Defining this forward pass operation over and over seems like poor code re-use.
Now, PyTorch folks could absolutely have written torchvision in a way so that this residual-branch type of layer has a single forward definition that gets re-used...but then they would end up writing Parallel.
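For a concrete sketch of what I mean (made-up channel sizes, not Metalhead's actual code), the basic residual block from [1] is just:

basic_block = Chain(
    SkipConnection(
        Chain(Conv((3, 3), 64 => 64, pad=1), BatchNorm(64, relu),
              Conv((3, 3), 64 => 64, pad=1), BatchNorm(64)),
        +),
    x -> relu.(x))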
Is your goal to make all types of neural network models possible with Chain by itself? With the complexity of modern neural nets, this seems like it requires building an impossibly massive DSL.
Definitely not all types of models, for two reasons: (a) it's not possible, and (b) even if it were, it would make writing some models unnecessarily cumbersome.
But I will say you can get really far without writing a massive DSL. Layers fall into two categories:
1. layers that do the actual computation: Conv, Dense, Upsample, etc.
2. layers that pass arguments between sub-layers: Chain, Parallel, etc.
(1) is unavoidable in every framework unless you take an explicitly functional view and make users pass in the weights, state, etc. (2) is where the possible DSL size explosion could happen. But if you take a feedforward NN, then there is a limited set of structures you can see in the DAG, namely Chain and Parallel. Metalhead.jl is written in this way, and it covers vision models from AlexNet to ViTs. It does have custom layers not in Flux, but those are mostly (1) layers.
I don't think it's humanly possible to understand the internals of a modern NN by reading a sequence of modules - you ultimately have to go through the forward function when it's not a sequential stack of modules.
I don't know...GoogLeNet's diagram shows a pretty complex network, but I think you can understand the flow of arguments just by looking at the figure. Even something like CLIP.
Of course, DL isn't restricted to FF DAGs, nor should it be. And I get the feeling these are the kinds of models you work with. So then you need to define a custom (2). We absolutely want users to go ahead and do this whenever they feel like they should. Or maybe even for a simple CNN, you subjectively prefer to write out the forward pass. Go for it! If you do get to the point of writing a custom (2), then your layer factory makes the syntax really short. This is why I like it, and I am in favor of adding it.
Sometimes it is better to "just write the forward pass," and sometimes it is better to use existing layers + builders to create a complex model. Both are "first class" in Flux. I don't want to leave you with the impression that we want everyone to build everything using only Chain and Parallel...that would be crazy 😬
[1]: ResNet https://arxiv.org/pdf/1512.03385v1.pdf [2]: Inception https://arxiv.org/pdf/1512.00567v3.pdf
Oops as I was writing and editing my saga, Brian beat me to it by 40 minutes, but my browser didn't refresh :(.
Here is a macro version, which should let you write Dense as shown, and will print it out again like that.
"""
@Magic(forward::Function; construct...)
Creates a layer by specifying some code to construct the layer, run immediately,
and (usually as a `do` block) a function for the forward pass.
You may think of `construct` as keywords, or better as a `let` block creating local variables.
Their names may be used within the body of the `forward` function.
    r = @Magic(w = rand(3)) do x
        w .* x
    end
    r([1, 1, 1])
    r([10, 10, 10])  # same random numbers

    d = @Magic(in=5, out=7, W=randn(out,in), b=zeros(out), act=relu) do x
        y = W * x
        act.(y .+ b)
    end
    d(ones(5, 10))  # 7×10 Matrix
"""
macro Magic(fex, kwexs...)
    # check input
    Meta.isexpr(fex, :(->)) || error("expects a do block")
    isempty(kwexs) && error("expects keyword arguments")
    all(ex -> Meta.isexpr(ex, :kw), kwexs) || error("expects only keyword arguments")
    # make strings
    layer = "@Magic"
    setup = join(map(ex -> string(ex.args[1], " = ", ex.args[2]), kwexs), ", ")
    input = join(fex.args[1].args, ", ")
    block = string(Base.remove_linenums!(fex).args[2])
    # edit expressions
    vars = map(ex -> ex.args[1], kwexs)
    assigns = map(ex -> Expr(:(=), ex.args...), kwexs)
    @gensym self
    pushfirst!(fex.args[1].args, self)
    addprefix!(fex, self, vars)
    # assemble
    quote
        let
            $(assigns...)
            $MagicLayer($fex, ($layer, $setup, $input, $block); $(vars...))
        end
    end |> esc
end
function addprefix!(ex::Expr, self, vars)
    for i in 1:length(ex.args)
        if ex.args[i] in vars
            ex.args[i] = :($self.$(ex.args[i]))
        else
            addprefix!(ex.args[i], self, vars)
        end
    end
end
addprefix!(not_ex, self, vars) = nothing
struct MagicLayer{F,NT<:NamedTuple}
    fun::F
    strings::NTuple{4,String}
    variables::NT
end

MagicLayer(f::Function, str::Tuple; kw...) = MagicLayer(f, str, NamedTuple(kw))
(m::MagicLayer)(x...) = m.fun(m.variables, x...)
MagicLayer(args...) = error("MagicLayer is meant to be constructed by the macro")

Flux.@functor MagicLayer

function Base.show(io::IO, m::MagicLayer)
    layer, setup, input, block = m.strings
    print(io, layer, "(", setup, ") do ", input)
    print(io, block[6:end])
end
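For what it's worth, nesting an existing layer inside it should work too (a hypothetical quick check, assuming the code above):

d = @Magic(w = Dense(5, 7), act = relu) do x
    act.(w(x))
end

d(ones(Float32, 5, 10))  # 7×10 Matrix
Flux.params(d)           # picks up the weight and bias of the wrapped Dense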
Thanks for sharing these answers, I completely agree and I think we are all on the same page!
Here is a macro version, which should let you write Dense as shown, and will print it out again like that.
This is AWESOME, nice job!! I am 👍👍 (two thumbs up) for the support of this feature as a convenient custom layer constructor.
No doubt that has all sorts of bugs! But fun to write. Once you make a macro, it need not be tied to the LayerFactory keyword notation like this, of course.
And whether this is likely to create pretty code or monstrosities, I don't know yet.
Could this be added to Flux.jl, with "Very experimental." stated in bold in the docstring? I can create a PR and add a couple tests and maybe a paragraph to the docs.
Let me know if I could add it and I can make a PR? Would love to have a feature like this. The @Magic macro instantly makes Flux.jl the most elegant framework in my view.
We've created a Fluxperimental.jl package for this purpose. Once it is set up and made public, we can ping you for a PR there (which would be appreciated!).
Cool, sounds good to me!
Ok, https://github.com/FluxML/Fluxperimental.jl is live
Closing in favor of https://github.com/FluxML/Fluxperimental.jl/discussions/2 for layer factory and https://github.com/FluxML/Functors.jl/issues/46 for ProtoStruct.jl issue.
For people finding this issue, the discussion above has now resulted in the PR here: https://github.com/FluxML/Fluxperimental.jl/pull/4
There's this really nice package ProtoStruct.jl that lets you create structs which can be revised. I think this is extremely useful for developing custom models in Flux.jl using Revise.jl, since otherwise I would need to restart every time I want to add a new property in my model.
Essentially, the macro transforms the struct definition you write into a single struct wrapping a NamedTuple (regardless of the declared properties), and, inside the macro, sets up constructors based on your currently defined properties.
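Roughly, the transformation looks like this (a sketch based on the ProtoStructs readme; the name and fields are just an example):

using ProtoStructs

# what you write:
@proto struct MyBlock
    dense
    act
end

# ...is expanded to approximately:
#
#   struct MyBlock{NT<:NamedTuple}
#       properties::NT
#   end
#
# plus constructors and a getproperty overload that look `dense` and `act`
# up in the wrapped `properties` NamedTuple.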
However, right now it doesn't work with Flux.jl. When I try to get the parameters from such a model, I see the error: NamedTuple has no field properties.
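A minimal example along these lines reproduces it (a reconstructed sketch; the layer definition and sizes are just an example, but the error on the params call is the one quoted above):

using Flux, ProtoStructs

@proto struct ResidualDense
    dense
end
(l::ResidualDense)(x) = l.dense(x) .+ x

Flux.@functor ResidualDense

model = ResidualDense(Dense(3, 3))
Flux.params(model)  # ERROR: type NamedTuple has no field properties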
How hard would it be to make this compatible? I think it would be extremely useful to be able to quickly revise model definitions!
(Sorry for the spam today, by the way)