oxinabox opened 5 years ago
So here are some random thoughts I've had. Quality varies so take these ideas as purely speculative at this point (not necessarily strong suggestions).
I think it could open up a lot of opportunities if formulas could chain terms together in various ways, similar to how Turing.jl allows a series of equations using `@model`.
So the classic `Chain(Conv(...), Conv(...), ...)` could also be written as a series of equations:
```julia
@formula begin
    layer1 ~ Conv(training_images, ...)
    layer2 ~ Conv(layer1, ...)
end
```
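For comparison, here is roughly the model those two equations would describe, written with Flux's existing `Chain` API (kernel sizes, channel counts, and activations invented purely for illustration):

```julia
using Flux

# Roughly the model the two-equation formula above describes, written with the
# existing Chain API; sizes and activations are made up for illustration.
model = Chain(
    Conv((3, 3), 1 => 16, relu),   # "layer1"
    Conv((3, 3), 16 => 32, relu),  # "layer2"
)
```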
I know this is more verbose, but it also leaves room for several interesting ways of intuitively customizing other aspects of a neural network.
The following syntax is probably not ideal, but with formulas you could precondition weights on certain distributions.
```julia
@formula begin
    layer1_kernel_weights ~ Kernel(Normal(), (2, 2), channel1 => channel2)
    layer1 ~ Conv(training_images, layer1_kernel_weights)
    layer2 ~ Conv(layer1, ...)
end
```
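To make that concrete, here is a hedged sketch of what the `Kernel` term might lower to with plain Flux and Distributions.jl; `make_kernel` and the channel sizes are made up for illustration, not an existing API:

```julia
using Flux, Distributions

# Hypothetical lowering of Kernel(Normal(), (2, 2), channel1 => channel2):
# draw initial kernel weights from the distribution, then build the Conv layer
# from those weights via Flux's Conv(weight, bias) constructor.
make_kernel(dist, ksize, chs) =
    Float32.(rand(dist, ksize..., chs.first, chs.second))

layer1_kernel_weights = make_kernel(Normal(0, 1), (2, 2), 1 => 8)
layer1 = Conv(layer1_kernel_weights, zeros(Float32, 8))
```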
Treating the kernels as statistical weights, and a neural network layer as a traditional statistical model, also opens the door to very simple transfer learning, because you could just take another model's weights: `m2 = @formula(newlayer ~ Conv(training_images, weights(m1)))`.
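A minimal sketch of the `weights(m1)` idea in terms of what Flux already exposes, just copying a trained layer's arrays into a new layer (`m1` here is a stand-in for a trained model):

```julia
using Flux

# m1 stands in for an already-trained layer; the new layer starts from its weights.
m1 = Conv((3, 3), 1 => 16, relu)
newlayer = Conv(copy(m1.weight), copy(m1.bias), relu)
```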
Because formulas allow defining relationships between symbols, and not just chaining layers together, you can easily do a bunch of topological manipulation without new custom layers. So a DenseNet could be something like:
```julia
@formula begin
    layer1 ~ training_variables
    layer2 ~ layer1
    layer3 ~ layer1 + layer2
end
```
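For comparison, one way the `layer3 ~ layer1 + layer2` topology can be written with existing Flux pieces is a `SkipConnection` whose combiner concatenates along the channel dimension (DenseNet-style); sizes here are illustrative:

```julia
using Flux

# A skip connection that concatenates a block's output with its input along the
# channel dimension: the hand-written equivalent of layer3 ~ layer1 + layer2.
block = Conv((3, 3), 8 => 8, relu; pad = 1)
dense_block = SkipConnection(block, (out, input) -> cat(out, input; dims = 3))

x = randn(Float32, 16, 16, 8, 1)
size(dense_block(x))  # (16, 16, 16, 1): channels from the block output plus its input
```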
This is nice because it doesn't require any novel syntax to implement, and the concatenation aspect uses the same syntax you'd typically use in a formula.
Again, this syntax is in no way polished, but being able to easily specify per-layer loss functions would be pretty interesting.
```julia
@formula(ŷ ~ x | abs(ŷ - mean(y)))
```
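For reference, the hand-written version of a per-layer loss today is just an extra penalty on an intermediate activation; the layer sizes and the 0.1 weight below are arbitrary:

```julia
using Flux, Statistics

# A per-layer loss written by hand: penalise the hidden activation in addition
# to the usual loss on the final output.
enc = Dense(10 => 4, relu)
dec = Dense(4 => 10)

function loss(x, y)
    h = enc(x)
    ŷ = dec(h)
    Flux.Losses.mse(ŷ, y) + 0.1f0 * mean(abs.(h))
end
```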
Maybe a gradient update could be done with something like `Δ` to specify the optimiser for back-propagation.
```julia
@formula(y ~ x Δ ADAM)
```
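Presumably that annotation would just select the optimiser handed to the standard training loop, something like the following (toy model and data, using Flux's classic implicit-parameters API):

```julia
using Flux

# Sketch: the Δ ADAM annotation would presumably just pick the optimiser that
# drives the usual gradient-descent loop.
model = Dense(3, 1)
data = [(randn(Float32, 3, 8), randn(Float32, 1, 8))]
loss(x, y) = Flux.Losses.mse(model(x), y)

opt = ADAM()
Flux.train!(loss, Flux.params(model), data, opt)
```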
@Tokazama that is a cool idea, but I think it is outside the scope of StatsModels.jl
StatsModels' `@formula` is a DSL for feature engineering, whereas that is a (nifty) DSL for model definition.
Like seriously, don't stop with this line of thought, but I don't think it belongs in this package.
(New package: NNModels.jl? EndToEndModels.jl?)
Those are some very cool ideas. I tend to think of a formula as specifying a single many-to-many transformation, so chaining a bunch of those together to specify a NN topology certainly makes sense to me. I agree with @oxinabox that the specifics seem out-of-scope for statsmodels though; what's NOT out-of-scope is making sure that we don't build in overly-restrictive assumptions about how the abstractions we have here are going to be used in other packages.
To @oxinabox's original questions: the interaction stuff can be handled at the `apply_schema` stage (using the third argument for the model context), as can setting one-hot encoding as the default.
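For concreteness, one-hot encoding can already be requested per column via contrasts hints when building the schema; a small sketch with the existing StatsModels API (the non-exported `FullDummyCoding` gives one column per level):

```julia
using StatsModels, DataFrames

df = DataFrame(y = rand(6), x = repeat(["a", "b", "c"], 2))

# Hint that :x should get full (redundant) one-hot coding instead of the default
# dummy coding, then apply the schema and materialise the model columns.
sch = schema(df, Dict(:x => StatsModels.FullDummyCoding()))
f = apply_schema(@formula(y ~ x), sch)
y, X = modelcols(f, df)   # X has one column per level of x
```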
The "always matrix" and obs-dims stuff is more complicated, and I think is part of the general problem of allowing formula consumers control over the destination container (e.g., sparse matrix, GPU array, row/column major).
This is something we've talked about a bit, so I'm now opening an issue to collect those thoughts. At some point this might become a package. We are going to think about using it with Flux, just for the sake of having a concrete example.
- `a&b` and `a+b` and `a*b` should all be treated the same, as all terms can interact in NNs
- `Flux.onehot` vectors, which turn multiplication by the weight matrices into indexing operations (see the sketch below)

What else?
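To illustrate the `Flux.onehot` point, a minimal example: multiplying a weight matrix by a one-hot vector is effectively a column lookup.

```julia
using Flux

labels = ["a", "b", "c"]
v = Flux.onehot("b", labels)   # OneHotVector with a 1 in position 2
W = randn(4, 3)

W * v == W[:, 2]               # the multiplication reduces to indexing a column
```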