JuliaStats / StatsModels.jl

Specifying, fitting, and evaluating statistical models in Julia

[speculative] What does it look like to use StatsModels with a neural net? #116

Open · oxinabox opened 5 years ago

oxinabox commented 5 years ago

This is something we've talked about a bit; I'm now opening an issue to collect those thoughts. At some point this might become a package. We are going to think about using it with Flux, just for the sake of having a concrete example.

What else?

Tokazama commented 5 years ago

So here are some random thoughts I've had. Quality varies, so take these ideas as purely speculative at this point (not necessarily strong suggestions).

I think it could open up a lot of opportunities if formulas could chain terms together in various ways, similar to how Turing.jl allows a series of equations using `@model`.
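For reference, a minimal Turing model showing the chained `~` style I mean (the model and its variables are just placeholders):

```julia
using Turing

# each `~` line states one relationship; the @formula version below
# borrows this chained style
@model function demo(x)
    μ ~ Normal(0, 1)
    x ~ Normal(μ, 1)
end
```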

So the classic `Chain(Conv(...), Conv(...), ...)` could also be written as a series of equations:

```julia
@formula begin
    layer1 ~ Conv(training_images, ...)
    layer2 ~ Conv(layer1, ...)
end
```
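For comparison, a concrete version of the terser plain-Flux chain (filter sizes and channel counts are made-up placeholders):

```julia
using Flux

# the usual chained form of the same two layers
model = Chain(Conv((3, 3), 1 => 16, relu), Conv((3, 3), 16 => 32, relu))
```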

I know this is more verbose, but it also leaves room for several interesting ways of intuitively customizing other aspects of a neural network.

Control over kernel weights

The following syntax is probably not ideal, but with formulas you could precondition weights on certain distributions.

```julia
@formula begin
    layer1_kernel_weights ~ Kernel(Normal(), (2, 2), channel1 => channel2)
    layer1 ~ Conv(training_images, layer1_kernel_weights)
    layer2 ~ Conv(layer1, ...)
end
```
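For reference, plain Flux already lets you seed kernels from a distribution through the `init` keyword; a minimal sketch (the layer shape is made up):

```julia
using Flux, Distributions

# draw the 2×2 kernel weights of a 1 => 16 convolution from a standard Normal
layer = Conv((2, 2), 1 => 16; init = (dims...) -> Float32.(rand(Normal(0, 1), dims...)))
```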

Treating the kernels as statistical weights, and a neural network layer as a traditional statistical model, also opens the door to very simple transfer learning, because you could just take another model's weights: `m2 = @formula(newlayer ~ Conv(training_images, weights(m1)))`.
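In plain Flux terms, that transfer-learning idea boils down to something like the following sketch (layer sizes made up):

```julia
using Flux

# reuse the trained kernels from an existing layer as the starting
# point for a new one
old = Conv((2, 2), 1 => 8, relu)   # stand-in for a layer from a trained model
new = Conv((2, 2), 1 => 8, relu)
new.weight .= old.weight           # copy the kernels over
```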

Simple topology manipulation

Because formulas define relationships between symbols, rather than just chaining layers together, you can do a lot of topological manipulation without new custom layers. So a DenseNet could be something like:

```julia
@formula begin
    layer1 ~ training_variables
    layer2 ~ layer1
    layer3 ~ layer1 + layer2
end
```

This is nice because it doesn't require any novel syntax to implement, and the concatenation aspect uses the same `+` you'd typically write in a formula.
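For reference, the closest plain-Flux analogue today is `SkipConnection`, where the connection function plays the role of the `+` above; a minimal sketch (sizes made up):

```julia
using Flux

# layer3 ~ layer1 + layer2: the wrapped layer's output is combined with
# its input by the connection function (here concatenation along the
# feature dimension, as in a DenseNet block)
block = SkipConnection(Dense(8, 8, relu), (mx, x) -> cat(mx, x; dims = 1))
```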

Per layer loss functions and gradient updates

Again, this syntax is in no way polished, but being able to easily specify per-layer loss functions would be pretty interesting.

```julia
@formula(ŷ ~ x | abs(ŷ - mean(y)))
```

Maybe a gradient update could be specified with something like `Δ` to pick the optimiser for backpropagation.

```julia
@formula(y ~ x Δ ADAM)
```
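However that syntax shakes out, a consumer would presumably lower it onto an ordinary Flux training step; a minimal sketch of the target, assuming the implicit-params `Flux.train!` API (model, loss, and data here are placeholders):

```julia
using Flux

m = Dense(4, 1)                          # stand-in for the model lowered from y ~ x
loss(x, y) = Flux.mse(m(x), y)           # the loss the formula would specify
opt = ADAM()                             # what `Δ ADAM` would lower to
data = [(rand(Float32, 4, 16), rand(Float32, 1, 16))]
Flux.train!(loss, Flux.params(m), data, opt)
```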
oxinabox commented 5 years ago

@Tokazama that is a cool idea, but I think it is outside the scope of StatsModels.jl. The StatsModels `@formula` is a DSL for feature engineering, whereas that is a (nifty) DSL for model definition. Seriously, don't stop with this line of thought, but I don't think it belongs in this package. (New package: NNModels.jl? EndToEndModels.jl?)

kleinschmidt commented 5 years ago

Those are some very cool ideas. I tend to think of a formula as specifying a single many-to-many transformation, so chaining a bunch of those together to specify a NN topology certainly makes sense to me. I agree with @oxinabox that the specifics seem out-of-scope for StatsModels, though; what's NOT out-of-scope is making sure that we don't build in overly restrictive assumptions about how the abstractions we have here are going to be used in other packages.

kleinschmidt commented 5 years ago

To @oxinabox's original questions: the interaction stuff can be handled at the `apply_schema` stage (using the third argument for the model context), as can making one-hot encoding the default.
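A minimal sketch of that hook with the current API (`NNModel` is a made-up placeholder type):

```julia
using StatsModels, DataFrames

# hypothetical marker type for an NN modeling context; subtyping
# StatisticalModel opts in to the standard intercept/full-rank handling
struct NNModel <: StatsModels.StatisticalModel end

df = DataFrame(y = rand(6), x = rand(6), g = repeat(["a", "b", "c"], 2))
f  = @formula(y ~ x + g)

# the third argument carries the model context, so a package can
# specialize apply_schema(t, sch, ::Type{<:NNModel}) to install its own
# defaults (e.g. one-hot coding for categorical terms, interactions)
f2 = apply_schema(f, schema(f, df), NNModel)
y, X = modelcols(f2, df)
```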

The "always matrix" and obs-dims stuff is more complicated, and I think is part of the general problem of allowing formula consumers control over the destination container (e.g., sparse matrix, GPU array, row/column major).