JuliaML / StochasticOptimization.jl

Implementations of stochastic optimization algorithms and solvers

Save/load model/state and Specs #9


tbreloff commented 8 years ago

Frequently it's not enough to have an in-memory representation of a model or of the optimization algorithm's state. We want to serialize the structure, parameters, and state of the models and sub-learners we're working with.

I'd like to take this one step further and define a generic "spec" for these stateful items: something we can both serialize/deserialize and load into another "backend" easily. I'm using the same terminology as Plots on purpose... I think there are a lot of similarities in the problems we're looking to solve.

Some examples of backends: TensorFlow, Theano, or pure-Julia implementations from the JuliaML ecosystem.

The idea is that, where there is overlapping functionality, there is an opportunity to generalize. Suppose we have the general concept "I want a 3-layer neural net with these numbers of nodes and relu activations, and these initial weight values". Lots of software implements this. If we build this information into a generic spec (similar to the Plot object in Plots), then we only need to connect the spec to a backend's constructor, and we gain the ability to convert and transfer models between backends.
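To make this concrete, here is a rough sketch of what such a spec could look like. Every type and function name below is hypothetical, not an existing JuliaML or StochasticOptimization.jl API:

```julia
# Hypothetical, backend-agnostic spec types; nothing here exists today,
# this is only a sketch of the idea.
struct LayerSpec
    nodes::Int
    activation::Symbol                      # e.g. :relu, :identity
end

struct NeuralNetSpec
    layers::Vector{LayerSpec}
    init_weights::Vector{Matrix{Float64}}   # optional initial weight values
end

# "A 3-layer neural net with these numbers of nodes and relu activations":
spec = NeuralNetSpec(
    [LayerSpec(128, :relu), LayerSpec(64, :relu), LayerSpec(10, :identity)],
    Matrix{Float64}[],
)

# Each backend hooks the generic spec into its own constructor, dispatching
# on a backend singleton (much like Plots selects a backend).
abstract type Backend end
struct JuliaMLBackend    <: Backend end
struct TensorFlowBackend <: Backend end

# Placeholder: a real method would construct the backend's native model here.
build(spec::NeuralNetSpec, ::JuliaMLBackend) = spec
```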

The same goes for optimization routines: often there is a 1-1 mapping between, say, the Adam optimizer in TensorFlow and the one in Theano, but each backend reimplements the concept in its own way and reinvents the wheel completely. The Plots model applies here as well: define a generic Adam updater concept, then map from that to each backend's implementation.
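A similar sketch for updaters, reusing the hypothetical backend singletons from above (again, none of this is an existing API):

```julia
# Generic description of the Adam update rule: only hyperparameters, no state.
struct AdamSpec
    lr::Float64
    β1::Float64
    β2::Float64
    ϵ::Float64
end
AdamSpec(; lr = 1e-3, β1 = 0.9, β2 = 0.999, ϵ = 1e-8) = AdamSpec(lr, β1, β2, ϵ)

# Each backend maps the spec onto its own implementation: a pure-Julia backend
# would allocate its moment buffers, while a TensorFlow backend would instead
# emit the corresponding optimizer node in its graph.
function build_updater(spec::AdamSpec, ::JuliaMLBackend)
    # placeholder for a native updater; real code would hold first/second moments
    return (spec = spec, m = nothing, v = nothing)
end
```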

The end result is that I'd like models and learning algorithms to be built from specs, which then get mapped to sub-learners specific to a particular backend. This would allow us to serialize/deserialize a backend-agnostic spec rather than actual Julia objects.
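Serializing such a spec then amounts to lowering it to plain data and writing it in whatever format is convenient. As one hedged example, using JSON.jl together with the hypothetical AdamSpec above:

```julia
using JSON   # JSON.jl, just one possible storage format

# Lower the spec to plain data so the serialized form carries no Julia types.
to_dict(spec::AdamSpec) = Dict(
    "type"  => "Adam",
    "lr"    => spec.lr,
    "beta1" => spec.β1,
    "beta2" => spec.β2,
    "eps"   => spec.ϵ,
)

from_dict(d::AbstractDict) =
    AdamSpec(lr = d["lr"], β1 = d["beta1"], β2 = d["beta2"], ϵ = d["eps"])

str    = JSON.json(to_dict(AdamSpec()))   # backend-agnostic, human-readable text
loaded = from_dict(JSON.parse(str))       # reload and hand to any backend's builder
```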

For example, it would allow us to experiment with structures/models/algos in something more flexible (JuliaML I hope), and then convert to a TensorFlow graph for pounding the GPU or sharing with other stubborn researchers that aren't using Julia.

This is closely related to https://github.com/tbreloff/Plots.jl/issues/390, and I think the design decisions can be shared between the two.