drckf / paysage

Unsupervised learning and generative models in python/pytorch.

Shift logic from models to layers to improve modularity #24

Closed drckf closed 7 years ago

drckf commented 7 years ago

In the current version, layers are thin wrappers around a handful of functions: random sampling routines plus a few utilities such as the mean or partition function of a distribution. All of the parameters live in the model class, and all of the logic is handled there as well.

We can write a fairly general form that encompasses different types of Boltzmann machines as:

E(v, h) = -\sum_i a_i(v_i) - \sum_j b_j(h_j) - \sum_{ij} W_{ij} v_i h_j

where a_i(.) and b_j(.) are functions rather than just parameters. These functions should be defined by the layers, while the weights W_{ij} are determined by how the layers are stacked together. In this way, we could define a Bernoulli-Bernoulli RBM like:

BernoulliRBM = Model( [BernoulliLayer(n_visible), BernoulliLayer(n_hidden)] )

And a Gaussian RBM like:

GaussianRBM = Model( [GaussianLayer(n_visible), BernoulliLayer(n_hidden)] )

Or stack things into multiple layers like:

DBM = Model( [BernoulliLayer(nvis), BernoulliLayer(nhid_1), BernoulliLayer(nhid_2)] )

This will require quite a bit of thought and work, but it should be a high priority.
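
To make the proposal concrete, here is a minimal sketch of what it could look like. It is not the paysage implementation: the `bias_energy` method, the `loc` and `scale` attributes, the unscaled coupling term for the Gaussian layer, and the weight initialization are all assumptions made for illustration, and plain Python lists stand in for tensors.

```python
import random


class BernoulliLayer:
    """Hypothetical layer whose bias function is linear: a_i(x_i) = loc_i * x_i."""

    def __init__(self, size):
        self.size = size
        self.loc = [0.0] * size

    def bias_energy(self, x):
        # This layer's contribution to the energy: -sum_i a_i(x_i).
        return -sum(l * xi for l, xi in zip(self.loc, x))


class GaussianLayer:
    """Hypothetical layer with a_i(x_i) = -(x_i - loc_i)^2 / (2 * scale_i^2)."""

    def __init__(self, size):
        self.size = size
        self.loc = [0.0] * size
        self.scale = [1.0] * size

    def bias_energy(self, x):
        # -sum_i a_i(x_i) is a sum of quadratic penalties.
        return sum((xi - l) ** 2 / (2.0 * s ** 2)
                   for xi, l, s in zip(x, self.loc, self.scale))


class Model:
    """Holds only the weights; their shapes follow from how the layers are stacked."""

    def __init__(self, layers, seed=0):
        self.layers = layers
        rng = random.Random(seed)
        # One (lower.size x upper.size) weight matrix per pair of adjacent layers.
        self.weights = [
            [[rng.gauss(0.0, 0.01) for _ in range(upper.size)]
             for _ in range(lower.size)]
            for lower, upper in zip(layers[:-1], layers[1:])
        ]

    def energy(self, states):
        # Bias terms come from the layers, coupling terms from the model.
        total = sum(layer.bias_energy(x)
                    for layer, x in zip(self.layers, states))
        for W, lower, upper in zip(self.weights, states[:-1], states[1:]):
            total -= sum(W[i][j] * lower[i] * upper[j]
                         for i in range(len(lower))
                         for j in range(len(upper)))
        return total


rbm = Model([BernoulliLayer(2), BernoulliLayer(3)])
dbm = Model([BernoulliLayer(4), BernoulliLayer(3), BernoulliLayer(2)])
```

With this split, the two-layer RBM and the three-layer DBM use the same `energy` code path; only the list of layers changes.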

AdrianLsk commented 7 years ago

If we view the model as a graph whose nodes are the layers (visible and hidden, each with its own offset/bias function) and whose edges are the weights, it would be reasonable to implement it that way.

So each layer would implement, in addition to what it does now, a bias function with its parameters and a function that returns its contribution to the energy. The model would implement the connections between layers: it would hold the weights and an aggregation function that collects the energy contributions and passes sampled messages (the latent h's) between layers in various directions, as proposed for the joint training of DBMs. I think it's good to keep the multi-layer model structure in mind right from the beginning, since the single-layer model then falls out as a special case.
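
As a rough sketch of the message-passing part only, assuming Bernoulli layers and one weight matrix per pair of adjacent layers (the names `BernoulliNode`, `LayerGraph`, `field`, and `conditional_mean` are made up here, not existing paysage API), the model could sum the messages arriving at a layer from below and above and let the layer turn that field into a conditional mean:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BernoulliNode:
    """Hypothetical layer (graph node) that owns its bias parameters."""

    def __init__(self, loc):
        self.loc = list(loc)

    def conditional_mean(self, field):
        # Mean of each unit given the total input it receives from neighbors.
        return [sigmoid(l + f) for l, f in zip(self.loc, field)]


class LayerGraph:
    """Hypothetical model: weights[k] is the edge between layer k and layer k + 1,
    stored as a (size_k x size_{k+1}) nested list."""

    def __init__(self, nodes, weights):
        self.nodes = nodes
        self.weights = weights

    def field(self, k, states):
        """Sum the messages arriving at layer k from its neighbors."""
        n = len(self.nodes[k].loc)
        total = [0.0] * n
        if k > 0:
            # Upward message from the layer below: W^T x_below.
            W, below = self.weights[k - 1], states[k - 1]
            for j in range(n):
                total[j] += sum(W[i][j] * below[i] for i in range(len(below)))
        if k < len(self.nodes) - 1:
            # Downward message from the layer above: W x_above.
            W, above = self.weights[k], states[k + 1]
            for i in range(n):
                total[i] += sum(W[i][j] * above[j] for j in range(len(above)))
        return total

    def conditional_mean(self, k, states):
        return self.nodes[k].conditional_mean(self.field(k, states))


# A tiny three-layer stack with sizes 1, 2, 1.
graph = LayerGraph(
    nodes=[BernoulliNode([0.0]), BernoulliNode([0.0, 0.0]), BernoulliNode([0.0])],
    weights=[[[1.0, -1.0]], [[2.0], [0.5]]],
)
states = [[1.0], [0.0, 0.0], [1.0]]
middle_field = graph.field(1, states)  # receives messages from both sides
```

In a two-layer model the same `field` call reduces to the usual RBM update, since each layer then has only one neighbor.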

What do you think?