dfdx / Boltzmann.jl

Restricted Boltzmann Machines in Julia

Implementation of Deep Belief Networks #3

Closed jfsantos closed 9 years ago

jfsantos commented 9 years ago

This is a cleaned-up version of my DBN implementation using the RBMs in Boltzmann.jl. For now it is a really simple extension: it just adds a new type, DBN, and a function to fit it, as well as a helper function to compute the mean of the hidden units at a given layer. The user can only change the type of the first RBM because, in most of the applications I've seen, all the upper layers are Bernoulli RBMs, but this can easily be changed if needed.

I also added an example that uses the MNIST dataset to train it. The HDF5 file is generated by this script from the Mocha.jl package. I think we could test it with a simpler dataset and put that test in a separate folder (e.g., examples/).

dfdx commented 9 years ago

Thanks for contributing it! DBNs are a logical continuation of RBMs, but I never had time to implement them. A few details I was thinking of:

  1. We should be able to pass parameters to RBM constructors (and possibly to individual fit() calls on them). Probably the easiest way to achieve this is to pass initialized layers, similar to scikit-learn's Pipeline; see the sketch after this list.
  2. Though this package is not intended to be superseded or replaced by Mocha.jl (e.g. I have successfully used a pure RBM on sparse data for a recommendation engine, which Mocha is really not designed for), some integration with it is very welcome. I especially like their replaceable backends, which greatly simplify writing code for both CPU and GPU. On the other hand, as far as I know, they still lack belief networks, and we can fix that. Right now I'm busy with some classification algorithm packages, but taking a closer look at Mocha is definitely on my TODO list.
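A minimal sketch of what that could look like (the DBN constructor is a proposal here, not the current API; GRBM and BernoulliRBM are the existing layer types):

```julia
# Rough sketch of a Pipeline-style constructor; `DBN` and the exact
# signatures are proposals, not the package's current API.
using Boltzmann

layers = [("vis",  GRBM(784, 256)),           # Gaussian-visible first layer
          ("hid1", BernoulliRBM(256, 100)),   # upper layers are Bernoulli
          ("hid2", BernoulliRBM(100, 50))]

dbn = DBN(layers)   # each layer carries its own pre-set parameters
```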
jfsantos commented 9 years ago

Regarding 1, I think it is definitely important. We can write an improved constructor and fit functions to do this. The fit function for DBNs could, for example, take a list of arguments to be passed to each layer's fit call, something like the sketch below.
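(A hedged sketch: DBN, hid_means, and the layer_opts keyword are assumed names, not existing API.)

```julia
# Hypothetical per-layer fit options; nothing here is committed API.
function fit(dbn::DBN, X::Matrix{Float64};
             layer_opts=fill(Dict(), length(dbn.layers)))
    input = X
    for (rbm, opts) in zip(dbn.layers, layer_opts)
        fit(rbm, input; opts...)         # greedy layer-wise training
        input = hid_means(rbm, input)    # hidden means feed the next layer
    end
    return dbn
end

# e.g. 20 epochs for the first layer, 10 for the second:
# fit(dbn, X; layer_opts=[Dict(:n_epochs => 20), Dict(:n_epochs => 10)])
```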

I am trying to contribute to Mocha as well, and was thinking about adding replaceable backends to your RBM implementations. Basically, the compute-intensive functions in layers take a Backend instance as an argument and dispatch on the type of that argument (e.g., you will have forward(b::CPUBackend, layer, X) and forward(b::GPUBackend, layer, X)). We could do pretty much the same thing for RBMs.
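A small sketch of that dispatch pattern (the backend types mirror Mocha's naming; the forward body is a toy stand-in, not Mocha code):

```julia
# Toy illustration of backend-based dispatch; only the pattern matters.
abstract type Backend end
struct CPUBackend <: Backend end
struct GPUBackend <: Backend end

sigmoid(x) = 1 ./ (1 .+ exp.(-x))

# CPU path: a plain BLAS matrix multiply on ordinary Julia arrays
forward(::CPUBackend, W::Matrix, hbias::Vector, X::Matrix) =
    sigmoid(W * X .+ hbias)

# The GPU method would share the signature but call CUDA kernels instead:
# forward(::GPUBackend, W, hbias, X) = ...
```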

It would be interesting to add some integration with Mocha, even though their "philosophies" are a bit different (which makes sense, as training algorithms for RBMs and belief networks differ from those used for feed-forward nets). We could start by automating the process of unsupervised pre-training of a DBN and then converting it to an MLP for supervised fine-tuning. This is exactly what I am doing now for my project, so I'll see if I can come up with a draft implementation.
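The conversion itself is conceptually simple: after greedy pre-training, each RBM's weights and hidden biases become one dense layer of the MLP. A sketch (field names like dbn.layers, rbm.W, and rbm.hbias are assumptions):

```julia
# Conceptual sketch only: reuse pre-trained RBM parameters as MLP layers.
sigmoid(x) = 1 ./ (1 .+ exp.(-x))

function mlp_forward(dbn, X::Matrix{Float64})
    for rbm in dbn.layers
        X = sigmoid(rbm.W * X .+ rbm.hbias)  # hidden means as activations
    end
    return X  # final features, ready for a supervised output layer
end
```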

dfdx commented 9 years ago

@jfsantos If you don't mind, I changed the test to use the MNIST package instead of loading the file from the Mocha directory.

jfsantos commented 9 years ago

Sure, I think that is the way to go, as MNIST.jl already includes the data and does not require manually running a script the way Mocha does.


pluskid commented 9 years ago

Hi, I'm the author of Mocha. I agree that some integration of the two packages would be really nice for the community. The immediate thing I can think of is to use Boltzmann.jl to initialize weights for a DNN that then gets fine-tuned in Mocha.jl. I think this should be relatively straightforward if you export the trained weights to an HDF5 file and ask Mocha to load those weights as the initialization. Mocha already uses this kind of mechanism to load models trained by Caffe. The HDF5 file Mocha reads has a simple format; see http://mochajl.readthedocs.org/en/latest/user-guide/tools/import-caffe-model.html#mocha-s-hdf5-snapshot-format
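On the Boltzmann.jl side, the export could be a few lines with HDF5.jl; the dataset names below are placeholders, with the real layout being the one in the linked docs:

```julia
# Sketch of dumping trained weights to HDF5; the dataset paths are
# placeholders, not Mocha's verified snapshot layout (see the linked docs).
using HDF5

W = rand(Float64, 784, 100)   # stand-in for trained weights
b = zeros(Float64, 100)       # stand-in for biases

h5open("pretrained.hdf5", "w") do h5
    write(h5, "ip1/weight", W)
    write(h5, "ip1/bias", b)
end
```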

Of course, we could discuss the data format if needed. :)

dfdx commented 9 years ago

@pluskid I believe HDF5 will work fine. I'll have a long weekend starting Thursday to spend on (finally) learning Mocha and will try to implement this kind of export. Meanwhile, is there an example of converting Julia arrays to a Mocha-compatible 4D tensor?

pluskid commented 9 years ago

@dfdx Starting with the latest version (v0.0.5), Mocha actually supports ND-tensors, and an ND-tensor (Blob) is essentially a (shallow wrapper of a) Julia array Array{Float64, N} (or Float32). So if you have a Julia array and want to save it to an HDF5 file that Mocha can read, no conversion is needed, except that Mocha only supports Float32 or Float64, because BLAS only supports those.

For example, the weight blob of an InnerProduct layer is a 2D-tensor (matrix) of shape P-by-Q, where P is the input dimension and Q is the target dimension. So rand(Float64, (P, Q)) could be a valid initialization for the weight parameters.
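In other words, a plain Julia array with the right shape and element type already is a valid parameter blob, e.g.:

```julia
# Minimal illustration of the point above.
P, Q = 784, 10                 # example input and target dimensions
W = rand(Float64, (P, Q))      # weights for an InnerProduct layer
@assert eltype(W) in (Float32, Float64)  # the only element types Mocha supports
```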

If you are interested, there is some documentation about Blobs (ND-tensors) in Mocha: http://mochajl.readthedocs.org/en/latest/dev-guide/blob.html

dfdx commented 9 years ago

I've added export to Mocha as part of the DBN redesign.