Thanks for contributing it! DBNs are a logical continuation of RBMs, but I never had time to implement them. A few details that I was thinking of: `fit()` calls on them). Probably the easiest way to achieve this will be to pass initialized layers, similar to `Pipeline` from scikit-learn.

Regarding 1, I think it is definitely important. We can write an improved constructor and `fit` functions to do this. The `fit` function for DBNs could take a list of arguments to be passed to each layer's `fit` call, for example.
I am trying to contribute to Mocha as well, and was thinking about adding replaceable backends to your RBM implementations. Basically, the computing-intensive functions from layers get a Backend instance as an argument and dispatch depending on the type of this argument (e.g., you will have `forward(b::CPUBackend, layer, X)` and `forward(b::GPUBackend, layer, X)`). We could do pretty much the same thing for RBMs.
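To illustrate the dispatch idea, here is a minimal self-contained sketch; the type and function names (`Backend`, `CPUBackend`, `GPUBackend`, `hid_activations`) are hypothetical placeholders, not the actual API of Mocha or Boltzmann.jl:

```julia
# Hypothetical backend hierarchy used only for this sketch.
abstract type Backend end
struct CPUBackend <: Backend end
struct GPUBackend <: Backend end

# The compute-heavy kernel dispatches on the backend type.
hid_activations(::CPUBackend, W, X, hbias) = W * X .+ hbias

function hid_activations(::GPUBackend, W, X, hbias)
    # A real GPU method would call into CUBLAS or a GPU array library;
    # falling back to the CPU path keeps this sketch runnable.
    hid_activations(CPUBackend(), W, X, hbias)
end

W, X, hbias = rand(5, 10), rand(10, 3), rand(5)
hid_activations(CPUBackend(), W, X, hbias)   # selects the CPU method
```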
It would be interesting to add some integration with Mocha, even though their "philosophies" are a bit different (which makes sense, as training algorithms for RBMs and belief networks are a bit different from those used for feed-forward nets). We could start by automating the process of performing unsupervised training of a DBN and then converting it to an MLP for supervised fine-tuning. This is exactly what I am doing now for my project, so I'll see if I can come up with a draft implementation.
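To make that workflow concrete, here is a minimal sketch of greedy layer-wise pretraining with Boltzmann.jl, assuming `BernoulliRBM(n_vis, n_hid)` and `fit(rbm, X)` as in the package's README; the `sigmoid`/`hidden_means` helpers and the field names `W` and `hbias` are assumptions made for illustration:

```julia
using Boltzmann

# Mean hidden activations of an RBM; the field names W and hbias are
# assumptions about Boltzmann.jl's RBM type, used here only for the sketch.
sigmoid(x) = 1 ./ (1 .+ exp.(-x))
hidden_means(rbm, X) = sigmoid(rbm.W * X .+ rbm.hbias)

# Greedy layer-wise pretraining: each RBM is trained on the hidden
# activations of the previous one.
function pretrain(X, layer_sizes)
    rbms = Any[]
    input = X
    for (n_vis, n_hid) in zip(layer_sizes[1:end-1], layer_sizes[2:end])
        rbm = BernoulliRBM(n_vis, n_hid)
        fit(rbm, input)                   # unsupervised training of this layer
        push!(rbms, rbm)
        input = hidden_means(rbm, input)  # feed activations to the next layer
    end
    return rbms
end

rbms = pretrain(rand(784, 1000), [784, 256, 100])  # columns are samples
# The per-layer weights could then be exported (e.g. to HDF5) and used to
# initialize an MLP in Mocha for supervised fine-tuning.
```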
@jfsantos If you don't mind, I changed the test to use the MNIST package instead of loading the file from the Mocha directory.
Sure, I think that is the way to go, as MNIST.jl already includes the data and does not require manually running a script as Mocha does.
Hi, I'm the author of Mocha. I agree that some integration of the two packages would be really nice for the community. For example, the immediate thing I can think of is to use Boltzmann.jl to initialize weights for a DNN that gets fine-tuned in Mocha.jl. I think this should be relatively straightforward if you export the trained weights to an HDF5 file and ask Mocha to load those weights as the initialization. Mocha already uses this kind of mechanism to load models trained by Caffe. The HDF5 file Mocha reads has a simple format; see here: http://mochajl.readthedocs.org/en/latest/user-guide/tools/import-caffe-model.html#mocha-s-hdf5-snapshot-format

Of course, we could discuss the data format if needed. :)
@pluskid I believe HDF5 will work fine. I'll have a long weekend starting Thursday to spend on (finally) learning Mocha, and will try to implement this kind of exporting. Meanwhile, is there an example of converting Julia arrays to a Mocha-compatible 4D tensor?
@dfdx Starting with the latest version (v0.0.5), Mocha actually supports ND-tensors, and an ND-tensor (Blob) is essentially a (shallow wrapper of a) Julia array `Array{Float64, N}` (or `Float32`). So if you have a Julia array and want to save it to an HDF5 file that Mocha can read, no conversion is needed, except that Mocha only supports Float32 or Float64 because BLAS only supports those.

For example, the weight blob of an InnerProduct layer is a 2D tensor (matrix) of shape P-by-Q, where P is the input dimension and Q is the target dimension. So essentially `rand(Float64, (P, Q))` could possibly be a valid initialization for the weight parameters.

If you are interested, there is a bit of documentation about Blobs (ND-tensors) in Mocha: http://mochajl.readthedocs.org/en/latest/dev-guide/blob.html
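As a concrete sketch of the export path, the snippet below writes a P-by-Q `Float64` weight matrix to an HDF5 file with HDF5.jl; the file name and the `"ip1/weight"` dataset path are placeholders, and the dataset names Mocha actually expects are described in the snapshot-format document linked above:

```julia
using HDF5

P, Q = 784, 100                       # input and target dimensions
W = rand(Float64, (P, Q))             # stand-in for weights trained in Julia

# "ip1/weight" is a placeholder dataset path, not necessarily the name
# Mocha's snapshot format expects (see the linked documentation).
h5write("pretrained.hdf5", "ip1/weight", W)

# Reading it back yields a plain Array{Float64,2} of the same shape.
W2 = h5read("pretrained.hdf5", "ip1/weight")
@assert size(W2) == (P, Q)
```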
I've added export to Mocha as part of the DBN redesign.
This is a cleaned-up version of my DBN implementation using the RBMs in Boltzmann.jl. For now it is a really simple extension, as it just adds a new type `DBN` and a function to fit it, as well as a helper function to compute the mean of the hiddens at a given layer. The user can only change the type of the first RBM, because in most of the applications I've seen all the upper layers are Bernoulli RBMs, but this can be easily changed if needed. I also added an example that uses the MNIST dataset to train it. The HDF5 file is generated by this script from the Mocha.jl package. I think we could test it with a simpler dataset and add that test in a separate folder (e.g., `examples/`).
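For context, a usage sketch of the new type might look roughly like the following; the `DBN` constructor signature and the `first_layer` keyword are guesses made for illustration rather than the exact API in this pull request, and `traindata()` is assumed to return the MNIST features matrix and labels as in MNIST.jl:

```julia
using Boltzmann, MNIST

X, _ = traindata()            # assumed 784 x 60000 matrix of pixel values
X = X ./ 255.0                # scale pixels to [0, 1]

# Hypothetical construction: layer sizes 784 -> 256 -> 100, with only the
# first layer's RBM type configurable (upper layers are Bernoulli RBMs).
dbn = DBN([784, 256, 100]; first_layer=BernoulliRBM)
fit(dbn, X)                   # greedy layer-wise training
```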