kloudkl closed this issue 10 years ago
This is an excellent suggestion. Now that Caffe has learned to experiment with DAGs, weight sharing is a natural next generalization that we have been discussing in lab. Making a public issue like this will help focus the plan.
Thanks, especially for the clarity of this proposal and the references.
Thanks for your support!
March 7 has been mentioned several times. What is the blocking factor until then?
The ECCV '14 paper submission deadline. We are a research group, after all, and there's only so much :coffee:
We've decided on an alternative design for weight sharing. A new param field will be added to layers that define their parameter blobs, and Net::Init() will have a preprocessing step that instantiates shared parameter blobs and shares them among layers as needed. Layers that do not share parameters will hold their blobs internally, as they do now.
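To make the preprocessing step concrete, here is a minimal Python sketch of the idea. It is not Caffe's actual C++ API: `Blob`, `LayerSpec`, `Layer`, `init_net`, and the `tied_w` name are all hypothetical stand-ins, and the assumption that a shared blob is keyed by a string name is mine, not something stated above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Blob:
    """Stand-in for a parameter blob: just a flat list of floats."""
    data: List[float]


@dataclass
class LayerSpec:
    """Hypothetical layer definition; `param` names a shareable blob, or is None."""
    name: str
    size: int
    param: Optional[str] = None


@dataclass
class Layer:
    name: str
    blob: Blob


def init_net(specs: List[LayerSpec]) -> List[Layer]:
    """Build layers, sharing one blob among all specs with the same `param`."""
    shared: Dict[str, Blob] = {}

    # Preprocessing pass: instantiate one blob per distinct param name.
    for spec in specs:
        if spec.param is None:
            continue
        if spec.param not in shared:
            shared[spec.param] = Blob([0.0] * spec.size)
        elif len(shared[spec.param].data) != spec.size:
            raise ValueError(f"size mismatch for shared param {spec.param!r}")

    # Construction pass: named layers get the shared blob,
    # unnamed layers keep a private blob of their own.
    layers = []
    for spec in specs:
        if spec.param is None:
            blob = Blob([0.0] * spec.size)
        else:
            blob = shared[spec.param]
        layers.append(Layer(spec.name, blob))
    return layers


net = init_net([
    LayerSpec("encode", 4, param="tied_w"),
    LayerSpec("decode", 4, param="tied_w"),
    LayerSpec("classify", 2),
])

# An update made through one layer is visible through the other.
net[0].blob.data[0] = 1.5
print(net[1].blob.data[0])
```

Because `encode` and `decode` hold the same `Blob` object, there is only one copy of the weights to update, while `classify` is unaffected.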
Closing, as #500 will solve this.
Several factors motivate this proposal, including #57, #119, #129, the papers and code for Generative Stochastic Networks (GSN) [1, 2, 3, 4, 5], and the scene labeling paper that uses a Recurrent Convolutional Neural Network (RCNN) [6].
In the new design, Blob becomes BlobNode and has source nodes and target nodes. The edges are represented by LayerEdge, which no longer contains bottom or top blobs. The nodes are independent of the processing layers. Both nodes and edges can be reused (sharing data or weight parameters) according to the structure of the network, which in general will not be a linear stack of layers running straight from bottom to top but a true graph, like a social graph.
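A rough Python sketch of this node/edge structure might look like the following. Only the names BlobNode and LayerEdge come from the proposal; the `Net` container, the `connect` and `forward` methods, scalar values, and the rule that a node sums the outputs of its incoming edges are all assumptions made for illustration. The sketch also assumes an acyclic graph; a recurrent network would need to be unrolled first.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple


class BlobNode:
    """Holds data and knows its source and target nodes, but not the layers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sources: List["BlobNode"] = []
        self.targets: List["BlobNode"] = []
        self.value: float = 0.0


class LayerEdge:
    """A computation with no bottom/top blobs of its own, so it can be reused."""

    def __init__(self, name: str, fn: Callable[[float], float]) -> None:
        self.name = name
        self.fn = fn


class Net:
    def __init__(self) -> None:
        self.links: List[Tuple[BlobNode, LayerEdge, BlobNode]] = []

    def connect(self, src: BlobNode, edge: LayerEdge, dst: BlobNode) -> None:
        self.links.append((src, edge, dst))
        src.targets.append(dst)
        dst.sources.append(src)

    def forward(self) -> None:
        """Visit nodes in topological order (Kahn's algorithm)."""
        incoming: Dict[BlobNode, list] = {}
        pending: Dict[BlobNode, int] = {}
        for src, edge, dst in self.links:
            incoming.setdefault(dst, []).append((src, edge))
            pending.setdefault(src, 0)
            pending[dst] = pending.get(dst, 0) + 1

        ready = deque(n for n, count in pending.items() if count == 0)
        while ready:
            node = ready.popleft()
            if node in incoming:
                # Assumption: outputs of all incoming edges are summed.
                node.value = sum(e.fn(s.value) for s, e in incoming[node])
            for target in node.targets:
                pending[target] -= 1
                if pending[target] == 0:
                    ready.append(target)


x, h, y = BlobNode("x"), BlobNode("h"), BlobNode("y")
double = LayerEdge("double", lambda v: 2 * v)  # reused edge: shared weights
inc = LayerEdge("inc", lambda v: v + 1)

graph = Net()
graph.connect(x, double, h)
graph.connect(h, double, y)  # same edge object applied a second time
graph.connect(x, inc, y)     # skip connection: x feeds both h and y

x.value = 3
graph.forward()
print(h.value, y.value)
```

Here the `double` edge is applied twice (weight sharing) and the node `x` feeds two targets (data sharing), which is the kind of reuse a strictly linear stack of layers cannot express.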
[1] Yoshua Bengio, Éric Thibodeau-Laufer, Jason Yosinski. Deep Generative Stochastic Networks Trainable by Backprop. arXiv:1306.1091 [cs.LG]. 2013.

[2] Yoshua Bengio, Li Yao, Guillaume Alain, Pascal Vincent. Generalized Denoising Auto-Encoders as Generative Models. NIPS, 2013.

[3] Li Yao. Efficient implementation of Generative Stochastic Networks. https://github.com/yaoli/GSN. 2013.

[4] @lightcatcher. Generative Stochastic Network (GSN) model. https://github.com/lisa-lab/pylearn2/pull/392. 2013.

[5] Jian Zhou, Olga Troyanskaya. Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction. JMLR W&CP 32 (1): 745–753, 2014.

[6] Pedro Pinheiro, Ronan Collobert. Recurrent Convolutional Neural Networks for Scene Labeling. JMLR W&CP 32 (1): 82–90, 2014.