BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Generalize the network into graph of Blob nodes and Layer edges #166

Closed · kloudkl closed this 10 years ago

kloudkl commented 10 years ago

Multiple factors motivate this proposal, including #57, #119, #129, the papers and code of Generative Stochastic Networks (GSN) [1, 2, 3, 4, 5], and the scene labeling paper using a Recurrent Convolutional Neural Network (RCNN) [6].

In the new design, Blob becomes BlobNode and has source nodes and target nodes. Edges are represented by LayerEdge, which no longer contains bottom or top blobs. Nodes are independent of the processing layers. Both nodes and edges can be reused (sharing data or weight parameters) according to the structure of the network, which in general will not be a linear stack of layers running straight from bottom to top, but a true graph, much like a social graph.
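A minimal sketch of how such a graph could be represented, using the BlobNode and LayerEdge names from this proposal; the members and layout below are purely illustrative assumptions, not existing Caffe code:

```cpp
// Illustrative sketch only: BlobNode and LayerEdge are the names proposed
// above; the members shown here are hypothetical, not existing Caffe code.
#include <memory>
#include <string>
#include <vector>

template <typename Dtype> class LayerEdge;  // forward declaration

// A BlobNode holds data and knows which edges produce and consume it.
template <typename Dtype>
class BlobNode {
 public:
  std::string name_;
  std::vector<LayerEdge<Dtype>*> source_edges_;  // edges that write this node
  std::vector<LayerEdge<Dtype>*> target_edges_;  // edges that read this node
  std::vector<Dtype> data_, diff_;
};

// A LayerEdge connects input nodes to output nodes; it no longer owns
// bottom/top blobs itself, so nodes (data or parameters) can be shared.
template <typename Dtype>
class LayerEdge {
 public:
  std::string type_;
  std::vector<std::shared_ptr<BlobNode<Dtype> > > inputs_;
  std::vector<std::shared_ptr<BlobNode<Dtype> > > outputs_;
  // Forward/Backward would walk inputs_ and outputs_ instead of bottom/top.
};
```

In this arrangement a node can feed several edges and be written by several edges, which is what allows data and parameters to be reused across the graph.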

[1] Yoshua Bengio, Éric Thibodeau-Laufer, Jason Yosinski. Deep Generative Stochastic Networks Trainable by Backprop. arXiv:1306.1091 [cs.LG], 2013.
[2] Yoshua Bengio, Li Yao, Guillaume Alain, Pascal Vincent. Generalized Denoising Auto-Encoders as Generative Models. NIPS, 2013.
[3] Li Yao. Efficient implementation of Generative Stochastic Networks. https://github.com/yaoli/GSN, 2013.
[4] @lightcatcher. Generative Stochastic Network (GSN) model. https://github.com/lisa-lab/pylearn2/pull/392, 2013.
[5] Jian Zhou, Olga Troyanskaya. Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction. JMLR W&CP 32(1): 745–753, 2014.
[6] Pedro Pinheiro, Ronan Collobert. Recurrent Convolutional Neural Networks for Scene Labeling. JMLR W&CP 32(1): 82–90, 2014.

shelhamer commented 10 years ago

This is an excellent suggestion. Now that Caffe has learned to experiment with DAGs, weight sharing is a natural next generalization that we have been discussing in lab. Making a public issue like this will help focus the plan.

#57 and #119 must be addressed first. Development of #57 will start in earnest after March 7. #119 effectively doubles the size of a model that can be trained on a single GPU, but requires careful changes to the solver, so it should perhaps wait until after #57. Then this and #119 can be pursued.

Thanks, especially for the clarity of this proposal and the references.

kloudkl commented 10 years ago

Thanks for your support!

March 7 has been mentioned several times. What is the blocking factor until then?

shelhamer commented 10 years ago

The ECCV '14 paper submission deadline. We are a research group, after all, and there's only so much :coffee:

shelhamer commented 10 years ago

We've decided on an alternative design for weight sharing. A new `param` field will be added to layers that define their parameter blobs, and `Net::Init()` will have a preprocessing step that instantiates shared parameter blobs and shares them among layers as needed. Layers that do not share parameters will hold their blobs internally, as they do now.
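A rough sketch of what that preprocessing pass could look like; the `LayerSpec` struct, `param_names` field, and `ShareParams` helper below are hypothetical stand-ins for illustration, not the actual implementation:

```cpp
// Hypothetical sketch of the preprocessing pass described above; the field
// names and helper shown here are assumptions, not committed Caffe code.
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Blob { std::vector<float> data, diff; };

struct LayerSpec {
  std::string name;
  std::vector<std::string> param_names;            // proposed "param" field
  std::vector<std::shared_ptr<Blob> > param_blobs;
};

// During Net::Init(), named parameters are instantiated once and reused;
// unnamed parameters stay private to their layer, as before.
void ShareParams(std::vector<LayerSpec>& layers) {
  std::map<std::string, std::shared_ptr<Blob> > shared;
  for (LayerSpec& layer : layers) {
    for (const std::string& pname : layer.param_names) {
      if (pname.empty()) {                         // unshared: owned by layer
        layer.param_blobs.push_back(std::make_shared<Blob>());
        continue;
      }
      std::shared_ptr<Blob>& owner = shared[pname];
      if (!owner) owner = std::make_shared<Blob>();  // first occurrence
      layer.param_blobs.push_back(owner);          // later layers share it
    }
  }
}
```

With a scheme like this, two layers that list the same parameter name end up holding the same blob, which gives the sharing behavior described above without changing how unshared layers store their weights.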

shelhamer commented 10 years ago

Closing, as #500 will solve this.