BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Max protobuffer message size limits network size #279

Closed sguada closed 10 years ago

sguada commented 10 years ago

When training big networks (serialized proto buffers over 500 MB), protobuf cannot read them back.

 A protocol message was rejected because it was too big (more than 536870912 bytes).  
To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.

@Yangqing, @jeffdonahue maybe we should think about another way to organize our protobufs.

https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.coded_stream?csw=1#CodedInputStream.SetTotalBytesLimit.details
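For context, the 536870912-byte limit in the error above comes from the hard-coded `SetTotalBytesLimit` override in `src/caffe/util/io.cpp` that Yangqing mentions below. A minimal sketch of the read path with a raised (1 GB) limit, assuming `caffe.pb.h` is the header generated from `caffe.proto`; `ReadLargeNetParameter` is a hypothetical helper name, not existing Caffe API:

```cpp
// Sketch only, not the actual src/caffe/util/io.cpp code.
#include <fcntl.h>
#include <unistd.h>

#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>

#include "caffe.pb.h"  // generated Caffe messages (assumed include path)

bool ReadLargeNetParameter(const char* filename, caffe::NetParameter* proto) {
  using google::protobuf::io::CodedInputStream;
  using google::protobuf::io::FileInputStream;

  int fd = open(filename, O_RDONLY);
  if (fd == -1) return false;

  FileInputStream raw_input(fd);
  CodedInputStream coded_input(&raw_input);
  // Raise the hard limit to 1 GB and start warning at 512 MB
  // (double the 536870912-byte limit reported in the error).
  coded_input.SetTotalBytesLimit(1073741824, 536870912);

  bool success = proto->ParseFromCodedStream(&coded_input);
  close(fd);
  return success;
}
```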

sguada commented 10 years ago

@Yangqing @jeffdonahue @shelhamer should we reconsider the way we use protobuf to store networks? https://developers.google.com/protocol-buffers/docs/techniques?csw=1#large-data

kloudkl commented 10 years ago

Won't such a huge network cause under-fitting?

sguada commented 10 years ago

Actually it doesn't. It gets 59% top-1 validation accuracy in 20 epochs (100k iterations), but I cannot resume and continue the training.

kloudkl commented 10 years ago

According to Girshick et al. [1], a large portion of the parameters of AlexNet bring little performance advantage. The detection accuracy of the output feature map of the last convolutional layer is almost the same as that of the fully connected layers' feature maps. Is it really necessary to increase the number of parameters? @jeffdonahue, what's your opinion?

[1] R. Girshick, J. Donahue, T. Darrell, J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv preprint arXiv:1311.2524, November 2013.

jeffdonahue commented 10 years ago

I'm in favor of increasing (e.g., doubling) the size limit, at least as a temporary hack to support larger networks; feel free to send a PR.

@kloudkl the concern with larger networks is more about over-fitting than under-fitting. It's true that there is a good amount of evidence that the FC layers don't add much, but there is also evidence that larger networks nonetheless work better, e.g. Zeiler and Fergus [1], which doubles the size of the FC layers and more than doubles the number of convolutional filters in layers 3, 4, and 5. So I definitely don't think Caffe models should be limited to 500 MB forever.

[1] http://arxiv.org/pdf/1311.2901v3.pdf

shelhamer commented 10 years ago

Agreed on raising the size limit: it's mostly there to avoid buffering and communication nightmares, which are irrelevant to us since we use protobuf for local storage.

Likewise, I second not limiting the size of Caffe models: pure classification performance, as cited by Jeff, is one reason, but DAG models will also be larger.

kloudkl commented 10 years ago

A large network would indeed cause over-fitting rather than under-fitting.

There are a few practical methods to overcome over-fitting, balance the bias-variance trade-off, and improve the generalization of the network.

sguada commented 10 years ago

@kloudkl if you read Zeiler and Fergus [1] you will see that increasing the size of the convolutional layers doesn't lead to overfitting, but doubling the size of the fully connected layers does. In my current experiments, even with a network this big (more than 140 million parameters), I haven't seen signs of overfitting yet; the validation accuracy keeps increasing while training.

But thanks for sharing the pointers.

Yangqing commented 10 years ago

Late to the party, but this problem did bug me when I was writing caffe too. I actually had to hard-code an override of the default size limit:

https://github.com/BVLC/caffe/blob/master/src/caffe/util/io.cpp#L58

One possible way to handle this is to store the parameter values outside the network snapshot: the network would then only provide a pointer to something like a folder or a tar file that stores all the parameters. Ideally, we could put all these files into a zipped file that gets read when we load the model, although I am not crystal clear on how C++ should handle such zip files. Any good recommendations?
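A rough sketch of that idea, with loudly hypothetical names and an assumed schema (a `NetParameter` whose repeated `layers` each carry a `name` and a repeated `blobs` field of `BlobProto`): dump each blob to its own file in a directory and keep only the architecture in the snapshot itself.

```cpp
// Hypothetical helper, not existing Caffe API; the field accessors assume the
// schema described in the lead-in and the file layout is only illustrative.
#include <fstream>
#include <sstream>
#include <string>

#include "caffe.pb.h"  // generated Caffe messages (assumed include path)

void SnapshotWithExternalBlobs(const caffe::NetParameter& net,
                               const std::string& dir) {
  caffe::NetParameter arch = net;  // copy, then strip the heavy payload
  for (int i = 0; i < arch.layers_size(); ++i) {
    for (int j = 0; j < arch.layers(i).blobs_size(); ++j) {
      // One modest file per blob, so no single message nears the limit.
      std::ostringstream path;
      path << dir << "/" << arch.layers(i).name() << "_blob" << j
           << ".binaryproto";
      std::ofstream blob_out(path.str().c_str(), std::ios::binary);
      arch.layers(i).blobs(j).SerializeToOstream(&blob_out);
    }
    arch.mutable_layers(i)->clear_blobs();  // keep the architecture only
  }
  std::ofstream arch_out((dir + "/net_arch.binaryproto").c_str(),
                         std::ios::binary);
  arch.SerializeToOstream(&arch_out);
}
```

Zipping or tarring the directory afterwards would recover the single-file snapshot described above.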

sguada commented 10 years ago

@Yangqing great!! For now I'm going to double the limit:

 coded_input->SetTotalBytesLimit(1073741824, 536870912);  // 1 GB hard limit, warn at 512 MB

Another solution would be to include one message with the overall architecture of the network and a set of messages, one per blob. That would greatly reduce the size of each individual message.
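A hedged sketch of that layout, with a hypothetical helper name (not existing Caffe code): write the architecture message first, then each `BlobProto` as a length-delimited record in the same file, so no individual message ever approaches the coded-stream limit.

```cpp
// Hypothetical helper; it only illustrates the "architecture message plus one
// message per blob" layout described above.
#include <fcntl.h>
#include <unistd.h>

#include <vector>

#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>

#include "caffe.pb.h"  // generated Caffe messages (assumed include path)

bool WriteDelimitedSnapshot(const caffe::NetParameter& arch,
                            const std::vector<caffe::BlobProto>& blobs,
                            const char* filename) {
  using google::protobuf::io::CodedOutputStream;
  using google::protobuf::io::FileOutputStream;

  int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd == -1) return false;
  {
    FileOutputStream raw_output(fd);
    CodedOutputStream coded_output(&raw_output);
    // Architecture first, then one small length-prefixed record per blob.
    coded_output.WriteVarint32(arch.ByteSize());
    arch.SerializeToCodedStream(&coded_output);
    for (size_t i = 0; i < blobs.size(); ++i) {
      coded_output.WriteVarint32(blobs[i].ByteSize());
      blobs[i].SerializeToCodedStream(&coded_output);
    }
  }  // the streams flush when they go out of scope
  close(fd);
  return true;
}
```

Reading it back mirrors this: read a varint length, `PushLimit` it on a `CodedInputStream`, parse the message, then `PopLimit` and repeat for each blob.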

sguada commented 10 years ago

@Yangqing This worked :)

Now I'm able to resume the training I was doing with a pretty big network.

sguada commented 10 years ago

Solved in #302