I guess Caffe can do it, which makes me excited too. You can refer to some of the existing issues about DAG support in Caffe, i.e. multi-output and multi-input CNNs.
I think the issue tracker is currently meant only for development and bug reports. Questions about usage, code, and applications should be asked on the caffe-users mailing list.
On the other hand, if you make an implementation of GoogLeNet, you might want to open a pull request to Caffe to include it as an additional example.
The Inception modules are simple compositions of fundamental layers like convolution, pooling, and concatenation. The model can be defined and executed in Caffe, since DAG architectures, with multiple inputs and multiple outputs / losses, are fully supported by the framework, but training would need multi-GPU parallelism to finish in a reasonable amount of time. Reproducing the model is a worthwhile goal, since the weights are not public.
Below is an Inception module in the form of a single net. In particular, this is my attempt at rendering inception (3a) as shown in Table 1 and Figure 3 of the paper. I've tested it insofar as I wrote some code that parses it, loads it into a net, forward-propagates a blob through it, and verifies that the output blob has the correct dimensions (256 channels x 28 H x 28 W). The only trick was padding the convolutions so as to get the right dimensions for the resulting top blobs. Some of the layer params (like the weight fillers) are vestigial from the LeNet prototxt I copied from, not something specified by the paper.
One thing that might be wrong: I don't think the paper is explicit about whether there are ReLUs in between the 3x3 max pool and the 1x1 pool projection. I didn't put a ReLU layer in there.
So if you wanted to implement the whole GoogLeNet, you would have to make 9 copies of the layers in this hunk of prototxt, adjust the layer/top/bottom names and the numbers of output channels of the convolutions to match Table 1, and then add all the non-Inception layers. That seems... cumbersome... at best.
I expect repeated modules like Inception are going to be a common pattern, and that what we need is a new layer type which is itself a network ("network-in-network"?), so that a network could recursively include other networks, with a single layer standing in for the included network. It would need a param identifying the network that implements it, plus the parameters that are distinctive to each instantiation (e.g. the number of output channels of the conv layers, which differs from one Inception module to another). It seems straightforward in principle, but the implementation could be tricky to make work in all scenarios.
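In the meantime, one way to tame the duplication is to generate the 9 modules from a template. Here is a minimal Python sketch (my own illustration, not existing Caffe tooling), assuming the old-style "layers" prototxt syntax used below; the per-module channel counts would come from Table 1 of the paper:

# Generate the repeated Inception-module prototxt from templates.
CONV = """layers {{
  name: "{name}"
  type: CONVOLUTION
  bottom: "{bottom}"
  top: "{name}"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {{
    num_output: {nout}
    pad: {pad}
    kernel_size: {ksize}
    stride: 1
    weight_filler {{ type: "xavier" }}
    bias_filler {{ type: "constant" }}
  }}
}}
layers {{
  name: "{name}_relu"
  type: RELU
  bottom: "{name}"
  top: "{name}_relu"
}}
"""

POOL = """layers {{
  name: "{name}"
  type: POOLING
  bottom: "{bottom}"
  top: "{name}"
  pooling_param {{ pool: MAX pad: 1 kernel_size: 3 stride: 1 }}
}}
"""

CONCAT = """layers {{
  name: "{prefix}_concat"
  type: CONCAT
  concat_param {{ concat_dim: 1 }}
  bottom: "{prefix}_1x1_relu"
  bottom: "{prefix}_3x3_relu"
  bottom: "{prefix}_5x5_relu"
  bottom: "{prefix}_poolproj_relu"
  top: "{prefix}_output"
}}
"""

def inception(prefix, bottom, n1x1, n3x3r, n3x3, n5x5r, n5x5, npool):
    """Emit one Inception module; returns (prototxt, output blob name)."""
    p = lambda name: prefix + "_" + name
    txt = (
        CONV.format(name=p("1x1"), bottom=bottom, nout=n1x1, pad=0, ksize=1)
        + CONV.format(name=p("3x3reduce"), bottom=bottom, nout=n3x3r, pad=0, ksize=1)
        + CONV.format(name=p("3x3"), bottom=p("3x3reduce") + "_relu", nout=n3x3, pad=1, ksize=3)
        + CONV.format(name=p("5x5reduce"), bottom=bottom, nout=n5x5r, pad=0, ksize=1)
        + CONV.format(name=p("5x5"), bottom=p("5x5reduce") + "_relu", nout=n5x5, pad=2, ksize=5)
        + POOL.format(name=p("pool"), bottom=bottom)
        + CONV.format(name=p("poolproj"), bottom=p("pool"), nout=npool, pad=0, ksize=1)
        + CONCAT.format(prefix=prefix)
    )
    return txt, p("output")

# E.g. inception (3a), with the channel counts used in the prototxt below:
txt, top = inception("inception_3a", "maxpool_2", 64, 96, 128, 16, 32, 32)
print(txt)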
[Edit: stuff deleted, now that I see they say they used DistBelief]
name: "Inception_3a"
input: "maxpool_2"
input_dim: 1
input_dim: 192
input_dim: 28
input_dim: 28
layers {
  name: "1x1_conv"
  type: CONVOLUTION
  bottom: "maxpool_2"
  top: "1x1_conv"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 64
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "1x1_relu"
  type: RELU
  bottom: "1x1_conv"
  top: "1x1_relu"
}
layers {
  name: "3x3reduce_conv"
  type: CONVOLUTION
  bottom: "maxpool_2"
  top: "3x3reduce_conv"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 96
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "3x3reduce_relu"
  type: RELU
  bottom: "3x3reduce_conv"
  top: "3x3reduce_relu"
}
layers {
  name: "3x3_conv"
  type: CONVOLUTION
  bottom: "3x3reduce_relu"
  top: "3x3_conv"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "3x3_relu"
  type: RELU
  bottom: "3x3_conv"
  top: "3x3_relu"
}
layers {
  name: "5x5reduce_conv"
  type: CONVOLUTION
  bottom: "maxpool_2"
  top: "5x5reduce_conv"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 16
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "5x5reduce_relu"
  type: RELU
  bottom: "5x5reduce_conv"
  top: "5x5reduce_relu"
}
layers {
  name: "5x5_conv"
  type: CONVOLUTION
  bottom: "5x5reduce_relu"
  top: "5x5_conv"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "5x5_relu"
  type: RELU
  bottom: "5x5_conv"
  top: "5x5_relu"
}
layers {
  name: "maxpool_3x3"
  type: POOLING
  bottom: "maxpool_2"
  top: "maxpool_3x3"
  pooling_param {
    pool: MAX
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layers {
  name: "poolproj_1x1conv"
  type: CONVOLUTION
  bottom: "maxpool_3x3"
  top: "poolproj_1x1conv"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "poolproj_relu"
  type: RELU
  bottom: "poolproj_1x1conv"
  top: "poolproj_relu"
}
layers {
  name: "DepthConcatenation"
  type: CONCAT
  concat_param {
    concat_dim: 1
  }
  bottom: "1x1_relu"
  bottom: "3x3_relu"
  bottom: "5x5_relu"
  bottom: "poolproj_relu"
  top: "inception3a_Output"
}
@shelhamer @jyegerlehner I am very excited by your comments, especially the last layer, "DepthConcatenation", which takes multiple inputs. This was very enlightening; I'll try it and use this method to design my own models. Thank you very much.
FYI we have released the VGG team models in the Caffe format: https://github.com/BVLC/caffe/wiki/Model-Zoo#models-used-by-the-vgg-team-in-ilsvrc-2014
@ksimonyan I have read your paper and am very interested in VGG. Thanks to your released implementation, I am eager to follow your paper and do some research of my own. Thank you.
@ksimonyan Hello Dr. Simonyan, thank you for releasing the VGG model. You mention "matlab/caffe/matcaffe_demo_vgg.m" in the note; where can I download it? Could you give me the link? Thanks a lot.
@YanchengBai
At the moment, there are two example Matlab scripts for the VGG models in the dev branch: matcaffe_demo_vgg is for the BMVC-14 models, and matcaffe_demo_vgg_mean_pix is for the ILSVRC-14 models (which rely on mean-pixel, rather than mean-image, subtraction).
Note that while our BMVC-14 models are currently supported by the dev branch, the ILSVRC-14 ones require https://github.com/BVLC/caffe/pull/1070.
@ksimonyan Thanks a lot for your help. You are very kind.
@ouxinyu Thanks for sharing the models. I'm interested in the training protobufs, especially the weight initialization. I've tried training VGG on my own data and the loss seems to get stuck (I'm using a Gaussian filler for the weights and biases). Could you provide the training protobufs?
Thanks
@ouxinyu I'm glad you found it useful. I am also excited! I think the Inception module, and the prospect of other networks within networks, is rife with possibilities.
Since cuDNN does not support pooling layers with padding, I suggest removing the pad param from maxpool_3x3 and adding "pad: 1" to poolproj_1x1conv instead.
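Concretely, the suggested change would look like the snippet below. With 28x28 input, the unpadded 3x3/stride-1 pool yields a 26x26 map, and the padded 1x1 conv restores 28x28; the border values are then computed from zero padding rather than padded pooling, which is a close but not identical computation.

layers {
  name: "maxpool_3x3"
  type: POOLING
  bottom: "maxpool_2"
  top: "maxpool_3x3"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 1
  }
}
layers {
  name: "poolproj_1x1conv"
  type: CONVOLUTION
  bottom: "maxpool_3x3"
  top: "poolproj_1x1conv"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 1
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}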
Has anyone been able to replicate GoogleNet's results?
The closest published result is http://arxiv.org/abs/1411.4038: "68.5% top-1 and 88.4% top-5". @shelhamer is listed as one of the authors, so he may be able to confirm it.
The GoogLeNet classifier replication in the linked paper is the work of @sguada -- thanks Sergio! The top-5 error for 1 crop and 1 model in the original GoogLeNet paper is 10.07% while the replication has a comparable top-5 error of 11.60%. The difference could be explained by differences in data augmentation -- the mentioned replication is only trained on mirrors and crops.
@shelhamer @sguada could you give us some hints on implementing GoogLeNet, since its accuracy is so appealing? I think there will be a lot of people who want to learn from it or experiment with it.
@sunbaigui it is already here #1367
@ducha-aiki I've already run it, but never reached the claimed accuracy. Have you tried it, bro?
@sunbaigui I got 64.4% top-1, which is less than 68.5%. But I have only 3 GB of GPU memory, so I use batch_size=32 and cannot run the training and test nets simultaneously (so I have no continuous testing, which would allow decreasing the learning rate at the right moment). I have also experimented with data augmentation, which could also hurt the final result. And what are your results?
I will be releasing my prototxts and the details of how I trained them soon. Stay tuned.
I trained using a batch size of 32, and that actually seemed to help it converge faster than bigger batch sizes.
Sergio
Take a look at #1598 for my replica of GoogleNet, including the prototxt, solver and model.
We implemented GoogLeNet using a single GPU. Our main contributions are an effective way to initialize the network and a trick to overcome the GPU memory constraint by accumulating gradients over two training iterations. The pre-trained model and code are available at http://vision.princeton.edu/pvt/GoogLeNet/
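For comparison, Caffe's iter_size solver option (in versions that include it) implements the same gradient-accumulation idea: the solver accumulates gradients over iter_size forward/backward passes before each weight update, so the effective batch size is batch_size x iter_size. A hypothetical solver snippet (paths and hyperparameters are placeholders):

# Hypothetical solver sketch using gradient accumulation via iter_size.
net: "models/googlenet/train_val.prototxt"
base_lr: 0.01
# Accumulate gradients over 2 iterations before each weight update;
# effective batch = 2 x the batch_size declared in the data layer.
iter_size: 2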
@yindaz thanks for sharing your implementation of GoogLeNet, although the patch you shared is a bit messy. Check #1598 for a simpler initialization, faster training, and a slightly better model.
@yindaz I have a few questions about the COMPAT_DATA layer in your GoogLeNet patch. Looking at the provided convert_imageset_compat2.cpp, which doesn't resize images to a fixed size, it seems that COMPAT_DATA can hold data of varying sizes. In that case, how should the required mean_file be computed?
@sguada may I ask why you used a batch size of 32 without raising the learning rate accordingly? Thanks
Actually, according to "One weird trick for parallelizing convolutional neural networks", when one decreases the batch size, one should also decrease the learning rate. The authors used a batch size of 32, so I did too. I also tried a batch size of 128, and it performed worse.
@sguada, I think @npit just missed the fact that Caffe divides the loss by batch_size, and meant that to keep the same parameter-update scale per image, one would need to scale the learning rate in proportion to the number of images in the batch. Since that division is done automatically, one can use the same learning rate with any batch size, from this point of view.
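A toy numerical illustration of that point (plain NumPy, not Caffe code): with a batch-averaged loss, the gradient magnitude, and hence the update step for a fixed learning rate, stays on the same scale regardless of batch size.

import numpy as np

rng = np.random.RandomState(0)
per_image_grads = rng.randn(1024)  # gradient of one weight w.r.t. each image's loss

lr = 0.01
for batch_size in (32, 128, 1024):
    g = per_image_grads[:batch_size].mean()  # averaged loss => mean gradient
    print(batch_size, lr * g)                # same order of magnitude for all sizes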
Thank you @sguada @ducha-aiki! So, as a rule of thumb, I should just modify the number of iterations when using different batch sizes?
So Caffe can implement GoogLeNet, which excites me. The 9 Inception modules seem very difficult, though....