BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

How to implement GoogLeNet? #1106

Closed shiorioxy closed 9 years ago

shiorioxy commented 10 years ago

Can Caffe implement GoogLeNet? That would be exciting, but the 9 Inception modules seem very difficult to define....

zgxiangyang commented 10 years ago

I guess Caffe can do it, which makes me excited too. You can refer to the existing issues about DAG support in Caffe; it is just a CNN with multi-output and multi-input layers.

cNikolaou commented 10 years ago

I think that issues are currently meant for development discussion and bug reports only. Questions about usage, code, and applications should be asked on the caffe-users mailing list.

On the other hand, if you implement GoogLeNet, you might want to open a pull request against Caffe to include it as an additional example.

shelhamer commented 10 years ago

The Inception modules are simple compositions of fundamental layers like convolution, pooling, and concatenation. The model can be defined and executed in Caffe, since the framework fully supports DAG architectures with multiple inputs and multiple outputs / losses, but training should be accelerated by multi-GPU parallelism to make it reasonable. Reproducing the model is a worthwhile goal since the weights are not public.

jyegerlehner commented 10 years ago

Below is an inception module in the form of a single net. In particular, this is my attempt at rendering inception (3a) as shown in Table 1 and Figure 3 of the paper. I've tested it insofar as I wrote some code that parses it, loads it into a net, and forward-propagates a blob through it, and I verified that the output blob has the correct dimensions (256 channels x 28 H x 28 W). The only trick was padding the convolutions so as to get the right dimensions for the resulting top blob. Some of the layer params (like the weight fillers) are vestigial/extraneous, carried over from the lenet prototxt I copied from rather than specified by the paper.
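
For reference, the padding choices can be checked against the standard output-size formula; a quick sketch (conv_out is just an illustrative helper, not a Caffe function):

def conv_out(in_size, kernel, pad, stride=1):
    # Caffe's output-size formula for convolution and pooling layers.
    return (in_size + 2 * pad - kernel) // stride + 1

assert conv_out(28, 1, 0) == 28  # 1x1 convs need no padding
assert conv_out(28, 3, 1) == 28  # 3x3 conv with pad 1 keeps 28x28
assert conv_out(28, 5, 2) == 28  # 5x5 conv with pad 2 keeps 28x28
assert conv_out(28, 3, 1) == 28  # 3x3 max pool, stride 1, pad 1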

One thing that might be wrong: I don't think the paper was explicit about whether there are ReLUs in between the 3x3 max pool and the 1x1 pool proj. I didn't put a ReLU layer in there.

So if you wanted to implement the whole GoogLeNet, you would have to make 9 copies of the layers in this hunk of prototxt, adjust the layer, top, and bottom names and the numbers of channels for the convolutions and max pools to match Table 1, and then add all the non-inception layers. That seems... cumbersome... at best.

I expect repeated modules like inception are going to be a thing, and what we need is a new layer type which is itself a network ("network-in-network"?), so that a network could recursively include other networks. You would then have a single layer standing in for the included network. It would need a param to identify the network that implements it, and to specify the parameters that are distinctive to its instantiation (e.g. the number of output channels of the conv layers, which differs from inception module to inception module). Seems straightforward in principle, but possibly tricky to make the implementation work for all scenarios.
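
In the meantime, the nine copies can at least be generated rather than hand-edited. A minimal sketch of such a template script (a hypothetical helper, not part of Caffe; it uses in-place ReLUs and omits the lr params and fillers for brevity):

CONV = """layers {{
  name: "{name}" type: CONVOLUTION bottom: "{bottom}" top: "{name}"
  convolution_param {{ num_output: {out} pad: {pad} kernel_size: {k} stride: 1 }}
}}
layers {{ name: "{name}_relu" type: RELU bottom: "{name}" top: "{name}" }}
"""

def inception(p, bottom, c1, c3r, c3, c5r, c5, cproj):
    # One inception module: four parallel branches reading the same bottom blob.
    txt = CONV.format(name=p + "_1x1", bottom=bottom, out=c1, pad=0, k=1)
    txt += CONV.format(name=p + "_3x3reduce", bottom=bottom, out=c3r, pad=0, k=1)
    txt += CONV.format(name=p + "_3x3", bottom=p + "_3x3reduce", out=c3, pad=1, k=3)
    txt += CONV.format(name=p + "_5x5reduce", bottom=bottom, out=c5r, pad=0, k=1)
    txt += CONV.format(name=p + "_5x5", bottom=p + "_5x5reduce", out=c5, pad=2, k=5)
    txt += ('layers {{ name: "{p}_pool" type: POOLING bottom: "{b}" top: "{p}_pool"\n'
            '  pooling_param {{ pool: MAX pad: 1 kernel_size: 3 stride: 1 }} }}\n'
           ).format(p=p, b=bottom)
    txt += CONV.format(name=p + "_poolproj", bottom=p + "_pool", out=cproj, pad=0, k=1)
    txt += ('layers {{ name: "{p}_concat" type: CONCAT concat_param {{ concat_dim: 1 }}\n'
            '  bottom: "{p}_1x1" bottom: "{p}_3x3" bottom: "{p}_5x5"\n'
            '  bottom: "{p}_poolproj" top: "{p}_output" }}\n').format(p=p)
    return txt

# Channel counts for inception (3a) from Table 1 of the paper.
print(inception("inception_3a", "maxpool_2", 64, 96, 128, 16, 32, 32))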

[Edit: stuff deleted, now that I see they say they used DistBelief]

name: "Inception_3a"

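# Dummy input standing in for the previous stage's output (num x channels x height x width).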
input: "maxpool_2"
input_dim: 1
input_dim: 192
input_dim: 28
input_dim: 28

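# 1x1 convolution branch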
layers {
  name: "1x1_conv"
  type: CONVOLUTION
  bottom: "maxpool_2"
  top: "1x1_conv"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 64
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

layers {
  name: "1x1_relu"
  type: RELU
  bottom: "1x1_conv"
  top: "1x1_relu"
}

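# 3x3 branch: 1x1 dimensionality reduction followed by the 3x3 convolution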
layers {
  name: "3x3reduce_conv"
  type: CONVOLUTION
  bottom: "maxpool_2"
  top: "3x3reduce_conv"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 96
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

layers {
  name: "3x3reduce_relu"
  type: RELU
  bottom: "3x3reduce_conv"
  top: "3x3reduce_relu"
}

layers {
  name: "3x3_conv"
  type: CONVOLUTION
  bottom: "3x3reduce_relu"
  top: "3x3_conv"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

layers {
  name: "3x3_relu"
  type: RELU
  bottom: "3x3_conv"
  top: "3x3_relu"
}

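# 5x5 branch: 1x1 dimensionality reduction followed by the 5x5 convolution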
layers {
  name: "5x5reduce_conv"
  type: CONVOLUTION
  bottom: "maxpool_2"
  top: "5x5reduce_conv"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 16
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

layers {
  name: "5x5reduce_relu"
  type: RELU
  bottom: "5x5reduce_conv"
  top: "5x5reduce_relu"
}

layers {
  name: "5x5_conv"
  type: CONVOLUTION
  bottom: "5x5reduce_relu"
  top: "5x5_conv"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

layers {
  name: "5x5_relu"
  type: RELU
  bottom: "5x5_conv"
  top: "5x5_relu"
}

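# pooling branch: 3x3 max pool followed by the 1x1 pool projection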
layers {
  name: "maxpool_3x3"
  type: POOLING
  bottom: "maxpool_2"
  top: "maxpool_3x3"
  pooling_param {
    pool: MAX
    pad: 1
    kernel_size: 3
    stride: 1
  }
}

layers {
  name: "poolproj_1x1conv"
  type: CONVOLUTION
  bottom: "maxpool_3x3"
  top: "poolproj_1x1conv"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

layers {
  name: "poolproj_relu"
  type: RELU
  bottom: "poolproj_1x1conv"
  top: "poolproj_relu"
}

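# concatenate the four branch outputs along the channel (depth) dimension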
layers {
  name: "DepthConcatenation"
  type: CONCAT
  concat_param {
    concat_dim: 1
  }
  bottom: "1x1_relu"
  bottom: "3x3_relu"
  bottom: "5x5_relu"
  bottom: "poolproj_relu"
  top: "inception3a_Output"
}
ouxinyu commented 10 years ago

@shelhamer @jyegerlehner I am very excited by your comments, especially the last part with the "DepthConcatenation" layer, which is a multiple-input layer. This gave me great insight; I'll try it and use this method to design my own models. Thank you very much.

ksimonyan commented 10 years ago

FYI we have released the VGG team models in the Caffe format: https://github.com/BVLC/caffe/wiki/Model-Zoo#models-used-by-the-vgg-team-in-ilsvrc-2014

ouxinyu commented 10 years ago

@ksimonyan I have read your paper and am very interested in VGG. Thanks to your implementation, I am eager to follow your paper and do some research. Thank you.

ghost commented 10 years ago

@ksimonyan Hello Dr. Simonyan, it is very kind of you to release the VGG models. You mention "matlab/caffe/matcaffe_demo_vgg.m" in the note; I want to know where I can download it. Could you give me the link? Thanks a lot.

ksimonyan commented 10 years ago

@YanchengBai At the moment, there are two example Matlab scripts for the VGG models in the dev branch: matcaffe_demo_vgg is for BMVC-14 models, and matcaffe_demo_vgg_mean_pix is for ILSVRC-14 models (which rely on mean pixel, rather than mean image, subtraction).

Note that while our BMVC-14 models are currently supported by the dev branch, the ILSVRC-14 ones require https://github.com/BVLC/caffe/pull/1070.

ghost commented 10 years ago

@ksimonyan Thanks a lot for your help. You are so kind.

amiralush commented 10 years ago

@ouxinyu Thanks for sharing the models. I'm interested in the training protobufs, especially the weight initialization. I've tried training VGG on my own data and the loss seems to be stuck (I'm using a Gaussian filler for the weights and biases). Can you provide the training protobufs?

Thanks

jyegerlehner commented 10 years ago

@ouxinyu I'm glad you found it useful. I am also excited! I think the Inception module, and the prospect of other networks within networks, is rife with possibilities.

happynear commented 10 years ago

Since cuDNN does not support pooling layers with padding, I suggest removing the pad param from maxpool_3x3 and adding "pad: 1" to poolproj_1x1conv instead.
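
A quick dimension check of that change with the standard output-size formula (out_size is just an illustrative helper):

def out_size(n, kernel, pad, stride=1):
    return (n + 2 * pad - kernel) // stride + 1

assert out_size(28, 3, 0) == 26  # unpadded 3x3 max pool shrinks 28 -> 26
assert out_size(26, 1, 1) == 28  # pad 1 on the 1x1 pool proj restores 28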

futurely commented 9 years ago

https://github.com/BVLC/caffe/issues/1169#issuecomment-63248888

mmoghimi commented 9 years ago

Has anyone been able to replicate GoogleNet's results?

ducha-aiki commented 9 years ago

The closest published result is http://arxiv.org/abs/1411.4038: "68.5% top-1 and 88.4% top-5". @shelhamer is listed as one of the authors, so he may be able to confirm it.

shelhamer commented 9 years ago

The GoogLeNet classifier replication in the linked paper is the work of @sguada -- thanks Sergio! The top-5 error for 1 crop and 1 model in the original GoogLeNet paper is 10.07% while the replication has a comparable top-5 error of 11.60%. The difference could be explained by differences in data augmentation -- the mentioned replication is only trained on mirrors and crops.

sunbaigui commented 9 years ago

@shelhamer @sguada could you guys give us some hints on implementing GoogLeNet, since its accuracy is so attractive? I think there will be a lot of people who want to learn from it or experiment with it.

ducha-aiki commented 9 years ago

@sunbaigui it is already here: #1367

sunbaigui commented 9 years ago

@ducha-aiki I've already run it, but never reached the claimed accuracy. Have you tried it, bro?

ducha-aiki commented 9 years ago

@sunbaigui I got 64.4% top-1, which is less than 68.5%. But I have only 3 GB of GPU memory, so I use batch_size=32 and could not run the training and test nets simultaneously (so I have no ongoing testing, which would allow decreasing the learning rate at the right moment). Also, I experimented with data augmentation, which could also hurt the final result. What are your results?

sguada commented 9 years ago

I will be releasing my prototxts and the way I trained soon. Stay tuned.

I trained it using a batch size of 32, and that actually seemed to help it converge faster than bigger batch sizes did.

sguada commented 9 years ago

Take a look at #1598 for my replica of GoogleNet, including the prototxt, solver and model.

yindaz commented 9 years ago

We implemented GoogLeNet using a single GPU. Our main contributions are an effective way to initialize the network and a trick to overcome the GPU memory constraint by accumulating gradients over two training iterations. The pre-trained model and code are available at http://vision.princeton.edu/pvt/GoogLeNet/
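
The accumulation trick itself is framework-agnostic; here is a toy numpy sketch of the general idea (an illustration under simplified assumptions, not the actual patch):

import numpy as np

# One linear-regression update with gradient accumulation: split a logical
# batch of 64 into two halves, run two forward/backward passes, and apply a
# single averaged update, so peak memory only ever covers a half-batch.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 3)), rng.normal(size=64)
w, lr = np.zeros(3), 0.1

grad_accum = np.zeros_like(w)
for idx in np.split(np.arange(64), 2):   # two iterations share one update
    residual = X[idx] @ w - y[idx]
    grad_accum += X[idx].T @ residual / len(idx)
w -= lr * grad_accum / 2                 # average over the two half-batches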

sguada commented 9 years ago

@yindaz thanks for sharing your implementation of GoogleNet, although the patch you shared is a bit messy. Check #1598 for a simpler initialization, faster training, and a slightly better model.

ChenglongChen commented 9 years ago

@yindaz I have a few questions about the COMPAT_DATA layer in your googlenet patch. Looking at the provided convert_imageset_compat2.cpp file, which doesn't resize images to a fixed size, it seems to me that COMPAT_DATA can hold data of varying sizes. In that case, how should the required mean_file be computed?

npit commented 9 years ago

@sguada may I ask why you used a batch size of 32 without raising the learning rate accordingly? Thanks

sguada commented 9 years ago

Actually, according to "One weird trick for parallelizing convolutional neural networks", when one decreases the batch size, one should also decrease the learning rate. The authors used batch size 32, so I did too. I also tried batch size 128, and it performed worse.

ducha-aiki commented 9 years ago

@sguada, I think @npit just missed the fact that Caffe divides the loss by batch_size, and meant that to keep the same parameter-update scale per image, one would need to increase or decrease the learning rate in proportion to the number of images in the batch. Since that normalization is done automatically, from this point of view one can use the same learning rate with any batch size.
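
A small worked example of that point (assuming plain SGD with Caffe's batch-averaged loss):

# With the loss averaged over the batch, the per-image gradient scale is the
# same for any batch size; what changes is the number of updates per epoch.
images_per_epoch = 1281167               # ILSVRC-2012 training set size
for batch_size in (32, 128):
    print(batch_size, images_per_epoch // batch_size, "iterations per epoch")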

npit commented 9 years ago

Thank you @sguada @ducha-aiki! So, as a rule of thumb, I'll just modify the number of iterations when using different batch sizes?