BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

ImageNet LRN/MaxPool ordering #296

Closed kmatzen closed 10 years ago

kmatzen commented 10 years ago

I don't think it's explicitly stated anywhere that the ImageNet example is supposed to be an exact reimplementation of the Krizhevsky 2012 architecture, but if it is, then the order of the LRN and max pool layers in Caffe's implementation seems to be backwards.

This network uses conv -> max pool -> LRN. https://github.com/BVLC/caffe/blob/master/examples/imagenet/imagenet_train.prototxt#L48

This text suggests that he used conv -> LRN -> max pool. "Response-normalization layers follow the first and second convolutional layers. Max-pooling layers, of the kind described in Section 3.4, follow both response-normalization layers as well as the fifth convolutional layer."
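
For concreteness, here is a minimal sketch of the two orderings for the first conv/pool/norm group. It is written in the current Caffe layer syntax, which may differ from the format used in the linked prototxt, and the pooling and LRN parameters are just the usual AlexNet values for illustration:

```
# Caffe ImageNet example ordering: conv1 -> pool1 -> norm1
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}

# Krizhevsky 2012 ordering: conv1 -> norm1 -> pool1
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
```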

Either ordering seems to get good results, but for people reimplementing papers that say Krizhevsky's architecture was used, it might be worthwhile to make sure the implementation matches his paper.

jeffdonahue commented 10 years ago

Huh, looks like you're correct - interesting that nobody else has ever pointed this out after 8+ months of us using this reimplementation of the architecture (first in cuda-convnet, then decaf, now caffe).

Feel free to send a PR with a note in the documentation that our implementation differs from Krizhevsky's published architecture in this way. (And if someone from Berkeley cares to train an instance of the corrected version and finds it matches or outperforms the reference model, we could replace it. It does seem more natural to normalize then max-pool.)

Edit: actually we probably don't want to ever actually 'replace' the current reference model at this point as it's been used in many results that have already been disseminated in various forms, but we could (and probably will) have additional reference model(s).

shelhamer commented 10 years ago

I'm happy to re-train. What should we do with the result? The caffe_reference_imagenet_model is already in use, so it shouldn't be replaced outright.

@jeffdonahue's suggestion of caffe_reference_alexnet_model should work fine. Note that for further exactness we should train with "relighting" or state more obviously that we train without it.

jeffdonahue commented 10 years ago

Yeah, sounds good - maybe you could change 'imagenet' to 'alexnet' or some other descriptive name to make it clear it's a different architecture.

kloudkl commented 10 years ago

If the model is going to be re-trained, why don't we choose the ZeilerNet (#33) that outperformed the AlexNet last year?

shelhamer commented 10 years ago

@Yangqing our Caffe reference ImageNet model does max pooling -> LRN instead of LRN -> max pooling as in the Krizhevsky architecture.

I'm training now. I'll check back with AlexNet model results later this week and we can decide exactly how to package it.

@kloudkl we plan to release a ZF model too, but https://github.com/BVLC/caffe/pull/33#issuecomment-33152193 still needs implementing to do it exactly right.

sguada commented 10 years ago

@shelhamer I will do a small test to check if the order is likely to affect the results. What I can tell you is that it increases memory consumption by at least 1 GB. Also, we train with data warped to fit 256x256 instead of the resize-and-crop described in the paper.

Maybe we should differentiate caffe_reference_model and alexnet_reference_model more explicitly.

shelhamer commented 10 years ago

I'm already running the training.

Update: after three days the loss is ~1.8 and val accuracy is ~54% at 170,000 or so iterations.

shelhamer commented 10 years ago

The AlexNet / Krizhevsky '12 architecture model was released in #327. Follow up there for the details of training (and note that there are small differences from the training regime described in the paper).

harvinderdabas commented 5 years ago

I work on hardware acceleration for CNN inference, and I came across this when comparing GoogLeNet with AlexNet: GoogLeNet does the LRN after pooling, which is efficient from a computation point of view. From the intent of the LRN layer, I feel LRN should be done before max pooling, because LRN done first can have an impact on the max pooling decisions.