BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Scale Invariant CNN (SICNN) #576

Closed kloudkl closed 10 years ago

kloudkl commented 10 years ago

The Spatial Pyramid Pooling net of #548 improves the speed of Regions with Convolutional Neural Network Features (R-CNN) by extracting features for each image only once, while R-CNN does so for each region of interest in an image. The most important insight of SPP-net is that only the classifier, i.e. the fully-connected layers, requires a fixed-length vector; the convolution layers do not have to constrain the sizes of the input images. The experiments show that full images are better than cropped ones and that larger scales lead to higher accuracy.
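For illustration only, a minimal NumPy sketch of that pooling idea (not the actual Caffe SPP layer of #548): max-pool an arbitrary-sized conv feature map into a fixed set of spatial bins, so the fully-connected classifier always sees a vector of the same length.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """feature_map: (channels, H, W) array with H, W >= max(levels);
    returns a fixed-length vector regardless of H and W."""
    C, H, W = feature_map.shape
    pooled = []
    for n in levels:                          # n x n grid of bins at this pyramid level
        h_edges = np.linspace(0, H, n + 1).astype(int)
        w_edges = np.linspace(0, W, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                bin_ = feature_map[:, h_edges[i]:h_edges[i + 1],
                                      w_edges[j]:w_edges[j + 1]]
                pooled.append(bin_.max(axis=(1, 2)))   # max over each spatial bin
    return np.concatenate(pooled)             # length = C * sum(n * n for n in levels)

# Feature maps of different sizes yield vectors of the same length:
print(spatial_pyramid_pool(np.random.rand(256, 13, 13)).shape)  # (5376,)
print(spatial_pyramid_pool(np.random.rand(256, 20, 27)).shape)  # (5376,)
```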

SPP-net simulates multiple scales with fixed-size networks, so the "scale-mismatch" problem is not solved. In #308, multi-scale feature extraction is achieved by packing multiple scales of an image into a single large image. Both approaches can only process pre-defined discrete scales.

An authentically scale-invariant CNN would mean that the extracted features can be scaled up or down to obtain the features of the image undergoing the same scaling, so the features of an image only have to be extracted once by the network. A sketch of this property is given below.
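To make the desired property concrete, here is a sketch of the check, not an implementation. `extract_features` is a hypothetical callable standing in for a forward pass through the conv layers, returning a (C, H', W') array; the property asks that scaling the input and then extracting is (approximately) the same as extracting once and then rescaling the feature maps.

```python
import numpy as np
from scipy.ndimage import zoom

def crop_to_common(a, b):
    """Crop both arrays to their shared shape (sizes can differ by rounding)."""
    shape = tuple(min(x, y) for x, y in zip(a.shape, b.shape))
    idx = tuple(slice(0, n) for n in shape)
    return a[idx], b[idx]

def scale_equivariance_error(extract_features, image, s=0.5):
    """image: (C, H, W) array; a smaller error means closer to the desired property."""
    f_of_scaled = extract_features(zoom(image, (1, s, s)))   # scale the image, then extract
    scaled_of_f = zoom(extract_features(image), (1, s, s))   # extract once, then scale the features
    a, b = crop_to_common(f_of_scaled, scaled_of_f)
    return np.abs(a - b).mean()
```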

Any ideas about existing work in this direction?

shelhamer commented 10 years ago

One can run a single net on a multi-scale pyramid by weight sharing, or run on whatever scale is desired by on-the-fly net reshaping as in #594. A single extraction of a deep feature invariant to all scalings is not possible due to filter discretization, nonlinearities, and so on (although one can down- and upsample features as they please).
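As a rough pycaffe sketch of that reshaping route (file names are placeholders, the scales are arbitrary, and the deploy net is assumed to be fully convolutional or otherwise reshape-compatible), one can resize the input blob per scale and reuse the same weights:

```python
import caffe
import numpy as np

net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

image = np.random.rand(3, 227, 227).astype(np.float32)      # stand-in preprocessed image
for s in (0.5, 1.0, 2.0):                                    # a small discrete pyramid
    h, w = int(227 * s), int(227 * s)
    scaled = caffe.io.resize_image(image.transpose(1, 2, 0), (h, w)).transpose(2, 0, 1)
    net.blobs['data'].reshape(1, 3, h, w)                    # resize the input blob...
    net.reshape()                                            # ...and propagate the new shapes
    net.blobs['data'].data[...] = scaled
    out = net.forward()                                      # same shared weights at every scale
    print(s, {k: v.shape for k, v in out.items()})
```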

kloudkl commented 9 years ago

Angjoo Kanazawa, Abhishek Sharma, David Jacobs, Locally Scale-invariant Convolutional Neural Network, Deep Learning and Representation Learning Workshop: NIPS 2014.

akanazawa commented 9 years ago

Hi, the code for the locally scale-invariant ConvNet paper is available here. Thanks.

etienne87 commented 9 years ago

Must be great, but wouldn't that take much more time? I mean, you need to transform every blob several times to max-pool over the scales, right?

akanazawa commented 9 years ago

Yeah, it does take more memory and time. Now I recommend checking out this recent arXiv paper: http://arxiv.org/abs/1506.02025
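For readers following along, a rough NumPy sketch of why the locally scale-invariant layer costs more, as I read the Kanazawa et al. paper: each response is computed at several scales, warped back to a canonical size, and max-pooled across the scale stack. `convolve2d` is a naive stand-in for the real conv layer and the scale set is illustrative.

```python
import numpy as np
from scipy.ndimage import zoom
from scipy.signal import convolve2d

def scale_invariant_conv(image, kernel, scales=(0.63, 0.79, 1.0, 1.26, 1.59)):
    """image, kernel: 2-D arrays; returns the per-pixel max response over scales."""
    H, W = image.shape
    responses = []
    for s in scales:
        scaled = zoom(image, s)                           # warp the input to scale s
        resp = convolve2d(scaled, kernel, mode='same')    # shared filter at every scale
        responses.append(zoom(resp, (H / resp.shape[0], W / resp.shape[1])))  # warp back
    return np.max(np.stack(responses), axis=0)            # max-pool across the scale stack

print(scale_invariant_conv(np.random.rand(32, 32), np.random.rand(5, 5)).shape)  # (32, 32)
```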